OALib Journal期刊
ISSN: 2333-9721
费用：99美元

投递稿件

查看量	下载量

相关文章
更多...

PLOS ONE 2014

MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

DOI: 10.1371/journal.pone.0094044

David Díaz, Francisco J. Esteban, Pilar Hernández, Juan Antonio Caballero, Antonio Guevara, Gabriel Dorado, Sergio Gálvez

Full-Text Cite this paper Add to My Lib

Abstract:

We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused into the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high-performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, a many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x than the original Clustal W algorithm, and more than 7x than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification/traceability), including the protected designation of origin, among other applications.

References

[1]	Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443–453. doi: 10.1016/0022-2836(70)90057-4
[2]	Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147: 195–197. doi: 10.1016/0022-2836(81)90087-5
[3]	Gotoh O (1982) An improved algorithm for matching biological sequences. Journal of Molecular Biology 162: 705–708. doi: 10.1016/0022-2836(82)90398-9
[4]	Hirschberg DS (1975) A linear space algorithm for computing maximal common subsequences. Commun ACM 18: 341–343. doi: 10.1145/360825.360861
[5]	Driga A, Lu P, Schaeffer J, Szafron D, Charter K, et al. (2006) FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment. Algorithmica 45: 337–375. doi: 10.1007/s00453-006-1217-y
[6]	Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America 85: 2444–2448. doi: 10.1073/pnas.85.8.2444
[7]	Altschul S, Gish W, Miller W, Myers E-M, Lipman D (1990) Basic local alignment search tool. J Mol Biol 215: 403–410. doi: 10.1016/s0022-2836(05)80360-2
[8]	Pearson WR (1991) Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11: 635–650. doi: 10.1016/0888-7543(91)90071-l
[9]	Mirto M, Fiore S, Epicoco I, Cafaro M, Mocavero S, et al. (2008) A Bioinformatics Grid Alignment Toolkit. Future Generation Computer Systems 24: 752–762. doi: 10.1016/j.future.2008.02.001
[10]	Goetzmann JF (2007) Massively Parallel Contact Simulation on Graphics Hardware using NVIDIA CUDA [Bacherlor's Thesis]. Institute for Computer Science, Universit？t Mainz.
[11]	Adapteva (2011) Epiphany Multicore IP. Available: http://www.adapteva.com/index.php?option？=com_content&view=article&id=72&Itemid=7？9. Accessed 2013 Jul 16.
[12]	Shah M, Barreh J, Brooks J, Golla R, Grohoski G, et al. (2007) UltraSPARC T2: A highly-treaded, power-efficient, SPARC SOC Asian Solid-State Circuits Conference. (ASSCC07): 22–25. doi: 10.1109/asscc.2007.4425786
[13]	Mattson TG, Wijngaart RVD, Frumkin M (2008) Programming the Intel 80-core network-on-a-chip Terascale processor. Proceedings of the 2008 ACM/IEEE conference on Supercomputing. Austin, Texas: IEEE Press. pp. 1–11.
[14]	Intel (2010) The SCC Platform Overview. Available: http://techresearch.intel.com/spaw2/uplo？ads/files/SCC-Overview.pdf. Accessed 2013 Jul 16.
[15]	Intel (2010) Intel's Teraflops Research Chip. Available: http://download.intel.com/pressroom/kits？/Teraflops/Teraflops_Research_Chip_Overv？iew.pdf. Accessed 2013 Jul 16.
[16]	Wentzlaff D, Griffin P, Hoffmann H, Bao L, Edwards B, et al. (2007) On-Chip Interconnection Architecture of the Tile Processor. IEEE Micro 27: 15–31. doi: 10.1109/mm.2007.4378780
[17]	Tilera (2011) Product Brief: TILE-Gx 8000 Series. Available: http://www.tilera.com/sites/default/file？s/productbriefs/TILE-Gx8000Series Brief_0.pdf. Accessed 2013 Jul 16.
[18]	Manavski SA, Valle G (2008) CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics 9 Suppl 2: S10. doi: 10.1186/1471-2105-9-s2-s10
[19]	Liu Y, Schmidt B, Maskell DL (2010) CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Research Notes 3: 93. doi: 10.1186/1756-0500-3-93
[20]	Díaz D, Esteban FJ, Hernández P, Caballero JA, Dorado G, et al. (2011) Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture. Parallel Computing 37: 244–259. doi: 10.1016/j.parco.2011.03.003
[21]	Esteban FJ, Díaz D, Hernández P, Caballero JA, Dorado G, et al. (2013) Direct approaches to exploit many-core architecture in bioinformatics. Future Generation Computer Systems 29: 15–26. doi: 10.1016/j.future.2012.03.018
[22]	Gálvez S, Díaz D, Hernández P, Esteban FJ, Caballero JA, et al. (2010) Next-Generation Bioinformatics: Using Many-Core Processor Architecture to Develop a Web Service for Sequence Alignment. Bioinformatics 26: 683–686. doi: 10.1093/bioinformatics/btq017
[23]	Agrifood Biotechnology (“Biotecnología Agroalimentaria”) Research Group (2009) Many-core bioinformatics algorithms development. Available: http://galactus.uma.es/manycore/. Accessed 2013 Sep 11.
[24]	Bains W (1986) MULTAN: a program to align multiple DNA sequences. Nucleic Acids Research 14: 159–177. doi: 10.1093/nar/14.1.159
[25]	Waterman MS (1986) Multiple sequence alignment by consensus. Nucleic Acids Research 14: 9095–9102. doi: 10.1093/nar/14.22.9095
[26]	Higgins DG, Sharp PM (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73: 237–244. doi: 10.1016/0378-1119(88)90330-7
[27]	Higgins DG, Bleasby AJ, Fuchs R (1992) CLUSTAL V: improved software for multiple sequence alignment. Computer applications in the biosciences: CABIOS 8: 189–191. doi: 10.1093/bioinformatics/8.2.189
[28]	Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680. doi: 10.1093/nar/22.22.4673
[29]	Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948. doi: 10.1093/bioinformatics/btm404
[30]	Notredame C, Higgins DG, Heringa J (2000) T-coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302: 205–217. doi: 10.1006/jmbi.2000.4042
[31]	Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797. doi: 10.1093/nar/gkh340
[32]	Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, et al. (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13: 721–731. doi: 10.1101/gr.926603
[33]	Hohl M, Kurtz S, Ohlebusch E (2002) Efficient multiple genome alignment. Bioinformatics 18: S312–320. doi: 10.1093/bioinformatics/18.suppl_1.s312
[34]	Dewey CN (2007) Aligning multiple whole genomes with Mercator and MAVID. Methods Mol Biol 395: 221–236. doi: 10.1385/1-59745-514-8:221
[35]	Chain P, Kurtz S, Ohlebusch E, Slezak T (2003) An applications-focused review of comparative genomics tools: Capabilities, limitations and future challenges. Briefings in Bioinformatics 4: 105–123. doi: 10.1093/bib/4.2.105
[36]	Darling AE, Mau B, Perna NT (2010) progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement. PLoS ONE 5: e11147. doi: 10.1371/journal.pone.0011147
[37]	Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, et al. (2011) Cactus: Algorithms for genome multiple sequence alignment. Genome Res 21: 1512–1528. doi: 10.1101/gr.123356.111
[38]	Notredame C (2007) Recent Evolutions of Multiple Sequence Alignment Algorithms. PLoS Comput Biol 3: e123. doi: 10.1371/journal.pcbi.0030123
[39]	Vandierendonck H, Rul S, Questier M, De Bosschere K (2008) Experiences with Parallelizing a Bio-informatics Program on the Cell BE High Performance Embedded Architectures and Compilers. In:Stenstr？m P, Dubois M, Katevenis M, Gupta R, Ungerer T, editors: Springer Berlin/Heidelberg. pp. 161–175.
[40]	Mikhailov D, Cofer H, Gomperts R (2001) Performance optimization of Clustal W: parallel Clustal W, HT Clustal, and MULTICLUSTAL. Silicon Graphics, Inc.
[41]	Chaichoompu K, Kittitornkun S, Tongsima S (2006) MT-ClustalW: multithreading multiple sequence alignment. Proceedings of the 20th Parallel and Distributed Processing Symposium (IPDPS 2006). pp. 8.
[42]	Vandierendonck H, Rul S, De Bosschere K (2010) Accelerating Multiple Sequence Alignment with the Cell BE Processor. The Computer Journal 53: 814–826. doi: 10.1093/comjnl/bxp086
[43]	Li K-B (2003) ClustalW-MPI: ClustalW analysis using distributed and parallel computing. Bioinformatics 19: 1585–1586. doi: 10.1093/bioinformatics/btg192
[44]	Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406–425.
[45]	Myers EW, Miller W (1988) Optimal alignments in linear space. Comput Appl Biosci 4: 11–17. doi: 10.1093/bioinformatics/4.1.11
[46]	Cheetham J, Dehne F, Pitre S, Rau-Chaplin A, Taillon PJ (2003) Parallel CLUSTAL W for PC clusters. Proceedings of the 2003 international conference on Computational science and its applications: Part II. Montreal, Canada: Springer-Verlag. pp. 300–309.
[47]	Yongchao L, Schmidt B, Maskell DL (2009) MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA. Proceedings of the 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2009). pp. 121–128.
[48]	Yongchao L, Schmidt B, Maskell DL (2009) Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2009). pp. 1–8.
[49]	Michener CD, Sokal RR (1957) A quantitative approach to a problem of classification. Evolution 11. doi: 10.2307/2406046
[50]	Isaza S, Sanchez F, Gaydadjiev G, Ramirez A, Valero M (2010) Scalability Analysis of Progressive Alignment on a Multicore. Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010). pp. 889–894.
[51]	Stone JE, Gohara D, Shi G (2010) OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. Computing in Science and Engineering. pp. 66–73.
[52]	Katoh K, Toh H (2007) PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23: 372–374. doi: 10.1093/bioinformatics/btl592
[53]	Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7: 539. doi: 10.1038/msb.2011.75
[54]	Higgins DG, Blackshields G, Wallace IM (2005) Mind the gaps: progress in progressive alignment. Proc Natl Acad Sci U S A 102: 10411–10412. doi: 10.1073/pnas.0504801102
[55]	Kolb AW, Adams M, Cabot EL, Craven M, Brandt CR (2011) Multiplex sequencing of seven ocular herpes simplex virus type-1 genomes: phylogeny, sequence variability, and SNP distribution. Invest Ophthalmol Vis Sci 52: 9061–9073. doi: 10.1167/iovs.11-7812
[56]	Norberg P, Bergstrom T, Rekabdar E, Lindh M, Liljeqvist JA (2004) Phylogenetic analysis of clinical herpes simplex virus type 1 isolates identified three genetic groups and recombinant viruses. J Virol 78: 10755–10764. doi: 10.1128/jvi.78.19.10755-10764.2004
[57]	Perez-Jimenez M, Besnard G, Dorado G, Hernandez P (2013) Varietal tracing of virgin olive oils based on plastid DNA variation profiling. PLoS One 8: e70507. doi: 10.1371/journal.pone.0070507
[58]	Esteban F, Díaz D, Hernández P, Caballero J, Dorado G, et al. (2013) MC64-Cluster: A Many-Core CPU Cluster for Bioinformatics Applications. In: Rocha á, Correia AM, Wilson T, Stroetmann KA, editors. Advances in Information Systems and Technologies: Springer Berlin Heidelberg. pp. 819–825.
[59]	Du Z, Lin F (2006) pNJTree: A parallel program for reconstruction of neighbor-joining tree and its application in ClustalW. Parallel Computing 32: 441–446. doi: 10.1016/j.parco.2006.05.001

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133