OALib Journal期刊
ISSN: 2333-9721
费用：99美元

投递稿件

查看量	下载量

相关文章
更多...

PLOS ONE 2014

QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

DOI: 10.1371/journal.pone.0088901

Adam Gudy？, Sebastian Deorowicz

Full-Text Cite this paper Add to My Lib

Abstract:

Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, the variant of MSAProbs customised for graphics processors. We selected the two most time consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on quad-core PC equipped with high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than original CPU-parallel MSAProbs. Additional tests performed on several protein families from Pfam database give overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors.

References

[1]	Wang L, Jiang T (1994) On the complexity of multiple sequence alignment. Journal of Computational Biology 1: 337–348. doi: 10.1089/cmb.1994.1.337
[2]	Just W (1999) Computational complexity of multiple sequence alignment with SP-Score. Journal of Computational Biology 8: 615–623. doi: 10.1089/106652701753307511
[3]	Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25: 351–360. doi: 10.1007/bf02603120
[4]	Barton GJ, Sternberg MJ (1987) A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. Journal of Molecular Biology 198: 327–337. doi: 10.1016/0022-2836(87)90316-0
[5]	Krogh A, Brown M, Mian IS, Sj？lander K, Haussler D (1994) Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology 235: 1501–1531. doi: 10.1006/jmbi.1994.1104
[6]	Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22: 4673–4680. doi: 10.1093/nar/22.22.4673
[7]	Notredame C, Higgins D, Heringa J (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302: 205–217. doi: 10.1006/jmbi.2000.4042
[8]	Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059–3066. doi: 10.1093/nar/gkf436
[9]	Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797. doi: 10.1093/nar/gkh340
[10]	Do C, Mahabhashyam M, Brudno M, Batzoglou S (2005) ProbCons: Probabilistic consistencybased multiple sequence alignment. Genome Research 15: 330–340. doi: 10.1101/gr.2821705
[11]	Roshan U, Livesay DR (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 22: 2715–2721. doi: 10.1093/bioinformatics/btl472
[12]	Liu Y, Schmidt B, Maskell D (2010) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. Bioinformatics 26: 1958–1964. doi: 10.1093/bioinformatics/btq338
[13]	O′Sullivan O, Suhre K, Abergel C, Higgins D, Notredame C (2004) 3DCoffee: Combining protein sequences and structures within multiple sequence alignments. Journal of Molecular Biology 340: 385–395. doi: 10.1016/j.jmb.2004.04.058
[14]	Deng X, Cheng J (2011) MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts. BMC Bioinformatics 12: 472. doi: 10.1186/1471-2105-12-472
[15]	Katoh K, Kuma Ki, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518. doi: 10.1093/nar/gki198
[16]	Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, et al. (2011) PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Nucleic Acids Research 39: 556–560. doi: 10.1093/nar/gkq1109
[17]	Capella-Gutierrez S (2012) Analysis of multiple protein sequence alignments and phylogenetic trees in the context of phylogenomics studies. Pompeu Fabra UniversityPh.D. thesis
[18]	Lassmann T, Sonnhammer E (2005) Kalign\|an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6: 298. doi: 10.1186/1471-2105-6-298
[19]	Lassmann T, Frings O, Sonnhammer E (2009) Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 37: 858–865. doi: 10.1093/nar/gkn1006
[20]	Wu S, Manber U (1992) Fast text searching: allowing errors. Communications of the ACM 35: 83–91. doi: 10.1145/135239.135244
[21]	Muth R, Manber U (1996) Approximate multiple string search. In: Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching. pp. 75-86.
[22]	Deorowicz S, Debudaj-Grabysz A, Gudy？ A (2014) Kalign-LCS\|more accurate and faster variant of Kalign2 algorithm for the multiple sequence alignment problem. In: Man-Machine Interactions 3, Springer Cham Heidelberg New York Dordrecht London. pp. 495-502.
[23]	Katoh K, Toh H (2007) Parttree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23: 372–374. doi: 10.1093/bioinformatics/btl592
[24]	Sievers F, Wilm A, Dineen D, Gibson T, Karplus K, et al. (2011) Fast, scalable generation of highquality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 7: 539. doi: 10.1038/msb.2011.75
[25]	Blackshields G, Sievers F, Shi W, Wilm A, Higgins D (2010) Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms for Molecular Biology 5: 21. doi: 10.1186/1748-7188-5-21
[26]	Liu W, Schmidt B, Voss G, Muller-Wittig W (2006) GPU-ClustalW: Using graphics hardware to accelerate multiple sequence alignment. Lecture Notes in Computer Science 4297: 363–374. doi: 10.1007/11945918_37
[27]	Liu Y, Schmidt B, Maskell D (2009) MSA-CUDA: Multiple sequence alignment on graphics processing units with CUDA. In: Proceedings of the 20th IEEE International Conference on Applicationspecific Systems, Architectures and Processors. pp. 121-128.
[28]	Gudy？ A, Deorowicz S (2012) A parallel algorithm for the constrained multiple sequence alignment problem designed for GPUs. International Journal of Foundations of Computer Science 23: 877–901. doi: 10.1142/s0129054112500098
[29]	Lin YS, Lin CY, Li ST, Lee JY, Tang CY (2010) GPU-REMuSiC: the implementation of constrain multiple sequence alignment on graphics processing units. In: Proceedings of the 2010 GPU Technology Conference. NVidia.
[30]	Blazewicz J, Frohmberg W, Kierzynka M, Wojciechowski P (2013) G-MSA\|A GPU-based, fast and accurate algorithm for multiple sequence alignment. Journal of Parallel and Distributed Computing 73: 32–41. doi: 10.1016/j.jpdc.2012.04.004
[31]	OpenMP ARB (2013) OpenMP Application Program Interface version 4.0. Available: http://www.openmp.org/mp-documents/OpenM？P4.0.0.pdf.
[32]	Manavski S, Valle G (2008) CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics 9: S10. doi: 10.1186/1471-2105-9-s2-s10
[33]	Ligowski L, Rudnicki W (2009) An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. In: Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing. Washington,USA: IEEE Computer Society, pp. 1-8.
[34]	Liu Y, Schmidt B, Maskell D (2010) CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions. BMC Research Notes 3: 93. doi: 10.1186/1756-0500-3-93
[35]	Khajeh-Saeed A, Poole S, Perot J (2010) Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors. Journal of Computational Physics 229: 4247–4258. doi: 10.1016/j.jcp.2010.02.009
[36]	Blazewicz J, Frohmberg W, Kierzynka M, Pesch E, Wojciechowski P (2011) Protein alignment algorithms with an efficient backtracking routine on multiple GPUs. BMC Bioinformatics 12: 181. doi: 10.1186/1471-2105-12-181
[37]	Liu Y, Wirawan A, Schmidt B (2013) CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics 14: 117. doi: 10.1186/1471-2105-14-117
[38]	Liu CM, Wong T, Wu E, Luo R, Yiu SM, et al. (2012) SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28: 878–879. doi: 10.1093/bioinformatics/bts061
[39]	Chang DJ, Kimmer C, Ouyang M (2010) Accelerating the Nussinov RNA folding algorithm with CUDA/GPU. In: Proceedings of the 10th IEEE International Symposium on Signal Processing and Information. IEEE Computer Society, pp. 120-125: 20. doi: 10.1109/isspit.2010.5711746
[40]	Suchard MA, Rambaut A (2009) Many-core algorithms for statistical phylogenetics. Bioinformatics 25: 1370–1376. doi: 10.1093/bioinformatics/btp244
[41]	Demouth J (2012) Sparse Matrix-Matrix Multiplication on the GPU. In: Proceedings of the GPU Technology Conference 2012. NVidia.
[42]	NVidia (2013) CUSP library version 0.4.0. Available: https://developer.nvidia.com/cusp.
[43]	NVidia (2013) cuSPARSE library version 5.5. Available: https://developer.nvidia.com/cusparse.
[44]	Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
[45]	Thompson JD, Plewniak F, Poch O (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Research 27: 2682–2690. doi: 10.1093/nar/27.13.2682
[46]	Stoye J, Evers D, Meyer F (1998) Rose: generating sequence families. Bioinformatics 14: 157–163. doi: 10.1093/bioinformatics/14.2.157
[47]	NVidia (2013) CUDA Parallel Computing Platform version 5.5. Available: http://docs.nvidia.com/cuda/pdf/CUDA_C_P？rogramming_Guide.pdf.
[48]	Khronos Group (2013) The OpenCL Specification version 2.0. Available: http://www.khronos.org/registry/cl/specs？/opencl-2.0.pdf.
[49]	Viterbi A (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory 13: 260–269. doi: 10.1109/tit.1967.1054010
[50]	Sneath P, Sokal R (1973) Numerical Taxonomy. The Principles and Practice of Numerical Classification. San Francisco, USA: W.H. Freeman Limited.
[51]	Needleman S, Wunsch C (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443 - 453.
[52]	Thompson J, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61: 127–136. doi: 10.1002/prot.20527
[53]	Raghava GPS, Searle S, Audley P, Barber J, Barton G (2003) OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 4: 47. doi: 10.1186/1471-2105-4-47
[54]	Edgar RC (2009) Benchmark collection. Available: http://www.drive5.com/bench.
[55]	Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, et al. (2008) The Pfam protein families database. Nucleic Acids Research 36: D281–D288. doi: 10.1093/nar/gkm960
[56]	Edgar RC (2009) QSCORE multiple alignment scoring software. Available: http://www.drive5.com/qscore.
[57]	Wilcoxon F (1945) Individual Comparisons by Ranking Methods. Biometrics Bulletin 1: 80–83. doi: 10.2307/3001968

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133