全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction

DOI: 10.1155/2013/291741

Full-Text   Cite this paper   Add to My Lib

Abstract:

Haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. Minimum error correction approach aims to minimize the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm (HMEC) that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. Time complexity of each iteration is for an SNP matrix where and are the number of fragments (number of rows) and number of SNP sites (number of columns), respectively, in an SNP matrix. Alternative gain measure is also given to reduce running time. We have compared our algorithm with other methods in terms of accuracy and running time on both simulated and real data, and our extensive experimental results indicate the superiority of our algorithm over others. 1. Introduction A single DNA molecule is a long chain of nucleotides (base pairs). There are four such nucleotides which are represented by the set of symbols . It is generally accepted that genomes of two humans are almost 99% identical at DNA level. However, at certain specific sites, variation is observed across the human population which is commonly known as “single nucleotide polymorphism” and abbreviated as “SNP” [1]. The nucleotide involved in a SNP site is called allele. If a SNP site can have only two nucleotides, it is called biallelic. If it can have more than two alleles it is called a multiallelic SNP [2]. From now on, we will consider the simplest case where only bi-allelic SNPs occur in a specific pair of DNA. The single nucleotide polymorphism (SNP) is believed to be the most widespread form of genetic variation [3]. The sequence of all SNPs in a given chromosome is called haplotype. Haplotyping an individual deals with determining a pair of haplotypes, one for each copy of a given chromosome. A chromosome is a complicated structure of a DNA molecule bound by proteins. This pair of haplotypes completely define the SNP fingerprints of an individual for a specific pair of chromosomes. Given the two sequences of bases, haplotyping is straight forward and just needs to iterate through both the sequences and remove all the common alleles from them. But haplotyping becomes difficult when we want to construct haplotypes from sequencing data for higher reliability. Sequencing data for a genome does not contain the complete

References

[1]  R. Cilibrasi, L. V. Iersel, S. Kelk, and J. Tromp, “On the complexity of several haplotyping problems,” in Proceedings of the 5th International Workshop on Algorithms in Bioinformatics, vol. 3692 of Lecture Notes in Computer Science, pp. 128–139, Springer, 2005.
[2]  P. Bonizzoni, G. Della Vedova, R. Dondi, and J. Li, “The haplotyping problem: an overview of computational models and solutions,” Journal of Computer Science and Technology, vol. 18, no. 6, pp. 675–688, 2003.
[3]  A. Chakravarti, “It's raining SNPs, hallelujah?” Nature Genetics, vol. 19, no. 3, pp. 216–217, 1998.
[4]  R. Lippert, R. Schwartz, G. Lancia, and S. Istrail, “Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem,” Briefings in Bioinformatics, vol. 3, no. 1, pp. 23–31, 2002.
[5]  R. S. Wang, L. Y. Wu, Z. P. Li, and X. S. Zhang, “Haplotype reconstruction from SNP fragments by minimum error correction,” Bioinformatics, vol. 21, no. 10, pp. 2456–2462, 2005.
[6]  V. Bansal and V. Bafna, “HapCUT: an efficient and accurate algorithm for the haplotype assembly problem,” Bioinformatics, vol. 24, no. 16, pp. i153–i159, 2008.
[7]  J. Duitama, T. Huebsch, G. McEwen, E. K. Suk, and M. R. Hoehe, “ReFHap: a reliable and fast algorithm for single individual haplotyping,” in Proceedings of the ACM International Conference on Bioinformatics and Computational Biology (ACM-BCB '10), pp. 160–169, August 2010.
[8]  C. M. Fiduccia and R. M. Mattheyses, “A linear time heuristics for improving network partitions,” in Proceedings of the 19th Design Automation Conference, pp. 175–181, 1982.
[9]  L. M. Genovese, F. Geraci, and M. Pellegrini, “SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate, and low coverage,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 4, pp. 492–502, 2008.
[10]  A. Panconesi and M. Sozio, “Fast hare: a fast heuristic for single individual SNP haplotype reconstruction,” Workshop on Algorithms in Bioinformatics, vol. 3240, pp. 266–277, 2004.
[11]  Y. Y. Zhao, L. Y. Wu, J. H. Zhang, R. S. Wang, and X. S. Zhang, “Haplotype assembly from aligned weighted SNP fragments,” Computational Biology and Chemistry, vol. 29, pp. 281–287, 2005.
[12]  Y. Wang, E. Feng, and R. Wang, “A clustering algorithm based on two distance functions for MEC model,” Computational Biology and Chemistry, vol. 31, no. 2, pp. 148–150, 2007.
[13]  Z. Chen, B. Fu, R. Schweller, B. Yang, Z. Zhao, and B. Zhu, “Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments,” Journal of Computational Biology, vol. 15, no. 5, pp. 535–546, 2008.
[14]  F. Geraci, “A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem,” Bioinformatics, vol. 26, no. 18, Article ID btq411, pp. 2217–2225, 2010.
[15]  ReHap, http://bioalgo.iit.cnr.it/rehap/.
[16]  A. Al Mueen, M. S. Bayzid, M. M. Alam, and M. S. Rahman, “A heuristic algorithm for individual haplotyping with minimum error correction,” in Proceedings of the 1st International Conference on BioMedical Engineering and Informatics (BMEI '08), pp. 792–796, IEEE Computer Society Press, May 2008.
[17]  M. J. Rieder, S. L. Taylor, A. G. Clark, and D. A. Nickrson, “Sequence variation in the human angiotensin converting enzyme,” Nature Genetics, vol. 22, pp. 59–62, 1999.
[18]  M. J. Daly, J. D. Rioux, S. F. Schaffner, T. J. Hudson, and E. S. Lander, “High-resolution haplotype structure in the human genome,” Nature Genetics, vol. 29, no. 2, pp. 229–232, 2001.
[19]  S. Levy, G. Sutton, P. C. Ng, L Feuk, and A. L. Halpern, “The deploid genome sequence of an individual human,” PLOS Biology, vol. 5, no. 10, article e254, 2007.
[20]  X. S. Zhang, R. S. Wang, L. Y. Wu, and W. Zhang, “Minimum conflict individual haplotyping from SNP fragments and related genotype,” Evolutionary Bioinformatics, vol. 2, pp. 261–270, 2006.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133