%0 Journal Article
%T HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction
%A Md. Shamsuzzoha Bayzid
%A Md. Maksudul Alam
%A Abdullah Mueen
%A Md. Saidur Rahman
%J ISRN Bioinformatics
%D 2013
%R 10.1155/2013/291741
%X Haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. Minimum error correction approach aims to minimize the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm (HMEC) that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. Time complexity of each iteration is for an SNP matrix where and are the number of fragments (number of rows) and number of SNP sites (number of columns), respectively, in an SNP matrix. Alternative gain measure is also given to reduce running time. We have compared our algorithm with other methods in terms of accuracy and running time on both simulated and real data, and our extensive experimental results indicate the superiority of our algorithm over others. 1. Introduction A single DNA molecule is a long chain of nucleotides (base pairs). There are four such nucleotides which are represented by the set of symbols . It is generally accepted that genomes of two humans are almost 99% identical at DNA level. However, at certain specific sites, variation is observed across the human population which is commonly known as “single nucleotide polymorphism” and abbreviated as “SNP” [1]. The nucleotide involved in a SNP site is called allele. If a SNP site can have only two nucleotides, it is called biallelic. If it can have more than two alleles it is called a multiallelic SNP [2]. From now on, we will consider the simplest case where only bi-allelic SNPs occur in a specific pair of DNA. The single nucleotide polymorphism (SNP) is believed to be the most widespread form of genetic variation [3]. The sequence of all SNPs in a given chromosome is called haplotype. Haplotyping an individual deals with determining a pair of haplotypes, one for each copy of a given chromosome. A chromosome is a complicated structure of a DNA molecule bound by proteins. This pair of haplotypes completely define the SNP fingerprints of an individual for a specific pair of chromosomes. Given the two sequences of bases, haplotyping is straight forward and just needs to iterate through both the sequences and remove all the common alleles from them. But haplotyping becomes difficult when we want to construct haplotypes from sequencing data for higher reliability. Sequencing data for a genome does not contain the complete
%U http://www.hindawi.com/journals/isrn.bioinformatics/2013/291741/