Phylogenetic comparative assembly

DOI: 10.1186/1748-7188-5-3


Abstract:

Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent.

Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible, and the best alternatives where necessary.

Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while being much faster at the same time.

Today the nucleotide sequences of many genomes are known. In the first genome projects, the process of obtaining the DNA sequence by multi-step clone-by-clone sequencing approaches was costly and tedious. Nowadays, the most common approach for de-novo genome sequencing is whole genome shotgun sequencing [1,2]. Here, the genome is fragmented randomly into small parts. Each of these fragments is sequenced, for example, with recent high-throughput methods [3,4]. In the next step, overlapping reads are merged by assembler software into contiguous strings (contigs). However, instead of the desired single sequence of the whole genome, often many contigs remain, separated by gaps. The main reasons for these gaps are fragments lost in the fragmentation phase and repetitive sequences in the genome.

In a process called scaffolding, the relative order of the contigs as well as the size of the gaps between them is estimated. In a subsequent finishing phase, the gaps between the contigs are closed with a procedure called primer walking. For the ends of two contigs estimated to be adjacent, specific primer sequences have to be designed that function as start points for two polymerase chain reactions (PCRs) for Sanger sequencing [5]. These PCRs ideally run
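The idea in the abstract, a weighted contig adjacency graph from which a layout of promising adjacencies is derived, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `combine_support` and `greedy_layout`, the `1 / (1 + d)` down-weighting of distant reference genomes, and the greedy edge selection are all assumptions chosen for illustration, and the sketch ignores contig orientation and the "best alternatives" that the layout graph is said to show.

```python
from collections import defaultdict


def combine_support(support_by_reference, tree_distance):
    """Sum per-reference adjacency support, down-weighting distant references.

    support_by_reference: {ref_name: {(contig_a, contig_b): score}}
    tree_distance: {ref_name: phylogenetic distance to the target genome}
    The 1 / (1 + d) factor is an illustrative assumption, not from the paper.
    """
    weights = defaultdict(float)
    for ref, pairs in support_by_reference.items():
        factor = 1.0 / (1.0 + tree_distance[ref])
        for (a, b), score in pairs.items():
            edge = tuple(sorted((a, b)))  # treat adjacencies as undirected
            weights[edge] += factor * score
    return dict(weights)


def greedy_layout(weights):
    """Pick heavy edges so that each contig keeps at most two neighbours
    and no cycle is closed; the result is a set of linear contig chains."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving, used for cycle detection.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = defaultdict(int)
    chosen = []
    # Heaviest edges first; ties broken by contig names for determinism.
    for (a, b), _w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # a contig has only two ends
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # adding this edge would close a cycle
        parent[root_a] = root_b
        degree[a] += 1
        degree[b] += 1
        chosen.append((a, b))
    return chosen
```

A caller would build one score table per reference genome (for example from sequence matches of contig ends), combine them with `combine_support`, and pass the result to `greedy_layout` to obtain candidate adjacencies worth testing in the finishing phase.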
