Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Page 1 /100
Probabilistic Phylogenetic Inference with Insertions and Deletions  [PDF]
Elena Rivas, Sean R. Eddy
PLOS Computational Biology , 2008, DOI: 10.1371/journal.pcbi.1000172
Abstract: A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
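To make the peeling step concrete, here is a minimal Python sketch of Felsenstein's pruning recursion for one alignment column. It assumes the plain Jukes-Cantor (JC69) substitution model with uniform base frequencies and no gap state, so it is not the authors' dnamlε model; the tree encoding (a leaf is an observed base, an internal node is a list of (child, branch_length) pairs) and the function names are illustrative choices.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of state j at the end of a branch of length t,
    given state i at its start (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional_likelihoods(node):
    """Return L[x] = P(observed leaves below node | node has state x).

    A leaf is a one-character string (its observed base); an internal node is
    a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = conditional_likelihoods(child)
        for x in range(4):
            # Sum over the child's unobserved state y: this is the "peeling" step.
            result[x] *= sum(jc69_prob(x, y, t) * child_l[y] for y in range(4))
    return result

def site_likelihood(root):
    """Likelihood of one alignment column, averaging the root state over the
    uniform JC69 stationary distribution."""
    return sum(0.25 * lx for lx in conditional_likelihoods(root))
```

For example, site_likelihood([([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]) gives the likelihood of the column A, A, C on a three-leaf tree. The abstract indicates that the gap-augmented model keeps this peeling structure; how dnamlε handles the gap state in detail is not shown here.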
Systematic analysis of insertions and deletions specific to nematode proteins and their proposed functional and evolutionary relevance
Zhengyuan Wang, John Martin, Sahar Abubucker, Yong Yin, Robin B Gasser, Makedonka Mitreva
BMC Evolutionary Biology , 2009, DOI: 10.1186/1471-2148-9-23
Abstract: Amino acid alterations in sequences of nematodes were identified by comparison with homologous sequences from a wide range of eukaryotic (metazoan) organisms. This comparison revealed that the proteins inferred from transcriptomic datasets for nematodes contained more deletions than insertions, and that the deletions tended to be larger in length than insertions, indicating a decreased size of the transcriptome of nematodes compared with other organisms. The present findings showed that this reduction is more pronounced in parasitic nematodes than in the free-living nematodes of the genus Caenorhabditis. Consistent with a requirement for conservation in proteins involved in the processing of genetic information, fewer insertions and deletions were detected in such proteins. On the other hand, more insertions and deletions were recorded for proteins inferred to be involved in the endocrine and immune systems, suggesting a link with adaptation. Similarly, proteins involved in multiple cellular pathways tended to display more deletions and insertions than those involved in a single pathway. The number of insertions and deletions shared by a range of plant-parasitic nematodes was higher for proteins involved in lipid metabolism and electron transport compared with other nematodes, suggesting an association between metabolic adaptation and parasitism in plant hosts. We also identified three sizable deletions from proteins found to be specific to and shared by parasitic nematodes, which, given their uniqueness, might serve as target candidates for drug design. This study illustrates the significance of using comparative genomics approaches to identify molecular elements unique to parasitic nematodes, which have adapted to a particular host organism and mode of existence during evolution. 
While the focus of this study was on nematodes, the approach has applicability to a wide range of other groups of organisms. Novel molecular signatures specific to particular taxonomi
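The insertion/deletion counting described above can be approximated for a single pairwise alignment with the Python sketch below. It assumes a gapped alignment of a nematode protein (query) against one homolog (reference), treats a gap run in the query as a deletion and a gap run in the reference as an insertion, and ignores which lineage actually changed; the names are hypothetical, and the study itself compared against homologs from a wide range of eukaryotes.

```python
from itertools import groupby

def gap_runs(aligned):
    """Lengths of maximal runs of '-' in one row of an alignment."""
    return [len(list(run)) for is_gap, run in groupby(aligned, key=lambda c: c == "-") if is_gap]

def count_indels(query, reference):
    """Classify gap runs in a pairwise alignment of query against reference.

    Simplification (assumption): gaps in the query count as deletions and
    gaps in the reference count as insertions, with no outgroup polarization.
    """
    if len(query) != len(reference):
        raise ValueError("aligned rows must have the same length")
    # Drop columns gapped in both rows (left over from projecting a multiple alignment).
    kept = [(q, r) for q, r in zip(query, reference) if not (q == "-" and r == "-")]
    q_row = "".join(q for q, _ in kept)
    r_row = "".join(r for _, r in kept)
    return {"deletions": gap_runs(q_row), "insertions": gap_runs(r_row)}
```

For instance, count_indels("MK---LLAQQ", "MKVTALLA--") reports one deletion of length 3 and one insertion of length 2; tallying these lists over many proteins gives the kind of deletion-versus-insertion comparison the abstract describes.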
A Macaque's-Eye View of Human Insertions and Deletions: Differences in Mechanisms  [PDF]
Erika M Kvikstad, Svitlana Tyekucheva, Francesca Chiaromonte, Kateryna D Makova
PLOS Computational Biology , 2007, DOI: 10.1371/journal.pcbi.0030176
Abstract: Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions; thus, the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.
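The outgroup logic in this abstract can be illustrated with a small parsimony rule: for a gap run that separates human from chimpanzee, the macaque row is taken as the ancestral state. The Python sketch below assumes a three-row, equal-length alignment using '-' for gaps; the function name and labels are hypothetical, and genome-wide analyses need additional handling of alignment errors and mixed outgroup states, which this sketch only flags as ambiguous.

```python
def polarize_indels(human, chimp, macaque):
    """Label each human-chimp gap run using macaque as the outgroup.

    Simplified parsimony rule (assumption): the outgroup state is ancestral.
    Returns a list of (start_column, length, label).
    """
    if not (len(human) == len(chimp) == len(macaque)):
        raise ValueError("alignment rows must have the same length")
    events = []
    n = len(human)
    i = 0
    while i < n:
        h_gap, c_gap = human[i] == "-", chimp[i] == "-"
        if h_gap == c_gap:  # both residues or both gaps: no human-chimp indel here
            i += 1
            continue
        # Extend the run while the same species stays gapped.
        j = i
        while j < n and (human[j] == "-") == h_gap and (chimp[j] == "-") == c_gap:
            j += 1
        outgroup = macaque[i:j]
        if all(c != "-" for c in outgroup):
            ancestral_present = True   # sequence existed before the split
        elif all(c == "-" for c in outgroup):
            ancestral_present = False  # sequence absent before the split
        else:
            ancestral_present = None   # mixed outgroup state
        if ancestral_present is None:
            label = "ambiguous"
        elif h_gap:
            label = "human deletion" if ancestral_present else "chimp insertion"
        else:
            label = "chimp deletion" if ancestral_present else "human insertion"
        events.append((i, j - i, label))
        i = j
    return events
```

For example, polarize_indels("ACGT--AC", "ACGTTTAC", "ACGTTTAC") returns [(4, 2, "human deletion")], whereas the same gap with "ACGT--AC" as the macaque row is labeled a chimpanzee insertion. Without the outgroup both cases would look like the same "indel".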
Achievable Rates for Channels with Deletions and Insertions  [PDF]
Ramji Venkataramanan, Sekhar Tatikonda, Kannan Ramchandran
Mathematics , 2011, DOI: 10.1109/TIT.2013.2278181
Abstract: This paper considers a binary channel with deletions and insertions, where each input bit is transformed in one of the following ways: it is deleted with probability d, or an extra bit is added after it with probability i, or it is transmitted unmodified with probability 1-d-i. A computable lower bound on the capacity of this channel is derived. The transformation of the input sequence by the channel may be viewed in terms of runs as follows: some runs of the input sequence get shorter/longer, some runs get deleted, and some new runs are added. It is difficult for the decoder to synchronize the channel output sequence to the transmitted codeword mainly due to deleted runs and new inserted runs. The main idea is a mutual information decomposition in terms of the rate achieved by a sub-optimal decoder that determines the positions of the deleted and inserted runs in addition to decoding the transmitted codeword. The mutual information between the channel input and output sequences is expressed as the sum of the rate achieved by this decoder and the rate loss due to its sub-optimality. Obtaining computable lower bounds on each of these quantities yields a lower bound on the capacity. The bounds proposed in this paper provide the first characterization of achievable rates for channels with general insertions, and for channels with both deletions and insertions. For the special case of the deletion channel, the proposed bound improves on the previous best lower bound for deletion probabilities up to 0.3.
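The channel model in the first sentence is easy to simulate. The Python sketch below applies it bit by bit; the abstract does not say here how the value of an inserted bit is chosen, so this sketch assumes it is uniformly random (a hypothetical choice). The function name is illustrative.

```python
import random

def ins_del_channel(bits, d, i, rng=None):
    """Pass a list of bits through a binary insertion/deletion channel.

    Each input bit is deleted with probability d, followed by one extra bit
    with probability i, or transmitted unmodified with probability 1 - d - i.
    Assumption: the value of an inserted bit is uniformly random.
    """
    if d < 0 or i < 0 or d + i > 1:
        raise ValueError("need d >= 0, i >= 0 and d + i <= 1")
    rng = rng or random.Random()
    out = []
    for b in bits:
        u = rng.random()
        if u < d:
            continue                       # bit deleted
        out.append(b)                      # bit transmitted
        if u < d + i:
            out.append(rng.randint(0, 1))  # extra bit inserted after it
    return out
```

Each input bit produces on average 1 - d + i output bits, so for d = 0.1 and i = 0.05 a long input shrinks to about 95% of its length. The decoder, unlike this simulator, does not know where bits were dropped or added, which is the synchronization difficulty the abstract describes.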
Evolution models with base substitutions, insertions, deletions and selection  [PDF]
D. B. Saakian
Quantitative Biology , 2009, DOI: 10.1103/PhysRevE.78.061920
Abstract: An evolution model with a parallel mutation-selection scheme is solved for the case when selection is accompanied by base substitutions, insertions, and deletions. The fitness is assumed to be either a single-peak function (i.e., having one finite discontinuity) or a smooth function of the Hamming distance from the reference sequence. The mean fitness is calculated exactly in the large-genome limit. In the case of insertions and deletions, the evolution characteristics depend on the choice of reference sequence.
Sequence context affects the rate of short insertions and deletions in flies and primates
Amos Tanay, Eric D Siggia
Genome Biology , 2008, DOI: 10.1186/gb-2008-9-2-r37
Abstract: Here we analyze a large collection of high-confidence short insertions and deletions in primates and flies, revealing extensive correlations between sequence context and indel rates and building principled models for predicting these rates from sequence. According to our results, the rate of insertion or deletion of specific lengths can vary by more than 100-fold, depending on the surrounding sequence. These mutational biases can strongly influence the composition of the genome and the rate at which particular sequences appear. We exemplify this by showing how degenerate loci in human exons are selected to reduce their frameshifting indel propensity. Insertions and deletions are strongly affected by sequence context. Consequently, genomes must adapt to significant variation in the mutational input at indel-prone and indel-immune loci. The evolution of genomes is driven by an influx of mutations that are subject to a stochastic process of neutral fixation and to multiple selective pressures that can change the neutral fixation dynamics. A good understanding of the evolutionary process requires characterization of both the mutational and fixation processes. This is particularly important in applications that try to reveal genomic loci that are evolving under selection by looking for slowly or rapidly evolving sequences. In such studies one has to make sure the mutational input at the genomic regions under study is not abnormally high or low [1-4], or else the inferred selection may be an artifact of the mutational dynamics and not a true indication of a functional constraint on the sequence. Changes are introduced into genomes through point mutations, insertions and deletions. 
The dynamics of each of these mechanisms may vary according to genomic context and the presence of various factors acting in trans. Before the availability of numerous fully sequenced genomes, evolutionary studies focused on two extremes: replacements of entire genes and chromosome domains or po
List decoding subspace codes from insertions and deletions  [PDF]
Venkatesan Guruswami, Srivatsan Narayanan, Carol Wang
Mathematics , 2012,
Abstract: We present a construction of subspace codes along with an efficient algorithm for list decoding from both insertions and deletions, handling an information-theoretically maximum fraction of these with polynomially small rate. Our construction is based on a variant of the folded Reed-Solomon codes in the world of linearized polynomials, and the algorithm is inspired by the recent linear-algebraic approach to list decoding. Ours is the first list decoding algorithm for subspace codes that can handle deletions; even one deletion can totally distort the structure of the basis of a subspace and is thus challenging to handle. When there are only insertions, we also present results for list decoding subspace codes that are the linearized analog of Reed-Solomon codes (proposed previously, and closely related to the Gabidulin codes for rank-metric), obtaining some improvements over similar results in previous work.
The rates and patterns of insertions, deletions and substitutions in mouse and rat inferred from introns
YanHui Fan, Qi Shi, JinFeng Chen, WenJuan Wang, HongXia Pang, JiaoWei Tang, ShiHeng Tao
Chinese Science Bulletin , 2008, DOI: 10.1007/s11434-008-0352-z
Abstract: The rates and patterns of InDels (insertions and deletions) and substitutions in rodents (mouse and rat) have been studied. The results reveal that deletions occur more frequently than insertions, and that single-nucleotide insertions and deletions are the most frequent InDels in both mouse and rat. The frequencies of both deletions and insertions decrease rapidly with increasing InDel length, and the size distributions of both insertions and deletions are well described by a power law. There are more AT→GC than GC→AT substitutions in the introns of rat. However, there are more GC→AT than AT→GC substitutions in the introns of mouse. The deletion bias found in introns in mouse and rat supports the prediction that intron insertions are more deleterious than deletions because of reduced transcription and splicing efficiency. The patterns of substitution suggest that neither base composition nor GC content is at equilibrium in rodent introns.
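A quick way to check a power-law claim on one's own InDel length counts is a least-squares line on a log-log plot, sketched below in Python. This is only a rough diagnostic and not necessarily the method the authors used; maximum-likelihood estimators are generally preferred for estimating power-law exponents. The input format (a dict from InDel length to count) and the function name are assumptions.

```python
import math

def fit_power_law_loglog(counts):
    """Fit count(L) ~ C * L**(-alpha) by least squares on log(count) vs log(L).

    counts: dict mapping InDel length (>= 1) to the number of observed events.
    Returns (alpha, C).
    """
    pts = [(math.log(length), math.log(c)) for length, c in counts.items() if c > 0]
    if len({x for x, _ in pts}) < 2:
        raise ValueError("need at least two distinct lengths with nonzero counts")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    alpha = -slope                          # power-law exponent
    C = math.exp(mean_y - slope * mean_x)   # fitted count at length 1
    return alpha, C
```

A frequency that "decreases rapidly with increasing InDel length" shows up here as a large positive alpha; a straight trend on the log-log scale is what a power-law description implies.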
Characterisation and Validation of Insertions and Deletions in 173 Patient Exomes  [PDF]
Francesco Lescai, Silvia Bonfiglio, Chiara Bacchelli, Estelle Chanudet, Aoife Waters, Sanjay M. Sisodiya, Dalia Kasperavičiūtė, Julie Williams, Denise Harold, John Hardy, Robert Kleta, Sebahattin Cirak, Richard Williams, John C. Achermann, John Anderson, David Kelsell, Tom Vulliamy, Henry Houlden, Nicholas Wood, Una Sheerin, Gian Paolo Tonini, Donna Mackay, Khalid Hussain, Jane Sowden, Veronica Kinsler, Justyna Osinska, Tony Brooks, Mike Hubank, Philip Beales, Elia Stupka
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0051292
Abstract: Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this has allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains elusive. Despite rapid advances in next-generation sequencing, both the technology and the analysis of the data it produces are still in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated. We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low-coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next-generation sequencing.
File Updates Under Random/Arbitrary Insertions And Deletions  [PDF]
Qiwen Wang, Viveck Cadambe, Sidharth Jaggi, Moshe Schwartz, Muriel Médard
Mathematics , 2015,
Abstract: A client/encoder edits a file, as modeled by an insertion-deletion (InDel) process. An old copy of the file is stored remotely at a data-centre/decoder, and is also available to the client. We consider the problem of throughput- and computationally-efficient communication from the client to the data-centre, to enable the server to update its copy to the newly edited file. We study two models for the source files/edit patterns: the random pre-edit sequence left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit sequence arbitrary InDel (APES-AID) process. In both models, we consider the regime in which the number of insertions/deletions is a small (but constant) fraction of the original file. For both models we prove information-theoretic lower bounds on the best possible compression rates that enable file updates. Conversely, our compression algorithms use dynamic programming (DP) and entropy coding, and achieve rates that are approximately optimal.
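The dynamic-programming ingredient can be illustrated with the classic longest-common-subsequence (LCS) table, which yields a shortest edit script using only insertions and deletions. The Python sketch below is a generic textbook technique, not the paper's compression scheme, and its function names and edit-tuple format are hypothetical.

```python
def indel_edit_script(old, new):
    """Shortest list of insert/delete edits turning old into new, via LCS DP.

    Edits are ("delete", pos, char) or ("insert", pos, char), where pos always
    refers to a position in the ORIGINAL old string (inserts go before old[pos]).
    """
    n, m = len(old), len(new)
    # lcs[a][b] = length of the LCS of old[a:] and new[b:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for a in range(n - 1, -1, -1):
        for b in range(m - 1, -1, -1):
            if old[a] == new[b]:
                lcs[a][b] = lcs[a + 1][b + 1] + 1
            else:
                lcs[a][b] = max(lcs[a + 1][b], lcs[a][b + 1])
    edits, a, b = [], 0, 0
    while a < n and b < m:
        if old[a] == new[b]:                 # keep a matching character
            a, b = a + 1, b + 1
        elif lcs[a + 1][b] >= lcs[a][b + 1]:
            edits.append(("delete", a, old[a]))
            a += 1
        else:
            edits.append(("insert", a, new[b]))
            b += 1
    edits += [("delete", k, old[k]) for k in range(a, n)]
    edits += [("insert", n, ch) for ch in new[b:]]
    return edits

def apply_edits(old, edits):
    """Rebuild the new string from old and an edit script (the server side)."""
    deleted = {pos for op, pos, _ in edits if op == "delete"}
    inserts = {}
    for op, pos, ch in edits:
        if op == "insert":
            inserts.setdefault(pos, []).append(ch)
    out = []
    for pos in range(len(old) + 1):
        out.extend(inserts.get(pos, []))
        if pos < len(old) and pos not in deleted:
            out.append(old[pos])
    return "".join(out)
```

The number of edits equals len(old) + len(new) - 2 * LCS; for "kitten" to "sitting" the LCS is "ittn", giving 5 edits. Quadratic time and memory are acceptable for a sketch; the paper's contribution concerns near-optimal communication rates, which this code does not address.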

Copyright © 2008-2017 Open Access Library. All rights reserved.