oalib

Search Results: 1 - 10 of 220762 matches for "Peter D. Karp"
All listed articles are free for downloading (OA Articles)
Many Genbank Entries for Complete Microbial Genomes Violate the Genbank Standard
Peter D. Karp
Comparative and Functional Genomics, 2001, DOI: 10.1002/cfg.63
Abstract: A survey of Genbank entries for complete microbial genomes reveals that the majority do not conform to the Genbank standard. Typical deviations from the Genbank standard include records with information in incorrect fields, addition of extraneous and confusing information within a field, and omission of useful fields. This situation results from two principal causes: genome centres do not submit Genbank records in the proper form and the Genbank, EMBL and DDBJ staffs do not enforce the database standards that they have defined.
Call for an enzyme genomics initiative
Peter D Karp
Genome Biology, 2004, DOI: 10.1186/gb-2004-5-8-401
Abstract: A recent essay by Roberts [1] called for an effort by the scientific community to experimentally determine functions for unidentified genes in microbial genomes. Put another way, the essay focused on sequences with no associated function. Here, I explore the inverse problem: functions with no associated sequence. I propose an Enzyme Genomics project whose goal is to find at least one amino-acid sequence for every biochemically characterized enzyme activity for which there is currently no known sequence.
Roberts identifies three classes of genes whose functions would be most valuable to obtain: hypothetical genes with homologs in multiple organisms (conserved hypothetical), non-conserved hypothetical genes, and misannotated genes. Roberts proposes that a consortium of bioinformaticians post functional predictions for these genes to a central website. Biologists would then choose candidates and test the predicted functions in the lab, with results (both positive and negative) added to the same website. Roberts also proposes that the initial list of target genes be chosen from an experimentally tractable organism such as Escherichia coli, with the recognition that some experiments might be performed on homologs from other organisms.
My proposal for an Enzyme Genomics Initiative is based on a different part of the gap between genomics and biochemical function, and I suggest it as a fourth priority area in addition to the three suggested by Roberts. Elucidation of protein sequences corresponding to enzyme activities is important because of the many applications of metabolic enzymes in areas ranging from metabolic engineering to antimicrobial drug discovery to metabolic diseases. Finding enzyme sequences may also be easier than the projects listed by Roberts, because in many cases significant biochemical knowledge about these enzymes (such as purification procedures and assays) is already in hand.
Consider two implications of the many characterized enzymes for which no seq
Web-based metabolic network visualization with a zooming user interface
Mario Latendresse, Peter D Karp
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-176
Abstract: We present a Web-based metabolic-map diagram, called the Cellular Overview, that the user can explore interactively. The main characteristic of this application is its zooming user interface, which lets the user focus on the network at whatever granularity is appropriate. Various search commands are available to visually highlight sets of reactions, pathways, enzymes, metabolites, and so on. Expression data from single or multiple experiments can be overlaid on the diagram, which we call the Omics Viewer capability. The application provides Web services to highlight the diagram and to invoke the Omics Viewer. The client side is written entirely in JavaScript and connects to a Pathway Tools Web server to retrieve data and diagrams. It uses the OpenLayers library to display tiled diagrams.
This new online tool can display large and complex metabolic-map diagrams in a highly interactive manner. It is available as part of the Pathway Tools software that powers multiple metabolic databases, including BioCyc.org, where the Cellular Overview is accessible under the Tools menu.
Web-based applications using a zoomable user interface (ZUI) are becoming a familiar way to help people comprehend complex information spaces. Examples include zoomable genome browsers [1,2] and Google Maps. The basic interactions these tools provide are changing the magnification of a large two-dimensional diagram (often with semantic zooming, meaning new details are displayed at higher magnifications), panning, and searching for points of interest.
Metabolic network models are now available for hundreds of organisms due to the high rate of genome sequencing and the availability of software tools for reconstructing metabolic network models from genome sequence information [2-5]. We aim to provide scientists with tools for understanding, exploring, and exploiting metabolic reconstructions.
The Pathway Tools software has had a metabolic ne
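The Cellular Overview entry mentions that the diagram is served as tiles and explored by zooming. As a rough illustration of how a tiled zooming viewer decides which tiles to fetch, here is a minimal sketch. It assumes a conventional power-of-two tile pyramid with 256-pixel square tiles, where zoom level z magnifies the level-0 diagram by 2**z; the tile size, the pyramid layout and the function name `tiles_for_viewport` are hypothetical and are not taken from the paper or from the OpenLayers API.

```python
import math

TILE_SIZE = 256  # assumed tile edge length in pixels (hypothetical)


def tiles_for_viewport(x, y, width, height, zoom, base_width, base_height):
    """Return (col, row) indices of the tiles that cover a viewport.

    x, y, width, height: viewport rectangle in level-0 diagram pixels.
    zoom: integer zoom level; level z magnifies the diagram by 2**z.
    base_width, base_height: size of the full diagram at level 0.
    """
    scale = 2 ** zoom
    # Highest valid tile index along each axis at this zoom level.
    max_col = math.ceil(base_width * scale / TILE_SIZE) - 1
    max_row = math.ceil(base_height * scale / TILE_SIZE) - 1
    # Tile containing the top-left corner of the viewport.
    first_col = max(0, int(x * scale // TILE_SIZE))
    first_row = max(0, int(y * scale // TILE_SIZE))
    # Tile containing the bottom-right corner, clamped to the diagram edge.
    last_col = min(max_col, math.ceil((x + width) * scale / TILE_SIZE) - 1)
    last_row = min(max_row, math.ceil((y + height) * scale / TILE_SIZE) - 1)
    # Row-major list of every tile in the covered rectangle.
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For a hypothetical 1024 x 512 diagram, the same 256 x 256 region needs one tile at zoom 0 (`[(0, 0)]`) but four tiles at zoom 1, which is why a ZUI fetches tiles lazily for the visible region rather than loading the whole magnified diagram.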
A survey of orphan enzyme activities
Yannick Pouliot, Peter D Karp
BMC Bioinformatics, 2007, DOI: 10.1186/1471-2105-8-244
Abstract: We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles.
This survey points toward a significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.
After a decade of comprehensive genomic sequencing, more than 500 genomes have been sequenced to completion, mostly prokaryotes. The prodigious rate of new sequence annotation is highlighted by the fact that there were just over 300 genomes available when this study was carried out in late 2004. However, the fraction of genes for which no function can be predicted remains high (30%–50%). In response, proposals have been put forth for the bioinformatics analysis of bacterial genomes to identify genes with a high likelihood of scoring true in confirmatory laboratory assays of their respective functions [1,2]. This would increase the field's pool of experimentally characterized proteins, with concomitant increases in the accuracy and coverage of genome annotation. We believe the return on investment of this approach would be particularly high when addressing the problem of orphan activities, that is, enzymatic activities for which no sequence information is available [3,4].
Decades of detailed enzymology have created a wealth of knowledge about enzymes and their activities. However, crucial aspects of these enzymes are absent from bioinformatics databases with surprising frequency. For example, recent computational analyses of sequence databases demonstrate that at least 36% of enzyme a
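The orphan-enzyme entry defines an orphan activity as an enzyme activity for which no sequence information is available. At its core, flagging candidate orphans is a set difference. The sketch below assumes two hypothetical inputs: a list of characterized activities keyed by EC number, and a mapping from sequence IDs to the EC numbers annotated on them. The EC numbers in the example use the nonexistent class 9 so they are clearly placeholders. The survey described above also checked sampled orphans to separate bona fide orphans from artifactual ones, and this sketch does not attempt that step.

```python
def find_orphan_activities(characterized_ecs, sequence_annotations):
    """Return EC numbers that have no sequence annotated to them.

    characterized_ecs: iterable of EC numbers for characterized activities.
    sequence_annotations: dict mapping sequence ID -> set of EC numbers.
    """
    # Collect every EC number that at least one sequence is annotated with.
    ecs_with_sequence = set()
    for ecs in sequence_annotations.values():
        ecs_with_sequence.update(ecs)
    # Putative orphans: characterized activities with no linked sequence.
    return sorted(set(characterized_ecs) - ecs_with_sequence)


# Placeholder data only (EC class 9 does not exist).
orphans = find_orphan_activities(
    ["9.9.9.1", "9.9.9.2", "9.9.9.3"],
    {"seqA": {"9.9.9.1"}, "seqB": {"9.9.9.2", "9.9.9.4"}},
)
```

Here `orphans` is `["9.9.9.3"]`. An annotation that names an EC number not in the characterized list (`9.9.9.4`) is ignored, because the question asked is only which characterized activities lack a sequence.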
The past, present and future of genome-wide re-annotation
Christos A Ouzounis, Peter D Karp
Genome Biology, 2002, DOI: 10.1186/gb-2002-3-2-comment2001
Abstract: "It is perhaps hard to make firm statements on such questions without having examined them many times." (Aristotle, Categories, 8b21; translated by J. L. Ackrill, Clarendon Press, Oxford, 1963)
Over the past ten years, we have witnessed the publication of several chromosomes or complete genome sequences from a variety of bacterial, archaeal and eukaryotic species. The trend towards genome sequencing is expected to continue or even accelerate in the foreseeable future. The wealth of sequence information being produced has generated the need for rapid annotation and subsequent biological interpretation of genome sequences. Annotation can be defined as a process by which structural or functional information is inferred for genes or proteins, usually on the basis of similarity to previously characterized sequences in public databases. The annotation process associates genome sequences with functional information and guides experimentation by relating genotypes to phenotypic properties.
Once a genome-sequencing project is completed and the information is released into the public domain, it is common practice for certain groups of researchers to take a 'second look' at the original annotation, for various reasons. We define the process of annotating a previously annotated genome sequence as 're-annotation'. Motivations for re-annotation include discovery of more genes and protein functions, testing and performance comparison of existing or newly developed annotation methods, and assessment of annotation reproducibility. Re-annotation also provides up-to-date information for end-users, using the latest resources, such as new or improved algorithms and richer databases.
Clearly, the drive for re-annotation goes back in time, arising even before the availability of entire genome sequences. For example, in an attempt to assign function to a number of uncharacterized, hypothetical genes from archaeal species, one of the earliest large-scale re-annotation studies produced a number o
A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases
Michelle L Green, Peter D Karp
BMC Bioinformatics, 2004, DOI: 10.1186/1471-2105-5-76
Abstract: We have developed a method that efficiently combines homology and pathway-based evidence to identify candidates for filling pathway holes in Pathway/Genome databases. Our program not only identifies potential candidate sequences for pathway holes, but also combines data from multiple, heterogeneous sources to assess the likelihood that a candidate has the required function. Our algorithm emulates the manual sequence annotation process, considering not only evidence from homology searches but also evidence from genomic context (i.e., is the gene part of an operon?) and functional context (e.g., are there functionally related genes nearby in the genome?) to determine the posterior belief that a candidate has the required function. The method can be applied across an entire metabolic pathway network and is generally applicable to any pathway database. The program uses a set of sequences encoding the required activity in other genomes to identify candidate proteins in the genome of interest, and then evaluates each candidate by using a simple Bayes classifier to determine the probability that the candidate has the desired function. We achieved 71% precision at a probability threshold of 0.9 during cross-validation using known reactions in computationally predicted pathway databases. After applying our method to 513 pathway holes in 333 pathways from three Pathway/Genome databases, we increased the number of complete pathways by 42%. We made putative assignments to 46% of the holes, including annotation of 17 sequences of previously unknown function.
Our pathway hole filler can be used not only to increase the utility of Pathway/Genome databases to both experimental and computational researchers, but also to improve predictions of protein function.
Genome sequencing projects generate large numbers of nucleotide sequences each year [1]. Once the sequences are obtained, functions must be assigned to these new sequences. This is typically accomplished by searching lar
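The pathway hole filler entry describes scoring each candidate with a simple Bayes classifier that combines homology, genomic-context and functional-context evidence into a posterior probability, then applying a probability threshold of 0.9. The sketch below shows how such a naive Bayes combination works in log-odds form. The feature names, their discrete values, the likelihood table and the prior are all hypothetical placeholders, not the values or features used in the paper.

```python
import math

# Hypothetical likelihood table: for each feature value,
# (P(value | candidate has the function), P(value | candidate lacks it)).
# Each column sums to 1 within a feature.
LIKELIHOODS = {
    "homology": {"strong": (0.7, 0.05), "weak": (0.2, 0.25), "none": (0.1, 0.70)},
    "same_operon": {True: (0.3, 0.05), False: (0.7, 0.95)},
    "neighbor_in_pathway": {True: (0.4, 0.1), False: (0.6, 0.9)},
}


def naive_bayes_posterior(prior, evidence, likelihoods):
    """Posterior probability that a candidate has the required function.

    Features are assumed conditionally independent given the class
    (the naive Bayes assumption), so their likelihood ratios multiply.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Start from the prior odds, in log space for numerical stability.
    log_odds = math.log(prior / (1.0 - prior))
    for feature, value in evidence.items():
        p_pos, p_neg = likelihoods[feature][value]
        log_odds += math.log(p_pos / p_neg)  # add each log-likelihood ratio
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


THRESHOLD = 0.9  # probability threshold quoted in the abstract

candidate = {"homology": "strong", "same_operon": True, "neighbor_in_pathway": True}
posterior = naive_bayes_posterior(0.1, candidate, LIKELIHOODS)
accept = posterior >= THRESHOLD
```

With these made-up numbers, the prior odds of 1/9 are multiplied by likelihood ratios of 14, 6 and 4, giving a posterior of 336/345 (about 0.974), which clears the 0.9 threshold. A candidate with no homology and no context support falls well below it.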
A systematic study of genome context methods: calibration, normalization and combination
Luciana Ferrer, Joseph M Dale, Peter D Karp
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-493
Abstract: We present a thorough study of the four main families of genome context methods found in the literature: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms the gene neighbor method outperforms the phylogenetic profile method by as much as 40% in sensitivity, and is competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. We perform a thorough exploration of the parameter space for each method and present results across different target organisms.
We propose applying normalization procedures such as those used on microarray data to the genome context scores. We show that substantial gains can be achieved from the use of a simple normalization technique. In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, in the best-performing phylogenetic profile system in the literature.
Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of up to 20% with respect to the gene neighbor method. Overall, this represents a gain of around 15% over what can be considered the state of the art in this area: the four original genome context methods combined using a procedure like that used in the STRING database. Unfortunately, we find that these gains disappear when the combiner is trained only with organisms that are phylogenetically distant from the target organism.
Our experiments indicate that gene neighbor is the best individual genome context method and that gains from the combination of individual methods are very sensitive to the training data used to obtain the combiner's parameters. If adequate training data is not available, using the gene neighbor score by itself instead
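The genome-context entry discusses phylogenetic profile scores and the gain from normalizing them the way microarray data are normalized. The abstract does not say which profile score or which normalization was used, so the sketch below makes two explicit assumptions. First, it scores a gene pair by the mutual information between their binary presence/absence profiles across genomes, a common phylogenetic-profile measure. Second, it normalizes each gene's row of scores to z-scores against that gene's own score distribution. The function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def profile_mutual_information(a, b):
    """Mutual information (bits) between two binary presence/absence profiles.

    a, b: equal-length sequences of 0/1, one entry per reference genome.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))         # counts of (present/absent, present/absent)
    marg_a, marg_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((marg_a[x] / n) * (marg_b[y] / n)))
    return mi


def zscore_normalize_rows(scores):
    """Z-score each gene's scores against that gene's own score distribution.

    scores: dict gene -> dict partner gene -> raw context score.
    Rows with zero variance are mapped to 0.0.
    """
    normalized = {}
    for gene, row in scores.items():
        values = list(row.values())
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        normalized[gene] = {
            partner: (v - mean) / std if std > 0 else 0.0
            for partner, v in row.items()
        }
    return normalized
```

Identical profiles such as `[1, 1, 0, 0]` and `[1, 1, 0, 0]` score 1.0 bit, while statistically independent profiles (`[1, 1, 0, 0]` and `[1, 0, 1, 0]`) score 0.0. Per-gene normalization matters because a gene that is present in almost every genome tends to receive uniformly high or low raw scores. Rescaling within each gene's row makes a partner stand out only relative to that gene's typical score.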
Machine learning methods for metabolic pathway prediction
Joseph M Dale, Liviu Popescu, Peter D Karp
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-15
Abstract: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not; these probabilities provide more information to the user and facilitate filtering of predicted pathways.
ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
A key step toward understanding an organism's metabolism is the construction of a comprehensive model of the network of metabolic reactions taking place in the organism. Although a number of such models have been constructed through painstaking literature-based manual curation [1,2], such an approach obviously cannot scale to hundreds of sequenced genomes. Therefore, methods are needed for computational characterization
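The pathway-prediction entry compares methods by accuracy and F-measure against a gold standard of pathways known to be present or absent. It also notes that a per-pathway probability lets users filter predictions. The sketch below computes those standard metrics from binary present/absent calls and applies a probability cutoff. The function names, the example labels and the pathway names are hypothetical. In practice, scikit-learn's `accuracy_score` and `f1_score` compute the same quantities.

```python
def evaluate_predictions(y_true, y_pred):
    """Accuracy, precision, recall and F-measure for present/absent calls.

    y_true, y_pred: equal-length sequences of truthy (present) / falsy (absent).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # present, predicted present
    tn = sum(1 for t, p in pairs if not t and not p)  # absent, predicted absent
    fp = sum(1 for t, p in pairs if not t and p)      # absent, predicted present
    fn = sum(1 for t, p in pairs if t and not p)      # present, predicted absent
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def filter_pathways(probabilities, threshold=0.5):
    """Keep pathways whose predicted probability of presence meets the cutoff."""
    return sorted(name for name, p in probabilities.items() if p >= threshold)


metrics = evaluate_predictions([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
kept = filter_pathways({"pwyA": 0.95, "pwyB": 0.40, "pwyC": 0.70}, threshold=0.6)
```

With these toy labels there are 2 true positives, 1 true negative, 1 false positive and 1 false negative. That gives an accuracy of 0.6 and precision, recall and F-measure of 2/3 each, and `kept` is `["pwyA", "pwyC"]`. Raising the threshold trades recall for precision, which is the kind of filtering a probability output makes possible.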
Regulatory network operations in the Pathway Tools software
Suzanne M Paley, Mario Latendresse, Peter D Karp
BMC Bioinformatics, 2012, DOI: 10.1186/1471-2105-13-243
Abstract:
Background: Biologists are elucidating complex collections of genetic regulatory data for multiple organisms. Software is needed for such regulatory network data.
Results: The Pathway Tools software supports storage and manipulation of regulatory information through a variety of strategies. The Pathway Tools regulation ontology captures transcriptional and translational regulation, substrate-level regulation of enzyme activity, post-translational modifications, and regulatory pathways. Regulatory visualizations include a novel diagram that summarizes all regulatory influences on a gene, a transcription-unit diagram, and an interactive visualization of a full transcriptional regulatory network that can be painted with gene expression data to probe correlations between gene expression and regulatory mechanisms. We introduce a novel type of enrichment analysis that asks whether a gene-expression dataset is over-represented for known regulators. We present algorithms for ranking the degree of regulatory influence of genes, and for computing the net positive and negative regulatory influences on a gene.
Conclusions: Pathway Tools provides a comprehensive environment for manipulating molecular regulatory interactions that integrates regulatory data with an organism’s genome and metabolic network. Curated collections of regulatory data authored using Pathway Tools are available for Escherichia coli, Bacillus subtilis, and Shewanella oneidensis.
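The regulatory-network entry introduces an enrichment analysis that asks whether a gene-expression dataset is over-represented for known regulators. The abstract does not state which statistic is used. The sketch below assumes a one-sided hypergeometric test, a standard choice for this kind of question: given N genes in total, a regulator whose regulon has K targets, and a dataset of n genes, it returns the probability of seeing at least k regulon members in the dataset by chance. The function names and inputs are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def regulator_enrichment_pvalue(total_genes, regulon_size, dataset_size, overlap):
    """One-sided hypergeometric P(X >= overlap) for regulon/dataset overlap."""
    upper = min(regulon_size, dataset_size)
    # Exact integer sum over the upper tail, divided once at the end.
    tail = sum(
        comb(regulon_size, i) * comb(total_genes - regulon_size, dataset_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, dataset_size)


def rank_regulators(regulons, dataset, all_genes):
    """Rank regulators by enrichment p-value (smallest first).

    regulons: dict regulator -> iterable of target genes.
    dataset: genes of interest, e.g. differentially expressed genes.
    all_genes: the background gene universe.
    """
    universe = set(all_genes)
    hits = set(dataset) & universe
    results = []
    for regulator, targets in regulons.items():
        regulon = set(targets) & universe
        overlap = len(regulon & hits)
        p = regulator_enrichment_pvalue(len(universe), len(regulon), len(hits), overlap)
        results.append((regulator, overlap, p))
    return sorted(results, key=lambda r: r[2])


genes = [f"g{i}" for i in range(10)]
ranking = rank_regulators(
    {"R1": ["g0", "g1", "g2", "g3", "g4"], "R2": ["g5", "g6"]},
    ["g0", "g1", "g2", "g3", "g4"],
    genes,
)
```

In this toy case, every gene in the dataset is a target of R1. The chance of that happening at random is 1/252, so R1 ranks first. R2 has no targets in the dataset, so its p-value is 1.0.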
Capacity of a condenser whose plates are circular arcs
D. Karp
Mathematics, 2006
Abstract: We find an asymptotic formula for the conformal capacity of a plane condenser whose two plates are concentric circular arcs, as the distance between them vanishes. This result generalizes the formula for the capacity of a parallel linear-plate condenser found by Simonenko and Chekulaeva in 1972, and sheds light on the problem of finding an asymptotic formula for the capacity of a condenser whose plates are arbitrary parallel curves. This problem was posed and partially solved by R. Kühnau in 1998.


Copyright © 2008-2017 Open Access Library. All rights reserved.