Finding Communities of Related Genes
Dennis Wilkinson, Bernardo A. Huberman
Physics, 2002
Abstract: We present an automated method of identifying communities of functionally related genes from the biomedical literature. These communities encapsulate human gene and protein interactions and identify groups of genes that are complementary in their function. We use graphs to represent the network of gene co-occurrences in articles mentioning particular keywords, and find that these graphs consist of one giant connected component and many small ones. In addition, the vertex degree distribution of the graphs follows a power law, whose exponent we determine. We then use an algorithm based on betweenness centrality to identify community structures within the giant component. The different structures are then aggregated into a final list of communities, whose members are weighted according to how strongly they belong to them. Our method is efficient enough to be applied to the entire Medline database, yet the information it extracts is highly detailed, relevant to a particular problem, and interesting in its own right. We illustrate the method in the case of colon cancer and demonstrate important features of the resulting communities.
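The graph steps described above can be illustrated with a minimal, standard-library Python sketch. It assumes each article has already been reduced to a set of gene symbols (the toy gene sets are hypothetical), builds the co-occurrence graph, and performs one Girvan-Newman-style split by repeatedly removing the edge of highest betweenness, computed with Brandes' algorithm. The helper names are illustrative, and the authors' aggregation of several splits into weighted community memberships is not reproduced.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_graph(articles):
    """Vertices are genes; an edge joins two genes mentioned in the same article."""
    adj = defaultdict(set)
    for genes in articles:
        for a, b in combinations(sorted(set(genes)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def edge_betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph, accumulated per edge."""
    eb = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest vertices.
        delta = defaultdict(float)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # Every undirected shortest path was counted once from each endpoint.
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def split_once(adj):
    """Remove highest-betweenness edges until the number of components grows."""
    g = {v: set(ns) for v, ns in adj.items()}
    n0 = len(components(g))
    while len(components(g)) == n0:
        eb = edge_betweenness(g)
        if not eb:
            break
        u, v = max(eb, key=eb.get)
        g[u].discard(v)
        g[v].discard(u)
    return components(g)


articles = [{"A", "B", "C"}, {"D", "E", "F"}, {"C", "D"}]  # toy data
parts = split_once(cooccurrence_graph(articles))
print([sorted(c) for c in parts])  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

In the toy graph the C-D edge bridges two triangles and carries all nine cross-group shortest paths, so it is the first edge removed.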
Finding the Core-Genes of Chloroplasts
Bassam AlKindy, Jean-François Couchot, Christophe Guyeux, Arnaud Mouly, Michel Salomon, Jacques M. Bahi
Computer Science, 2014
Abstract: Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, making large-scale genomic comparisons between sets of closely related species possible. An interesting question is which genes, and hence which functionality, a collection of species has in common or, conversely, what is specific to a given species when compared with others belonging to the same genus, family, etc. Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it involves a large amount of computation and requires a rigorous methodology. Surprisingly, as far as we know, the methodology for finding core and pan genomes has not been investigated in depth. This research work tries to fill this gap by focusing only on chloroplast genomes, whose reasonable sizes allow a deep study. To achieve this goal, a collection of 99 chloroplasts is considered in this article. Two methodologies have been investigated, based respectively on sequence similarities and on gene names taken from annotation tools. The obtained results are finally evaluated in terms of biological relevance.
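For the name-based methodology, core and pan genomes reduce to set operations, as the definitions above suggest. The sketch below assumes every genome is given as a list of annotated gene names and that names are consistent across annotations; the function names, species labels, and genes are toy illustrations, not results from the 99 chloroplasts.

```python
def core_and_pan(genomes):
    """genomes maps a species name to its annotated gene names.

    Core genome: genes present in every species.
    Pan genome: genes present in at least one species.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    return set.intersection(*gene_sets), set.union(*gene_sets)


def specific_genes(genomes, species):
    """Genes annotated in `species` but in none of the other species."""
    others = [set(genes) for name, genes in genomes.items() if name != species]
    return set(genomes[species]) - set().union(*others)


genomes = {  # toy example, not real annotations
    "sp1": ["psbA", "rbcL", "matK"],
    "sp2": ["psbA", "rbcL", "ycf1"],
    "sp3": ["psbA", "rbcL", "matK", "ndhF"],
}
core, pan = core_and_pan(genomes)
print(sorted(core))  # ['psbA', 'rbcL']
print(sorted(pan))   # ['matK', 'ndhF', 'psbA', 'rbcL', 'ycf1']
print(specific_genes(genomes, "sp2"))  # {'ycf1'}
```

Because matching is by exact name, the result is sensitive to inconsistent gene naming across annotations.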
Finding disease candidate genes by liquid association
Ker-Chau Li, Aarno Palotie, Shinsheng Yuan, Denis Bronnikov, Daniel Chen, Xuelian Wei, Oi-Wa Choi, Janna Saarela, Leena Peltonen
Genome Biology, 2007, DOI: 10.1186/gb-2007-8-10-r205
Abstract: Studies aiming to identify susceptibility genes in complex diseases have proceeded along two lines. The traditional candidate gene approach is limited by our ability to come up with a comprehensive list of biologically related genes. On the other hand, the 'hypothesis free' approach relies on genome-wide scans for disease loci, typically via linkage in exceptionally large families or via association in case control studies. Multiple sclerosis (MS), which is one of the most common neurologic disorders affecting young adults, is characterized by demyelination and reactive gliosis [1]. Analogous to many complex traits, genome scans in MS have identified numerous chromosomal loci, often with only nominal evidence for linkage to MS [2-6]. With the notable exception of the human leukocyte antigen (major histocompatibility complex [MHC]) locus on 6p21, evidence for specific MS genes emerging from these studies is still scant. Thus far, the only associated non-HLA genes replicated in multiple populations are the PRKCA gene [7] and the recently reported IL2RA and IL7R genes [8]. For MS, as for most complex traits, the loci derived from linkage scans have remained quite wide because of multiple uncertainties concerning the disease model in statistical analyses. To expedite the process of gene identification in these wide DNA regions, we need novel approaches to identify potentially involved pathways and to prioritize genes on identified loci for further sequencing efforts. Our idea is to turn to full genome functional studies to achieve these goals. As illustrated in Figure 1, our approach takes advantage of the availability of abundant microarray data and a wealth of genomic/proteomic knowledge bases from the public domain. Our intention is to integrate information from both the candidate gene and the full genome scan (thus far mostly family-based linkage) approaches.
In this report we use two previously reported MS susceptibility genes, identified in the same study sample [7,9], n
An Integrated Approach for Finding Overlooked Genes in Shigella
Junping Peng, Jian Yang, Qi Jin
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0018509
Abstract: The completion of numerous genome sequences introduced an era of whole-genome study. However, many genes are missed during genome annotation, including small RNAs (sRNAs) and small open reading frames (sORFs). In order to improve genome annotation, we aimed to identify novel sRNAs and sORFs in Shigella, the principal etiologic agents of bacillary dysentery.
The Cyclohedron Test for Finding Periodic Genes in Time Course Expression Studies
Jason Morton, Lior Pachter, Anne Shiu, Bernd Sturmfels
Quantitative Biology, 2007
Abstract: The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebraic combinatorics. The test has the advantage of being robust to measurement errors and can be used to ascertain the significance of top-ranked genes. We apply the test to recently published measurements of gene expression during mouse somitogenesis and find 32 genes that collectively are significant. Among these are previously identified periodic genes involved in the Notch/FGF and Wnt signaling pathways, as well as novel candidate genes that may play a role in regulating the segmentation clock. These results confirm that there is an abundance of exceptionally periodic genes expressed during somitogenesis. The emphasis of this paper is on the statistics and combinatorics that underlie the cyclohedron test and its implementation within a multiple testing framework.
Finding Protein-Coding Genes through Human Polymorphisms
Edward Wijaya, Martin C. Frith, Paul Horton, Kiyoshi Asai
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0054210
Abstract: Human gene catalogs are fundamental to the study of human biology and medicine, but they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs but also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average, 10% of these genes are located in genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
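The ORF-finding step, and the way a single-nucleotide polymorphism can create a longer ORF, can be sketched in a few lines. This is a simplified illustration under stated assumptions: only the forward strand of an mRNA sequence is scanned, an ORF runs from an ATG to the first in-frame stop codon, and there is no minimum-length threshold, alignment to the reference genome, or homology check. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ORF on the forward strand, or None.

    An ORF here runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. ORFs without a stop are ignored.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def apply_snp(seq, pos, alt):
    """Substitute the single base at 0-based position `pos`."""
    return seq[:pos] + alt + seq[pos + 1:]


ref = "ATGTGAAAACCCGGGTAA"    # toy sequence: premature TGA at positions 3-5
alt = apply_snp(ref, 4, "C")  # TGA -> TCA removes the stop codon
print(longest_orf(ref))  # (0, 6)
print(longest_orf(alt))  # (0, 18)
```

The reverse direction, a variant that introduces a stop codon and truncates an ORF, follows from the same two functions.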
Bayesian Nonparametric Variable Selection as an Exploratory Tool for Finding Genes that Matter
Babak Shahbaba
Quantitative Biology, 2010
Abstract: High-throughput scientific studies involving no clear a priori hypothesis are common. For example, a large-scale genomic study of a disease may examine thousands of genes without hypothesizing that any specific gene is responsible for the disease. In these studies, the objective is to explore a large number of possible factors (e.g., genes) in order to identify a small number that will be considered in follow-up studies, which tend to be more thorough and on smaller scales. For large-scale studies, we propose a nonparametric Bayesian approach based on random partition models. Our model divides the set of candidate factors into several subgroups according to their degrees of relevance, or potential effect, in relation to the outcome of interest. The model allows a latent rank to be assigned to each factor according to the overall potential importance of its corresponding group. The posterior expectation or mode of these ranks is used to set a threshold for selecting potentially relevant factors. Using simulated data, we demonstrate that our approach can be quite effective in finding relevant genes compared with several alternative methods. We apply our model to two large-scale studies. The first study involves transcriptome analysis of infection by human cytomegalovirus (HCMV). The objective of the second study is to identify differentially expressed genes between two types of leukemia.
Re-annotation of genome microbial CoDing-Sequences: finding new genes and inaccurately annotated genes
Stéphanie Bocs, Antoine Danchin, Claudine Médigue
BMC Bioinformatics, 2002, DOI: 10.1186/1471-2105-3-5
Abstract: We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank. The results described in the present study show that our procedure is very satisfactory in terms of gene finding accuracy. Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific characteristics of some genomes. This stresses that close cooperation between scientists and regular updating and curation of the findings in databases are clearly required to reduce the level of errors in genome annotation (and hence the unfortunate spread of errors through centralized data libraries). The main goal of large-scale genome sequencing projects is to obtain new insights into physiological and biological processes underlying the very organization of life. An essential step in this quest is gene identification, with subsequent functional annotation of the corresponding gene products.
Gene recognition in bacteria is far from being always straightforward, despite the fact that bacterial g
Efficiency Analysis of Competing Tests for Finding Differentially Expressed Genes in Lung Adenocarcinoma
Rick Jordan, Satish Patel, Hai Hu, James Lyons-Weiler
Cancer Informatics, 2008
Abstract: In this study, we introduce and use Efficiency Analysis to compare differences in the apparent internal and external consistency of competing normalization methods and tests for identifying differentially expressed genes. Using publicly available data, two lung adenocarcinoma datasets were analyzed using caGEDA (http://bioinformatics2.pitt.edu/GE2/GEDA.html) to measure the degree of differential expression of genes between two populations. The datasets were randomly split into at least two subsets, each subset was analyzed for differentially expressed genes between the two sample groups, and the resulting gene lists were compared for overlapping genes. Efficiency Analysis is an intuitive method that compares the differences in the percentage of overlap of genes from two or more data subsets, found by the same test, over a range of testing methods. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the top 100 genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was performed using all normalization and transformation methods available through caGEDA. The ‘best’ test was then further evaluated using internal cross-validation to estimate generalizable sample classification errors using a Naïve Bayes classification algorithm. A novel test, termed D1 (a derivative of the J5 test), was found to be the most consistent and to exhibit the lowest overall classification error and the highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g., expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes are scaled to share the same intensity distribution range. Efficiency Analysis correctly predicted the ‘best’ test and normalization method using the Beer dataset and also performed well with the Bhattacharjee dataset based on both efficiency and classification accuracy criteria.
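The overlap comparison at the heart of Efficiency Analysis can be sketched as follows. It assumes each test produces a score per gene on each data split (larger meaning more differentially expressed) and defines percentage overlap as the share of the top n genes that two splits have in common; caGEDA's actual computation may differ, and the function names, test names, and scores are hypothetical.

```python
def top_n(scores, n):
    """The n genes with the largest scores (ties broken by gene name)."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:n])


def percent_overlap(scores_a, scores_b, n):
    """Share (%) of the top-n list that two data splits have in common."""
    return 100.0 * len(top_n(scores_a, n) & top_n(scores_b, n)) / n


def rank_tests(results, n):
    """results maps a test name to its (split 1, split 2) score dictionaries.

    Returns (test, overlap %) pairs, most consistent test first.
    """
    overlaps = {name: percent_overlap(a, b, n) for name, (a, b) in results.items()}
    return sorted(overlaps.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores for one test on two random splits (hypothetical values).
split1 = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 1.0}
split2 = {"g1": 4.5, "g3": 4.0, "g2": 0.5, "g4": 2.0}
print(percent_overlap(split1, split2, 2))  # 50.0
```

Under this definition, the abstract's example corresponds to 50 versus 5 shared genes among the top 100, and `rank_tests` would place the first method ahead of the second.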
Integrative analysis for finding genes and networks involved in diabetes and other complex diseases
Regine Bergholdt, Zenia M Størling, Kasper Lage, E Olof Karlberg, Páll í ólason, Mogens Aalund, Jørn Nerup, Søren Brunak, Christopher T Workman, Flemming Pociot
Genome Biology, 2007, DOI: 10.1186/gb-2007-8-11-r253
Abstract: Complex traits like type 1 diabetes (T1D) are generally believed to be under the influence of multiple genes interacting with each other to confer disease susceptibility and/or protection. Identification of susceptibility genes in complex genetic diseases, however, poses many challenging problems. The contribution from single genes is often limited, and genetic studies generally do not offer clues about the functional context of a gene associated with a complex disorder. A recent report demonstrated the feasibility of constructing functional human gene networks (using, for example, expression and Gene Ontology (GO) data [1]), and of using these to prioritize positional candidate genes from non-interacting susceptibility loci for various heritable disorders [2]. It was shown that the obvious candidate genes were not always involved, and that taking an unbiased approach to assessing candidate genes using functional networks may result in new, non-obvious hypotheses that are statistically significant. One of the strongest indications of functional association is the presence of a physical interaction between proteins [3], and several reports have shown that proteins involved in the same phenotype are likely to be part of the same functional module (that is, protein sub-network) [4-6]. With this in mind, it seems reasonable to expect that, in many cases, components contributing to the same complex disease will be members of the same functional modules, especially if the disease is associated with multiple genetic loci that show statistical indication of epistasis. This suggests that, in the case of complex disorders, a feasible strategy would be to search for groups of interacting proteins that together lead to significant association with the disease in question.
However, a strategy searching for loci showing genetic interaction (epistasis) integrated with a search for protein networks spanning the epistatic regions and subsequent significance ranking of these networks ha

Copyright © 2008-2017 Open Access Library. All rights reserved.