oalib
All listed articles are free for downloading (OA Articles)
In silico discovery of gene-coding variants in murine quantitative trait loci using strain-specific genome sequence databases
Kriste E Marshall, Elizabeth L Godden, Fan Yang, Sonya Burgers, Kari J Buck, James M Sikela
Genome Biology , 2002, DOI: 10.1186/gb-2002-3-12-research0078
Abstract: Interstrain alignment of sequences derived from the relevant mouse strain genome sequence databases for 199 QTL-localized genes spanning 210,020 base-pairs of coding sequence identified 21 genes with different coding sequences for the progenitor strains. Several of these genes, including four that exhibit strong phenotypic links to chronic alcohol withdrawal, are promising candidates to underlie these QTLs. This approach has wide general utility, and should be applicable to any of the several hundred mouse QTLs, encompassing over 60 different complex traits, that have been identified using strains for which relatively complete genome sequences are available. The discovery of genes underlying multigenic diseases and traits is one of the most important challenges currently facing genetic researchers. This effort has been aided by quantitative trait locus (QTL) mapping methods, which have now been applied to numerous complex phenotypes in a range of species, including many behavioral phenotypes of high interest. A QTL is a chromosomal region that contains a gene or genes that influence a quantitative trait. The power of this approach was first demonstrated in plants [1] and later in yeast [2], flies [3], livestock [4,5], rodents [6,7,8,9] and humans [10,11,12]. Historically, a typical approach to going from QTL to gene has been to select one or a few of the best biological candidate genes from within the QTL interval and search for sequence differences that predict differential expression and/or structure of the gene product. An alternative strategy is to carry out comparative sequencing of large numbers of potential candidate genes located within the QTL interval, which is feasible given the automated sequencing methods now available [13]. 
However, these approaches are limited because the gene underlying a QTL may not be recognized as a good candidate gene if little is known about the gene's function and/or if a QTL region is large, in which case sequencing every gene wi
Introduction to Genome Databases
FANG Gang, CHEN Yun-Jia, GAO Ge, LIU Di, HE Kun, WU Xin, GU Xiao-Cheng, LUO Jing-Chu
遗传 (Hereditas, Beijing), 2003
Abstract: A brief introduction to the genome databases GDB, GenoList and Ensembl is given. These databases, mirrored and maintained at the Centre of Bioinformatics, Peking University, provide useful information for genome research.
Matching curated genome databases: a non trivial task
Stéphane Descorps-Declère, Matthieu Barba, Bernard Labedan
BMC Genomics , 2008, DOI: 10.1186/1471-2164-9-501
Abstract: Here, we propose CorBank, a program devised to provide cross-referencing protein identifiers whatever level of identity is found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank further allows detecting any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we have analyzed. In all other cases there are differences occurring at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective version of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also contains an updated publication of the exhaustive results obtained by comparing RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon. CorBank is very efficient in rapid detection of the numerous differences existing between RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible. Public genomic databanks are inexorably inundated by newly sequenced genomes. The number of complete prokaryotic genome sequences published per year has increased more than tenfold in the last seven years, with a present rate close to four newly published prokaryotic genomes per week. 
One of the main challenges encountered by genome databanks is that complete genomic sequences are submitted with a heterogeneous and (too) often
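The cross-referencing idea in the CorBank abstract can be illustrated with a minimal Python sketch. This is not CorBank's implementation: it assumes each curated version of a genome is available as a dictionary from protein identifier to amino acid sequence, and it only pairs sequences that are exactly identical, whereas the abstract states that CorBank also handles lower levels of identity. The function names and the identifiers in the usage note are hypothetical.

```python
import hashlib


def sequence_digest(sequence):
    """Return a fixed-length key for an amino acid sequence (case-insensitive)."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


def cross_reference(version_a, version_b):
    """Pair identifiers from two versions of a genome by exact sequence identity.

    Both arguments are dicts mapping identifier -> amino acid sequence.
    Returns (matched, unmatched): matched maps each identifier of version_a
    to the list of version_b identifiers carrying the same sequence, and
    unmatched lists the version_a identifiers with no identical partner.
    """
    # Index version_b by sequence digest so each lookup is a dictionary access.
    index_b = {}
    for identifier, sequence in version_b.items():
        index_b.setdefault(sequence_digest(sequence), []).append(identifier)

    matched, unmatched = {}, []
    for identifier, sequence in version_a.items():
        hits = index_b.get(sequence_digest(sequence))
        if hits:
            matched[identifier] = hits
        else:
            unmatched.append(identifier)
    return matched, unmatched
```

With two hypothetical versions such as `{"A1": "MKT", "A2": "MLV"}` and `{"B7": "mkt", "B9": "MLA"}`, the call pairs `A1` with `B7` and reports `A2` as unmatched, which is the kind of CDS-level difference the abstract describes.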
Online genetic databases informing human genome epidemiology
Angela J Frodsham, Julian PT Higgins
BMC Medical Research Methodology , 2007, DOI: 10.1186/1471-2288-7-31
Abstract: We conducted a systematic search for online databases containing genetic epidemiological information on gene prevalence or gene-disease association. In those containing information on genetic association studies, we examined what additional information could be obtained to supplement a MEDLINE literature search. We identified 111 databases containing prevalence data, 67 databases specific to a single gene and only 13 that contained information on gene-disease associations. Most of the latter 13 databases were linked to MEDLINE, although five contained information that may not be available from other sources. There is no single resource of structured data from genetic association studies covering multiple diseases, and in relation to the number of studies being conducted there is very little information specific to gene-disease association studies currently available on the World Wide Web. Until comprehensive data repositories are created and utilized regularly, new data will remain largely inaccessible to many systematic review authors and meta-analysts. Following the human genome project [1] and with the increasing efficiency and throughput of genotyping techniques, very high numbers of genetic variants can be examined for predisposition to disease [2]. Vast untapped resources of genotyping data sit in laboratories across the world, unlikely to ever be published due to a natural tendency to better disseminate the more striking of these findings [3]. As the world of genetics moves into the era of whole genome association studies, the amount of data generated will increase still further [2]. Interpretation of the findings of genetic association studies is problematic, not only due to the selective reporting of findings, but also due to limitations of design, conduct, sample size, suboptimal analysis, and inconsistent findings across studies [4,5]. Systematic reviews and meta-analyses offer valuable means of assembling and synthesising the totality of evidence. 
They offer m
Difference-Huffman Coding of Multidimensional Databases
István Szépkúti
Computer Science , 2011,
Abstract: A new compression method called difference-Huffman coding (DHC) is introduced in this paper. It is verified empirically that DHC results in a smaller multidimensional physical representation than those for other previously published techniques (single count header compression, logical position compression, base-offset compression and difference sequence compression). The article examines how caching influences the expected retrieval time of the multidimensional and table representations of relations. A model is proposed for this, which is then verified with empirical data. Conclusions are drawn, based on the model and the experiment, about when one physical representation outperforms another in terms of retrieval time. Over the tested range of available memory, the performance for the multidimensional representation was always much quicker than for the table representation.
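The abstract names difference-Huffman coding but does not spell out its steps. The Python sketch below shows one plausible reading, offered as an assumption and not as the paper's algorithm: the logical positions of the non-empty cells are sorted, replaced by the differences between consecutive positions (as in difference sequence compression), and those differences are then Huffman-coded. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def difference_sequence(positions):
    """Return the first sorted position followed by the gaps between neighbours."""
    ordered = sorted(positions)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in left.items()}
        merged.update({s: "1" + bits for s, bits in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(positions):
    """Return (bit string, code table) for a collection of logical positions."""
    diffs = difference_sequence(positions)
    code = huffman_code(diffs)
    return "".join(code[d] for d in diffs), code


def decode(bits, code):
    """Recover the sorted logical positions from the bit string and code table."""
    inverse = {b: s for s, b in code.items()}
    positions, total, buffer = [], 0, ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-free code: the first match is the symbol
            total += inverse[buffer]
            positions.append(total)
            buffer = ""
    return positions
```

Under this reading, clustered positions such as `[3, 4, 5, 9, 10, 11, 12, 20]` produce many repeated gaps of 1, which receive the shortest code word; `decode(*encode(positions))` returns the original sorted list. A real implementation would also have to store the code table, which this sketch keeps in memory.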
IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence
Jibin Sun, An-Ping Zeng
BMC Bioinformatics , 2004, DOI: 10.1186/1471-2105-5-112
Abstract: In this work a fast method is proposed to use unannotated genome sequence for predicting CDSs and for an in silico reconstruction of metabolic networks. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence to predict CDSs. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method. 97.7% of the CDSs in the original annotation are correctly identified. The use of SWISS-PROT-TrEMBL databases resulted in an identification of 98.9% of CDSs that have EC-numbers in the published annotation. Furthermore, two versions of sequences of the bacterium Klebsiella pneumoniae with different genome coverage (3.9 and 7.9 fold, respectively) are examined. The results suggest that a 3.9-fold coverage of the bacterial genome could be sufficient for the in silico reconstruction of the metabolic network. Compared to other gene finding methods such as CRITICA, our method is more suitable for exploiting sequences of low genome coverage. Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is delivered that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download at http://genome.gbf.de/bioinformatics/index.html). The reversed querying process and the program IdentiCS allow a fast and adequate prediction of protein-coding sequences and reconstruction of the potential metabolic network from low coverage genome sequences of bacteria. The new method can accelerate the use of genomic data for studying cellular metabolism. Knowledge about the metabolic network of an organism is essential for understanding its physiology and phenotypic behavior. 
A comprehensive understanding of the met
Education for cataloging and knowledge organization
Alenka Šauperl
Knjižnica: Revija za Področje Bibliotekarstva in Informacijske Znanosti , 2005,
Abstract: Just as education for cataloging has to follow the functions of the catalog, education for knowledge organization needs to stem from the functions of databases. The reform of higher education in the light of the Bologna Declaration (1999) has stimulated a consideration of changes in education for cataloging and knowledge organization at the Department of Library and Information Science and Book Studies at the Faculty of Arts, University of Ljubljana. A two-level university program (3+2 years) is suggested by the Bologna Declaration. A proposed stepwise learning for cataloging and knowledge organization is intended to consider the functions of a library catalog on the one hand and to prepare students for various degrees of specialization on the other. In each year of study one or more courses would be offered that would increase the theoretical and scientific level of topics according to the knowledge acquired in previous courses. However, theoretical knowledge should be enhanced by practical work; courses should therefore provide practical training and internship. After graduation a period of introducing novices should remain an essential part of training. Professional librarians should see continuing education as an integral part of their careers.
Law of Genome Evolution Direction : Coding Information Quantity Grows
Liaofu Luo
Quantitative Biology , 2008, DOI: 10.1007/s11467-009-0014-x
Abstract: The problem of the directionality of genome evolution is studied. Based on the analysis of the C-value paradox and the evolution of genome size, we propose that the function-coding information quantity of a genome always grows in the course of evolution through sequence duplication, expansion of code, and gene transfer from outside. The function-coding information quantity of a genome consists of two parts: p-coding information quantity, which encodes functional protein, and n-coding information quantity, which encodes other functional elements except amino acid sequence. Evidence for the evolutionary law about the function-coding information quantity is listed. Functional need is the motive force for the expansion of coding information quantity, and this expansion is the way a species achieves functional innovation and extension. So, the increase of coding information quantity of a genome is a measure of the acquired new function and it determines the directionality of genome evolution.
Databases and resources for human small non-coding RNAs
Eneritz Agirre, Eduardo Eyras
Human Genomics , 2011, DOI: 10.1186/1479-7364-5-3-192
Abstract: In 2001, three groups published independent reports on the discovery of a new class of small non-coding RNAs (sRNAs), which were named micro-RNAs (miRNAs) [1-3]. These comprise a large family of small, ~22 nucleotide-long, non-coding RNAs that have emerged as key players in post-transcriptional gene regulation [4]. Subsequent years have witnessed the discovery of many new types of sRNAs. In humans, apart from the hundreds of miRNAs detected so far, there are also many endogenous small interfering RNAs (endo-siRNAs) [5] and piwi-interacting RNAs (piRNAs) [6,7]. These and other short non-coding RNA molecules collectively are called 'sRNAs'. They are generally short (~18-30 nucleotides [nt]); do not code for proteins; exert their function as RNA molecules generally combined with protein factors; and represent a substantial portion of the RNA output of cells. Moreover, sRNAs encompass a diverse, widespread and basal regulatory system: they are known to regulate genes and genomes at different levels, including chromatin structure, transcription, RNA stability and translation [8-10]. Furthermore, they can act as activators or inhibitors and their disruption has been linked to disease [11]. The explosion of information on sRNAs makes necessary its organisation--in terms of their biogenesis, expression properties and functional characteristics--into public databases. Traditionally, GenBank [12], the European Molecular Biology Laboratory (EMBL) [13] and the DNA Data Bank of Japan (DDBJ) [14] have been the depository of RNA sequences, while the Gene Expression Omnibus (GEO) [15] database at the National Center for Biotechnology Information (NCBI) compiles high-throughput data for miRNAs and other sRNAs from publications. Besides these generic resources, there are specialised databases for sRNAs. The most complete ones are those related to miRNAs, since their functional role in RNA metabolism is also the best characterised [5]. 
The miRBase database [16] (Table 1) is considered the ce
Specialized microbial databases for inductive exploration of microbial genome sequences
Gang Fang, Christine Ho, Yaowu Qiu, Virginie Cubas, Zhou Yu, Cédric Cabau, Frankie Cheung, Ivan Moszer, Antoine Danchin
BMC Genomics , 2005, DOI: 10.1186/1471-2164-6-14
Abstract: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison. We are facing a deluge of genome sequences. As of January 14th, 2005, the GOLD site (http://www.genomesonline.org) identified 1248 completed or ongoing genome programs, and this certainly reflects only a partial view of the existing programs. 
While this shows that we implicitly possess an enormous wealth of information about the functions carried out by genes and genom