oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Cloud Computing for Comparative Genomics with Windows Azure Platform
Insik Kim, Jae-Yoon Jung, Todd F. DeLuca, Tristan H. Nelson and Dennis P. Wall
Evolutionary Bioinformatics , 2012, DOI: 10.4137/EBO.S9946
Abstract: Cloud computing services have emerged as a cost-effective alternative for cluster systems as the number of genomes and required computation power to analyze them increased in recent years. Here we introduce the Microsoft Azure platform with detailed execution steps and a cost comparison with Amazon Web Services.
IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes
Wonhoon Lee, Jongsun Park, Jaeyoung Choi, Kyongyong Jung, Bongsoo Park, Donghan Kim, Jaeyoung Lee, Kyohun Ahn, Wonho Song, Seogchan Kang, Yong-Hwan Lee, Seunghwan Lee
BMC Genomics , 2009, DOI: 10.1186/1471-2164-10-148
Abstract: The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated into IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site http://www.imgd.org/. The mitochondrial genomes of members of the superclass Hexapoda (generally referred to as the 'insects') are typically approximately 15 kilobases (kb) in length and encode 37 genes, including 13 protein coding genes (PCGs), 2 ribosomal RNA genes (rRNAs), and 22 transfer RNA genes (tRNAs). Owing to its small size, high copy number, and relatively infrequent gene rearrangements, the mitochondrial genome has been extensively used for phylogenetic analyses [1-4]. Phylogenetic analysis based on the mitochondrial gene sequences is often limited to closely related species, due to the high rate of nucleotide substitutions. However, variations in the mitochondrial gene content and order have been utilized to elucidate evolutionary relationships among distantly-related species, on the basis of shared derived characteristics that denote the common ancestry of a given group [5]. In recent years, the number of sequenced mitochondrial genomes has been increasing fast due
Comparative genomics - A perspective
Selvarajan Sivashankari, Piramanayagam Shanmughavel
Bioinformation , 2007
Abstract: The rapidly emerging field of comparative genomics has yielded dramatic results. Comparative genome analysis has become feasible with the availability of a number of completely sequenced genomes. Comparison of complete genomes between organisms allows for global views on genome evolution, and the availability of many completely sequenced genomes increases the predictive power in deciphering the hidden information in genome design, function and evolution. Thus, comparison of human genes with genes from other genomes in a genomic landscape could help assign novel functions for un-annotated genes. Here, we discuss the recently used techniques for comparative genomics and their derived inferences in genome biology.
Cloud computing for comparative genomics
Dennis P Wall, Parul Kudtarkar, Vincent A Fusaro, Rimma Pivovarov, Prasad Patil, Peter J Tonellato
BMC Bioinformatics , 2010, DOI: 10.1186/1471-2105-11-259
Abstract: We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. The onslaught of new genome sequences has begun to outpace the local computing infrastructures used to calculate and store comparative genomic information. For example, because the number of genomes has increased approximately 12 fold over the last 5 years, algorithms that detect orthologs and assemble phylogenetic profiles are faced with an increasing computational demand. One such computationally intensive comparative genomics method, the reciprocal smallest distance algorithm (RSD), is particularly representative of the scaling problems faced by comparative genomics applications. RSD is a whole-genomic comparative tool designed to detect orthologous sequences between pairs of genomes. The algorithm [1] (Figure 1) employs BLAST [2] as a first step, starting with a subject genome, J, and a protein query sequence, i, belonging to genome I. A set of hits, H, exceeding a predefined significance threshold (e.g., E < 10^-10, though this is adjustable) is obtained. Then, using clustalW [3], each protein sequence in H is aligned separately with the original query sequence i. If the alignable region of the two sequences exceeds a threshold fraction of the alignment's total length (e.g., 0.8, although this is also adjustable), the codeml program of PAML [4] is used to obtain a maximum likeli
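The two adjustable filters mentioned in this abstract (the E-value cutoff and the alignable-fraction cutoff) can be illustrated with a minimal sketch. This is not the RSD implementation: the `Hit` record, its field names, and the `filter_hits` function are hypothetical, and the hits are assumed to have already been produced by BLAST and aligned with clustalW. Only the default values 1e-10 and 0.8 come from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLAST hit after pairwise alignment."""
    subject_id: str
    evalue: float           # BLAST E-value of the hit
    alignable_length: int   # length of the alignable region
    alignment_length: int   # total length of the alignment


def filter_hits(
    hits: Iterable[Hit],
    max_evalue: float = 1e-10,          # example threshold from the abstract
    min_alignable_fraction: float = 0.8,  # example threshold from the abstract
) -> List[Hit]:
    """Keep hits that pass both cutoffs described in the abstract."""
    kept = []
    for hit in hits:
        # Significance filter: E-value must be below the cutoff.
        if hit.evalue >= max_evalue:
            continue
        # Guard against empty alignments before dividing.
        if hit.alignment_length <= 0:
            continue
        # Alignable region must exceed the threshold fraction of the alignment.
        fraction = hit.alignable_length / hit.alignment_length
        if fraction <= min_alignable_fraction:
            continue
        kept.append(hit)
    return kept


example_hits = [
    Hit("protein_a", 1e-20, 90, 100),  # passes both filters
    Hit("protein_b", 1e-5, 95, 100),   # fails the E-value filter
    Hit("protein_c", 1e-30, 50, 100),  # fails the alignable-fraction filter
]
print([h.subject_id for h in filter_hits(example_hits)])
```

Hits that survive both filters would then go on to the distance-estimation step that the abstract attributes to the codeml program of PAML. That step is not sketched here.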
Genomics Portals: integrative web-platform for mining genomics data
Kaustubh Shinde, Mukta Phatak, Freudenberg M Johannes, Jing Chen, Qian Li, Joshi K Vineet, Zhen Hu, Krishnendu Ghosh, Jaroslaw Meller, Mario Medvedovic
BMC Genomics , 2010, DOI: 10.1186/1471-2164-11-27
Abstract: Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. A large amount of experimental data generated by modern high-throughput technologies is available through public repositories such as GEO [1] and ArrayExpress [2]. Our knowledge about molecular interaction networks and functional biological pathways is rapidly expanding and is being systematically organized into functionally related gene lists [3,4]. Jointly these two sources of information hold a tremendous potential for enhancing the interpretation of experimental results and gaining new insights into the function of living systems. Mining such data has been a productive avenue in generating new hypotheses as well as validating experimental results [5]. Unfortunately, repositories currently housing much of the primary genomics data lack mechanisms for effective querying and analysis. Inadequacies of the major data repositories to serve as access points to genomics data have resulted in numerous fragmented projects providing access to dat
Datasets for evolutionary comparative genomics
David A Liberles
Genome Biology , 2005, DOI: 10.1186/gb-2005-6-8-117
Abstract: Bioinformaticists and computational biologists working in the field of comparative genomics are largely dependent on datasets generated by others. Working with available data opens up desires for complementary datasets to fill knowledge gaps. In addition to writing grants for experimental laboratories and molecular biology supplies, one can also write an opinion piece to convince others to do some of the dirty work for you; this is what I am attempting to do here. Comparative genomics starts with sequencing. Many have suggested gaps in the tree of life, where additional genome projects will augment current knowledge, either to shorten long 'branches' on the tree of sequenced genomes or to complement existing genome projects. For example, there remain huge gaps in our knowledge of archaea. But with the faith that these gaps will ultimately be filled in, in this article I focus on alternative strategies for directing genomic resources so as to answer fundamental questions in evolution. A whole class of genomic experiments can be hypothesized through what can be called the 'tape of life' question. Stephen J. Gould wrote in his book Wonderful Life [1], "Wind back the tape of life to the early days of the Burgess shale; let it play again from an identical starting point, and the chance becomes vanishingly small that anything like human intelligence would grace the replay". At the molecular level, the tape of life has been played in parallel. Different species have gone from a similar ancestral point to a similar derived phenotype. In these cases, are the same molecules and pathways driving the phenotypic evolution? Comparative genomics gives us unprecedented opportunities to answer such questions. A few studies have tried to address the tape-of-life question through analysis of a single gene, such as the melanocortin-1 receptor (MC1R). This receptor plays a role in pigmentation and body/hair color, representing an obvious link between selectable genotype and phenotype. MC1
Comparative genomics of Helicobacter pylori
Quan-Jiang Dong, Qing Wang, Ying-Nin Xin, Ni Li, Shi-Ying Xuan
World Journal of Gastroenterology , 2009
Abstract: Genomic sequences have been determined for a number of strains of Helicobacter pylori (H pylori) and related bacteria. With the development of microarray analysis and the wide use of subtractive hybridization techniques, comparative studies have been carried out with respect to the interstrain differences within H pylori and the inter-species differences in the genomes of related bacteria. It was found that the core genome of H pylori comprises 1111 genes that are determinants of the species properties. A large pool of auxiliary genes comes mainly from the categories of cag pathogenicity islands, outer membrane proteins, restriction-modification system and hypothetical proteins of unknown function. Persistence of H pylori in the human stomach leads to the diversification of the genome. Comparative genomics suggests that a host jump has occurred from humans to felines. Candidate genes specific for the development of the gastric diseases were identified. With the aid of proteomics, population genetics and other molecular methods, future comparative genomic studies would dramatically promote our understanding of the evolution, pathogenesis and microbiology of H pylori.
GenoSets: Visual Analytic Methods for Comparative Genomics
Aurora A. Cain, Robert Kosara, Cynthia J. Gibas
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0046401
Abstract: Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.
Comparative genomics comes of age
Leonard Lipovich
Genome Biology , 2002, DOI: 10.1186/gb-2002-3-8-reports4024
Abstract: A publicly available draft sequence of the mouse genome at 6.3X coverage (each base sequenced an average of 6.3 times) - announced by Robert Waterston (Washington University, St Louis, USA) in the opening session of this meeting - can now be compared with the available human draft sequence. But what can, and can't, the mouse tell us about being human? Has mammalian comparative genomics advanced enough to enable us to understand why humans and chimpanzees look and behave so differently despite an estimated 98.8% genomic DNA sequence identity? And are mammalian genes more complex than they were thought to be in the heady early days of counting gene numbers, when only crude automated annotations and meager cDNA collections were available? Most of the material at the 2002 annual Cold Spring Harbor meeting that was not presented in some form in 2000 and 2001 was relevant to these three fundamental questions. Mike Kamal (Whitehead Institute and Massachusetts Institute of Technology, Cambridge, USA) was the first of many speakers to emphasize the surprisingly high extent of noncoding sequence conservation between human and mouse. Kamal revealed that only 50% of conserved elements in the total genomic sequence (exons and introns) of orthologous genes correspond to exons. So, what are the putative non-exonic conserved sequences? One possible answer was suggested in a poster presented by Emmanouil Dermitzakis (University of Geneva, Switzerland). As detailed by Dermitzakis, 62% of sequence blocks on human chromosome 21 that are conserved in the mouse are predicted to be non-exonic by existing annotations. But many of them correspond to expressed sequence tags and long open reading frames, and they therefore probably do in fact represent novel exons of known, and novel, genes. 
The utility of human-mouse comparisons is limited, however; in fact, of the 1,822 exons on human chromosome 21, only 68% have equivalents in the mouse (poster presented by Katsuhiko Murakami, RIKEN Genomic