oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Techniques for integrating -omics data
Siva Prasad Akula, Raghava Naidu Miriyala, Hanuman Thota, Allam Appa Rao
Bioinformation, 2009
Abstract: The challenge for -omics research is to tackle the fragmentation of knowledge by integrating several sources of heterogeneous information into a coherent entity. It is widely recognized that successful data integration is one of the keys to improving the productivity of stored data. With proper data integration tools and algorithms, researchers can correlate relationships that enable them to make better and faster decisions. Data integration is essential for the present -omics community because -omics data is currently spread worldwide in a wide variety of formats. These formats can be integrated and migrated across platforms through different techniques, and one of the important techniques often used is XML. XML provides a document markup language that is easier to learn, retrieve, store and transmit, and it is semantically richer than HTML. Here, we describe bio warehousing, database federation and controlled vocabularies, and highlight the application of XML to store, migrate and validate -omics data.
An eScience-Bayes strategy for analyzing omics data
Martin Eklund, Ola Spjuth, Jarl ES Wikberg
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-282
Abstract: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology, as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system. High-throughput experimental methods, including DNA and protein microarrays and other omics techniques, have become ubiquitous, indispensable tools in biology and biomedicine. The number of high-throughput technologies is constantly increasing. They provide the power to measure thousands of features of a biological system in a single experiment, and they have the potential to revolutionize our understanding of biology and medicine. However, the high expectations for omics methods have fallen short of realization, due to the challenges the data present for statistical modeling. Thus, the wealth of data produced is difficult to translate into concrete biological knowledge, new drugs, and clinical practices [1,2].
A recurring problem is that few experimental samples are generated relative to the number of model
DATA ANNOTATION AND RELATIONS MODELING FOR INTEGRATED OMICS IN CLINICAL RESEARCH
Arno Lukas, Bernd Mayer
The IIOAB Journal, 2010
Abstract: Omics has massively permeated translational clinical research, with numerous diseases being covered by Omics studies from the genome to the metabolome level. Integrating these disease-specific Omics tracks appears a logical next step for building the foundation of Systems Biology and Systems Medicine. Here, coherence of individual Omics tracks regarding clinical hypothesis, samples and clinical descriptors, and finally data handling and integration, becomes pivotal. We present a data integration, annotation and relations modeling concept for heterogeneous Omics data and workflows. With molecular features at the center of all Omics, we link the result profiles from different Omics tracks characterizing a specific disease phenotype to a common human molecular reference network, allowing seamless integration and subsequent support in the interpretation of Omics screening results. Our concept rests on data structures for representing objects specified by metadata and content. For handling diverse Omics tracks, a flexible structure for content is proposed, allowing data representation at different levels of granularity as demanded by the type of Omics and the specific type of data. Content on the molecular level includes deep annotation of molecular features on the gene and protein level. Based on this annotation, pair-wise relations between molecular objects are built, turning the molecular annotation into a network of relations (molecular feature graph). Such a relation network is also built on the Omics data level, combining explicit relations derived from study setup and implicit relations generated by mining metadata and content (Omics data graph). Finally, both graphs are merged using the molecular feature level as common denominator, enabling persistent integration and subsequent interpretation of Omics profiling results in the realm of a given clinical hypothesis.
We present a case study on integrating transcriptomics and proteomics data on chronic kidney disease for demonstrating the feasibility of this concept.
Sparse integrative clustering of multiple omics data sets
Ronglai Shen, Sijian Wang, Qianxing Mo
Statistics, 2013, DOI: 10.1214/12-AOAS578
Abstract: High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets.
Visualization of three-way comparisons of omics data
Richard Baran, Martin Robert, Makoto Suematsu, Tomoyoshi Soga, Masaru Tomita
BMC Bioinformatics, 2007, DOI: 10.1186/1471-2105-8-72
Abstract: We propose a color-coding approach for the representation of three-way comparisons. The approach is based on the HSB (hue, saturation, brightness) color model. The three compared values are assigned specific hue values from the circular hue range (e.g. red, green, and blue). The hue value representing the three-way comparison is calculated according to the distribution of three compared values. If two of the values are identical and one is different, the resulting hue is set to the characteristic hue of the differing value. If all three compared values are different, the resulting hue is selected from a color gradient running between the hues of the two most distant values (as measured by the absolute value of their difference) according to the relative position of the third value between the two. The saturation of the color representing the three-way comparison reflects the amplitude (or extent) of the numerical difference between the two most distant values according to a scale of interest. The brightness is set to a maximum value by default but can be used to encode additional information about the three-way comparison. We propose a novel color-coding approach for intuitive visualization of three-way comparisons of omics data. Color-coded representations of differences between omics datasets provide an intuitive and global comparative view of the data [1]. Such visualizations further facilitate the use of human pattern recognition abilities to complement the automated approaches to pinpoint subtle differences [2]. Currently, most visualizations are limited to pairwise comparisons where differences of interest between two corresponding datapoints are mapped onto color gradients for positive or negative ranges. In addition, results of statistical tests (F ratio, z-score, quartile analysis, etc.) performed across multiple datasets can be visualized to highlight sets of corresponding datapoints containing a difference [3].
These results, however, do not provide informa
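The color-coding rules described in this abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation. The function name `three_way_hsb`, the `scale` parameter, the specific hue assignments, the choice to run the gradient along the shorter arc of the hue circle, and the direction of the gradient (chosen so that it agrees with the "two identical, one different" rule at its end points) are all assumptions.

```python
import colorsys

# Assumed characteristic hues (fractions of the hue circle) for the
# first, second and third compared value: red, green, blue.
HUES = (0.0, 1.0 / 3.0, 2.0 / 3.0)


def three_way_hsb(values, scale, brightness=1.0):
    """Return (hue, saturation, brightness) for three compared values.

    `scale` is a hypothetical "scale of interest": a spread equal to or
    larger than `scale` gives full saturation.
    """
    lo = min(range(3), key=lambda i: values[i])
    hi = max(range(3), key=lambda i: values[i])
    spread = values[hi] - values[lo]
    if spread == 0:
        # All three values are equal: no difference, so no colour.
        return (0.0, 0.0, brightness)

    mid = 3 - lo - hi  # index of the remaining (third) value
    # Relative position of the third value between the two extremes.
    t = (values[mid] - values[lo]) / spread

    # Assumption: when the third value coincides with the lowest value,
    # the highest value is the "differing" one, so the gradient starts
    # at the hue of the highest value and ends at the hue of the lowest.
    start, end = HUES[hi], HUES[lo]
    # Signed shortest step around the hue circle from start to end.
    step = ((end - start + 0.5) % 1.0) - 0.5
    hue = (start + t * step) % 1.0

    saturation = min(spread / scale, 1.0)
    return (hue, saturation, brightness)


# Convert to RGB for plotting with the standard-library colorsys module.
rgb = colorsys.hsv_to_rgb(*three_way_hsb((1, 1, 5), scale=4))
```

With these assumptions, a triple such as `(1, 1, 5)` takes the hue assigned to the third value, and three equal values yield zero saturation.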
MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis
Seungyeul Yoo, Tao Huang, Joshua D. Campbell, Eunjee Lee, Zhidong Tu, Mark W. Geraci, Charles A. Powell, Eric E. Schadt, Avrum Spira, Jun Zhu
PLOS Computational Biology, 2014, DOI: 10.1371/journal.pcbi.1003790
Abstract: Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
Review on Periodicity Mining Techniques in Time Series Data
Yogesh Malode, Rahila Patel
International Journal of Advanced Computer Research, 2012
Abstract: The rapid growth in data and databases has increased the need for powerful data mining techniques that help analyze, forecast and predict the behaviour of events. Periodicity mining deserves more attention given its growing importance in real-life applications. In this paper, we discuss various periodicity mining techniques for time series databases, as well as symbolization. We then propose a periodicity mining technique that detects various periodic patterns (symbol periodicity, sequence or partial periodicity, and segment or full-cycle periodicity) in time series databases using the Fast Fourier Transform.
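As a rough illustration of Fourier-based period detection, the sketch below finds the dominant period of a numeric series from its power spectrum. It is not the method proposed in the paper: it covers only the simplest full-cycle case, and to stay within the Python standard library it computes the discrete Fourier transform directly in O(n²) time, where an FFT would compute the same transform faster. The function names `dft_power` and `dominant_period` are hypothetical.

```python
import cmath
import math


def dft_power(series):
    """Power spectrum of a mean-centred series for frequencies 1..n//2.

    Uses a direct O(n^2) discrete Fourier transform; an FFT would return
    the same coefficients more efficiently.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append((k, abs(coeff) ** 2))
    return power


def dominant_period(series):
    """Period (in samples) of the frequency with the highest power."""
    k, _ = max(dft_power(series), key=lambda pair: pair[1])
    return len(series) / k


# A sine wave sampled with a period of 8 samples.
wave = [math.sin(2 * math.pi * t / 8) for t in range(64)]
# A symbolized series (symbols encoded as integers) repeating every 4 samples.
symbols = [0, 1, 2, 3] * 16

wave_period = dominant_period(wave)
symbol_period = dominant_period(symbols)
```

Centring the series first removes the zero-frequency component, which would otherwise dominate the spectrum of any series with a non-zero mean.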
A New Omics Data Resource of Pleurocybella porrigens for Gene Discovery
Tomohiro Suzuki, Kaori Igarashi, Hideo Dohra, Takumi Someya, Tomoyuki Takano, Kiyonori Harada, Saori Omae, Hirofumi Hirai, Kentaro Yano, Hirokazu Kawagishi
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0069681
Abstract: Background: Pleurocybella porrigens is a mushroom-forming fungus that has been consumed as a traditional food in Japan. In 2004, 55 people were poisoned by eating the mushroom and 17 of them died of acute encephalopathy. Since then, the Japanese government has been alerting Japanese people to take precautions against eating the P. porrigens mushroom. Unfortunately, despite these efforts, the molecular mechanism of the encephalopathy remains elusive. The genome and transcriptome sequence data of P. porrigens and related species, however, are not stored in the public database. To obtain omics data for P. porrigens, we sequenced the genome and transcriptome of its fruiting bodies and mycelia by next generation sequencing. Methodology/Principal Findings: Short read sequences of genomic DNAs and mRNAs in P. porrigens were generated by Illumina Genome Analyzer. Genome short reads were de novo assembled into scaffolds using Velvet. Comparisons of genome signatures among Agaricales showed that P. porrigens has a unique genome signature. Transcriptome sequences were assembled into contigs (unigenes). Biological functions of unigenes were predicted by Gene Ontology and KEGG pathway analyses. The majority of unigenes would be novel genes without significant counterparts in the public omics databases. Conclusions: Functional analyses of unigenes indicate the existence of numerous novel genes in the basidiomycetes division. The results mean that omics information such as genome, transcriptome and metabolome data for basidiomycetes is scarce in the current databases. The large-scale omics information on P. porrigens provided by this research will serve as a new data resource for gene discovery in basidiomycetes.
New resources for functional analysis of omics data for the genus Aspergillus
Benjamin M Nitsche, Jonathan Crabtree, Gustavo C Cerqueira, Vera Meyer, Arthur FJ Ram, Jennifer R Wortman
BMC Genomics, 2011, DOI: 10.1186/1471-2164-12-486
Abstract: Based on protein homology, we mapped 97% of the 3,498 GO annotated A. nidulans genes to at least one of seven other Aspergillus species: A. niger, A. fumigatus, A. flavus, A. clavatus, A. terreus, A. oryzae and Neosartorya fischeri. GO annotation files compatible with diverse publicly available tools have been generated and deposited online. To further improve their accessibility, we developed a web application for GO enrichment analysis named FetGOat and integrated GO annotations for all Aspergillus species with public genome sequences. Both the annotation files and the web application FetGOat are accessible via the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). To demonstrate the value of those new resources for functional analysis of omics data for the genus Aspergillus, we performed two case studies analyzing microarray data recently published for A. nidulans, A. niger and A. oryzae. We mapped A. nidulans GO annotation to seven other Aspergilli. By depositing the newly mapped GO annotation online as well as integrating it into the web tool FetGOat, we provide new, valuable and easily accessible resources for omics data analysis and interpretation for the genus Aspergillus. Furthermore, we have given a general example of how a well annotated genome can help improve GO annotation of related species to subsequently facilitate the interpretation of omics data. Gene Ontology (GO) is a framework for functional annotation of gene products aiming to provide a unique vocabulary for living systems [1]. It comprises Biological Process (BP), Molecular Function (MF) and Cellular Component (CC) ontologies. GO terms are organized as directed acyclic graphs (DAG), meaning that GO terms are connected as nodes by directed edges defining hierarchical parent-child relationships. As a consequence, the specificity of GO terms increases with increasing distance from their root node.
Enrichment analysis of GO terms is a well accepted approach to di
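To make the idea of GO term enrichment concrete, the sketch below computes a one-sided hypergeometric p-value, a test commonly used for this purpose. The abstract does not state which statistic FetGOat uses, so the choice of test is an assumption, and the function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the study set (e.g. differentially expressed)
    hits:       study-set genes annotated with the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probabilities of drawing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 1000 background genes carry the term,
# and 8 of the 50 genes in the study set do.
p = enrichment_p_value(population=1000, annotated=40, sample=50, hits=8)
```

In practice, one such test is run per GO term, so the resulting p-values usually need a multiple-testing correction before terms are reported as enriched.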
A Localized-Statistic-Based Approach for Biomarker Identification of Omics Data
Kuan Zhang, He Chen, Yongtao Li
Engineering (ENG), 2013, DOI: 10.4236/eng.2013.510B089
Abstract: Omics data provide an essential means for molecular biology and systems biology to capture the systematic properties of the inner activities of cells. One of the greatest challenges biological researchers face is finding methods to discover biomarkers for tracking the process of diseases such as cancer. Feature selection methods have therefore been widely used for biomarker discovery. However, omics data usually contain a large number of features but a small number of samples, and some omics data have a wide-ranging distribution, which makes it difficult for feature selection methods to handle omics data. To overcome these problems, we present a computing method called localized statistic of abundance distribution based on Gaussian window (LSADBGW) to test the significance of a feature. Experiments on three datasets, including gene and protein datasets, showed the accuracy and efficiency of LSADBGW for feature selection.
