oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
On Integrating Information Visualization Techniques into Data Mining: A Review
Keqian Li
Computer Science, 2015
Abstract: The explosive growth of digital data in the information era and its immeasurable potential value have called for different types of data-driven techniques to exploit that value for further applications. Information visualization and data mining are two research fields with this goal. While the two communities advocate different approaches to problem solving, the vision of combining the sophisticated algorithmic techniques of data mining with the intuitiveness and interactivity of information visualization is tempting. In this paper, we survey recent research and real-world systems that integrate the wisdom of the two fields towards more effective and efficient data analytics. More specifically, we study the intersection from a data mining point of view and explore how information visualization can complement and improve different stages of data mining, through established theories for optimized visual presentation as well as practical toolsets for rapid development. We organize the survey around three main stages of a typical data mining process (preliminary analysis of data, model construction, and model evaluation) and study how each stage can benefit from information visualization.
Integrating plant 'omics'
David Secko
Genome Biology, 2004, DOI: 10.1186/gb-spotlight-20040616-01
Abstract: "Understanding... network behavior through a combination of analytical and mathematical approaches has great potential for deepening our understanding of metabolic regulation," said Alisdair Fernie, from the Max Planck Institute, Golm, Germany, who was not involved in the study. After 10 years of studying sulfur metabolism at the individual gene and enzyme levels, Kazuki Saito, from Chiba University in Japan and senior author of the new study, said that he realized this approach was not sufficient to understand global network responses in plants. "In the post-genome era, drawing a holistic picture of cellular process is absolutely necessary for the understanding of gene-to-metabolite networks," Saito said. Therefore, Saito, lead author Masami Hirai, and their colleagues decided to integrate gene expression analysis (transcriptomics) and non-targeted metabolite profiling (metabolomics) to better understand sulfur and nitrogen metabolism in Arabidopsis. Using plants grown under sulfur- and/or nitrogen-deficient conditions, the authors determined the expression profiles of about 13,000 expressed sequence tags with cDNA arrays. On the same samples, they also determined metabolic profiles of about 3,000 mass peaks with Fourier transform-ion cyclotron mass spectrometry. Mathematical integration of these data, using principal component analysis and batch-learning self-organizing map analysis, was then used to discern gene-to-metabolite networks potentially involved in sulfur and nitrogen metabolism. "We could show the presence of general responses to sulfur and nitrogen deficiencies in the transcriptome and metabolome. In addition, a specific pathway, glucosinolate metabolism, among all metabolic pathways, is coordinately modulated by nutritional stresses of sulfur and nitrogen," Saito told us. "Saito's group has done an exceptional job in integrating data and this publication represents one of the first of its kind," said Lloyd Sumner, from the Samuel Roberts Noble Foundation, Ar
DATA ANNOTATION AND RELATIONS MODELING FOR INTEGRATED OMICS IN CLINICAL RESEARCH
Arno Lukas,Bernd Mayer
The IIOAB Journal, 2010
Abstract: Omics has massively permeated translational clinical research, with numerous diseases being covered by Omics studies from the genome to the metabolome level. Integrating these disease-specific Omics tracks appears a logical next step for building the foundation of Systems Biology and Systems Medicine. Here, coherence of individual Omics tracks regarding clinical hypothesis, samples and clinical descriptors, and finally data handling and integration become pivotal. We present a data integration, annotation and relations modeling concept for heterogeneous Omics data and workflows. With molecular features at the center of all Omics, we link the result profiles from different Omics tracks characterizing a specific disease phenotype to a common human molecular reference network, allowing seamless integration and subsequent support in interpreting Omics screening results. Our concept rests on data structures for representing objects specified by metadata and content. For handling diverse Omics tracks, a flexible structure for content is proposed, allowing data representation at different levels of granularity as demanded by the type of Omics and the specific type of data. Content on the molecular level includes deep annotation of molecular features on the gene and protein level. Based on this annotation, pair-wise relations between molecular objects are built, turning the molecular annotation into a network of relations (molecular feature graph). Such a relation network is also built on the Omics data level, combining explicit relations derived from study setup and implicit relations generated by mining metadata and content (Omics data graph). Finally, both graphs are merged using the molecular feature level as common denominator, enabling a persistent integration and subsequent interpretation of Omics profiling results in the realm of a given clinical hypothesis. We present a case study on integrating transcriptomics and proteomics data on chronic kidney disease to demonstrate the feasibility of this concept.
An eScience-Bayes strategy for analyzing omics data
Martin Eklund, Ola Spjuth, Jarl ES Wikberg
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-282
Abstract: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology, as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system. High-throughput experimental methods, including DNA and protein microarrays and other omics techniques, have become ubiquitous, indispensable tools in biology and biomedicine. The number of high-throughput technologies is constantly increasing. They provide the power to measure thousands of features of a biological system in a single experiment, and they have the potential to revolutionize our understanding of biology and medicine. However, the high expectations for omics methods have fallen short of realization, due to the challenges the data present for statistical modeling. Thus, the wealth of data produced is difficult to translate into concrete biological knowledge, new drugs, and clinical practices [1,2]. A recurring problem is that few experimental samples are generated relative to the number of model
Merging Data Mining Techniques for Web Page Access Prediction: Integrating Markov Model with Clustering
Triloknath Pandey, Ranjita Kumari Dash, Alkananda Tripathy, Barnali Sahu
International Journal of Computer Science Issues, 2012
Abstract: Web page access prediction has gained importance from the ever increasing number of e-commerce Web information systems and e-businesses. Web page prediction, which involves personalizing Web users' browsing experiences, assists Web masters in improving the Website structure and helps Web users in navigating the site and accessing the information they need. The most widely used approach for this purpose is the pattern discovery process of Web usage mining, which entails many techniques such as Markov models, association rules and clustering. Implementing pattern discovery techniques in this way helps predict the next page to be accessed by the Web user based on the user's previous browsing patterns. However, each of the aforementioned techniques has its own limitations, especially when it comes to accuracy and space complexity. This paper achieves better accuracy, lower state space complexity, and fewer generated rules by combining techniques as follows. We integrate a low-order Markov model and clustering. The data sets are clustered and Markov model analysis is performed on each cluster instead of on the whole data set. The integration achieves better accuracy, with less state space complexity than a higher-order Markov model.
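The per-cluster Markov idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sessions have already been assigned to clusters by some earlier step, uses a first-order model, and the cluster labels, page names and function names are all hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """Count first-order transitions (page -> next page) over a list of sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def train_per_cluster(clustered_sessions):
    """Train one Markov model per cluster instead of one model on all sessions."""
    return {label: train_markov(sessions)
            for label, sessions in clustered_sessions.items()}


def predict_next(models, label, current_page):
    """Return the most frequent successor of current_page in the cluster's model."""
    counts = models.get(label, {}).get(current_page)
    if not counts:
        return None  # page never seen as a predecessor in this cluster
    return counts.most_common(1)[0][0]


# Hypothetical, pre-clustered browsing sessions (the clustering step is assumed).
clustered_sessions = {
    "shoppers": [["home", "catalog", "cart"],
                 ["home", "catalog", "cart"],
                 ["home", "catalog", "item"]],
    "readers": [["home", "blog", "about"],
                ["home", "blog", "archive"],
                ["home", "blog", "about"]],
}
models = train_per_cluster(clustered_sessions)
```

Each cluster's transition table holds only the pages visited by that cluster's sessions, which is one way to read the abstract's claim about reduced state space.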
Integrating Remote Sensing Data with Directional Two-Dimensional Wavelet Analysis and Open Geospatial Techniques for Efficient Disaster Monitoring and Management
Yun-Bin Lin, Yu-Pin Lin, Dong-Po Deng, Kuan-Wei Chen
Sensors, 2008, DOI: 10.3390/s8021070
Abstract: In Taiwan, earthquakes have long been recognized as a major cause of landslides, which are then spread widely by the floods that subsequent typhoons bring. Distinguishing between landslide spatial patterns in different disturbance regimes is fundamental for disaster monitoring, management, and land-cover restoration. To circumscribe landslides, this study adopts the normalized difference vegetation index (NDVI), which can be determined by simply applying mathematical operations to near-infrared and visible-red spectral data immediately after remotely sensed data are acquired. In real-time disaster monitoring, the NDVI is more effective than land-cover classifications generated from remotely sensed data, as land-cover classification tasks are extremely time consuming. Directional two-dimensional (2D) wavelet analysis has an advantage over traditional spectrum analysis in that it determines localized variations along a specific direction when identifying dominant modes of change, and where those modes are located in multi-temporal remotely sensed images. Open geospatial techniques comprise a series of solutions developed based on Open Geospatial Consortium specifications that can be applied to encode data for interoperability and develop an open geospatial service for sharing data. This study presents a novel approach and framework that uses directional 2D wavelet analysis of real-time NDVI images to effectively identify landslide patterns and share the resulting patterns via open geospatial techniques. As a case study, this study analyzed NDVI images derived from SPOT HRV images before and after the ChiChi earthquake (7.3 on the Richter scale) that hit the Chenyulan basin in Taiwan, as well as images after two large typhoons (Xangsane and Toraji), to delineate the spatial patterns of landslides caused by major disturbances. Disturbed spatial patterns of landslides that followed these events were successfully delineated using 2D wavelet analysis, and the pattern recognition results for landslides were distributed simultaneously to other agents using Geography Markup Language. Real-time information allows successive platforms (agents) to work with local geospatial data for disaster management. Furthermore, the proposed approach is suitable for detecting landslides in various regions on continental, regional, and local scales using remotely sensed data in various resolutions derived from SPOT HRV, IKONOS, and QuickBird multispectral images.
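The NDVI mentioned in this abstract is conventionally computed per pixel as (NIR - Red) / (NIR + Red). The sketch below shows that calculation on small grids. The before/after comparison with a fixed drop threshold is only an illustrative assumption for flagging vegetation loss; it does not reproduce the wavelet analysis used in the study, and the function names and threshold value are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() element-wise to two equally sized 2D bands."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


def vegetation_loss_mask(before, after, drop=0.3):
    """Flag pixels whose NDVI fell by at least `drop` (assumed threshold)."""
    return [[(b - a) >= drop for b, a in zip(b_row, a_row)]
            for b_row, a_row in zip(before, after)]


# Made-up reflectance values for a 1 x 2 image, before and after an event.
before = ndvi_grid([[80, 80]], [[20, 20]])
after = ndvi_grid([[80, 30]], [[20, 30]])
mask = vegetation_loss_mask(before, after)
```

Because the index needs only two bands and one division per pixel, it can be produced as soon as imagery arrives, which is the speed advantage over land-cover classification that the abstract points to.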
Quantifying periodicity in omics data
Masaru Tomita, Douglas B. Murray
Frontiers in Cell and Developmental Biology, 2014, DOI: 10.3389/fcell.2014.00040
Abstract: Oscillations play a significant role in biological systems, with many examples in the fast, ultradian, circadian, circalunar, and yearly time domains. However, determining periodicity in such data can be problematic. There are a number of computational methods to identify the periodic components in large datasets, such as signal-to-noise based Fourier decomposition, Fisher's g-test and autocorrelation. However, the available methods assume a sinusoidal model and do not attempt to quantify the waveform shape and the presence of multiple periodicities, which provide vital clues in determining the underlying dynamics. Here, we developed a Fourier based measure that generates a de-noised waveform from multiple significant frequencies. This waveform is then correlated with the raw data from the respiratory oscillation found in yeast, to provide oscillation statistics including waveform metrics and multi-periods. The method is compared and contrasted to commonly used statistics. Moreover, we show the utility of the program in the analysis of noisy datasets and other high-throughput analyses, such as metabolomics and flow cytometry, respectively.
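A rough sketch of the general idea in this abstract: rebuild a de-noised waveform from a few dominant Fourier components and correlate it with the raw series. The abstract does not specify how significant frequencies are chosen, so keeping the `keep` largest amplitudes is an assumption made here purely for illustration, as are all function names and the synthetic signal.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2), adequate for short series)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n)]


def denoise(signal, keep=2):
    """Rebuild the signal from the mean plus the `keep` strongest frequencies."""
    n = len(signal)
    spectrum = dft(signal)
    candidates = range(1, n // 2 + 1)  # positive frequencies, mean excluded
    top = sorted(candidates, key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    kept = {0}
    for k in top:
        kept.add(k)
        kept.add((n - k) % n)  # conjugate bin, so the result stays real
    filtered = [c if k in kept else 0 for k, c in enumerate(spectrum)]
    wave = [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(filtered)) / n).real
            for t in range(n)]
    return wave, top


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic series: two periodic components plus a small alternating disturbance.
n = 32
clean = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
         for t in range(n)]
raw = [c + 0.05 * (-1) ** t for t, c in enumerate(clean)]

wave, top = denoise(raw, keep=2)
periods = [n / k for k in top]   # period of each kept component, in samples
fit = pearson(wave, raw)         # how well the de-noised waveform tracks the raw data
```

Keeping more than one component is what lets the reconstructed waveform depart from a pure sinusoid, which is the limitation of existing methods that the abstract highlights.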
New resources for functional analysis of omics data for the genus Aspergillus
Benjamin M Nitsche, Jonathan Crabtree, Gustavo C Cerqueira, Vera Meyer, Arthur FJ Ram, Jennifer R Wortman
BMC Genomics, 2011, DOI: 10.1186/1471-2164-12-486
Abstract: Based on protein homology, we mapped 97% of the 3,498 GO annotated A. nidulans genes to at least one of seven other Aspergillus species: A. niger, A. fumigatus, A. flavus, A. clavatus, A. terreus, A. oryzae and Neosartorya fischeri. GO annotation files compatible with diverse publicly available tools have been generated and deposited online. To further improve their accessibility, we developed a web application for GO enrichment analysis named FetGOat and integrated GO annotations for all Aspergillus species with public genome sequences. Both the annotation files and the web application FetGOat are accessible via the Broad Institute's website (http://www.broadinstitute.org/fetgoat/index.html). To demonstrate the value of those new resources for functional analysis of omics data for the genus Aspergillus, we performed two case studies analyzing microarray data recently published for A. nidulans, A. niger and A. oryzae. We mapped A. nidulans GO annotation to seven other Aspergilli. By depositing the newly mapped GO annotation online as well as integrating it into the web tool FetGOat, we provide new, valuable and easily accessible resources for omics data analysis and interpretation for the genus Aspergillus. Furthermore, we have given a general example of how a well annotated genome can help improve GO annotation of related species to subsequently facilitate the interpretation of omics data. Gene Ontology (GO) is a framework for functional annotation of gene products aiming to provide a unique vocabulary for living systems [1]. It comprises Biological Process (BP), Molecular Function (MF) and Cellular Component (CC) ontologies. GO terms are organized as directed acyclic graphs (DAG), meaning that GO terms are connected as nodes by directed edges defining hierarchical parent-child relationships. As a consequence, the specificity of GO terms increases with increasing distance from their root node. Enrichment analysis of GO terms is a well accepted approach to di
Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer
Silvia Pineda, Francisco X. Real, Manolis Kogevinas, Alfredo Carrato, Stephen J. Chanock, Núria Malats, Kristel Van Steen
PLOS Genetics, 2015, DOI: 10.1371/journal.pgen.1005689
Abstract: Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
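The permutation-based MaxT idea from step (3) can be illustrated with a single-step sketch. This is not the paper's pipeline: absolute Pearson correlation stands in here for the penalized-regression model statistic, and the data, function names and permutation count are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def maxt_adjusted_pvalues(features, outcome, n_perm=500, seed=0):
    """Single-step maxT: compare each observed statistic with the permutation
    distribution of the maximum statistic across all hypotheses."""
    rng = random.Random(seed)
    observed = [abs(pearson(f, outcome)) for f in features]
    exceed = [0] * len(features)
    shuffled = list(outcome)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any feature-outcome link
        max_stat = max(abs(pearson(f, shuffled)) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps p-values strictly above zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Made-up data: the first feature tracks the outcome, the second does not.
outcome = [float(i) for i in range(20)]
features = [
    [2.0 * v + 1.0 for v in outcome],
    [3, 17, 8, 1, 12, 19, 5, 10, 0, 14, 7, 16, 2, 11, 18, 4, 13, 6, 15, 9],
]
adjusted = maxt_adjusted_pvalues(features, outcome)
```

Comparing every hypothesis against the same permutation maximum is what allows significance assessment and multiple-testing correction to happen in one pass, as the abstract describes.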