oalib

Search Results: 1 - 10 of 179 matches for "Tero Aittokallio"
All listed articles are free for downloading (OA Articles)
Probe-level estimation improves the detection of differential splicing in Affymetrix exon array studies
Essi Laajala, Tero Aittokallio, Riitta Lahesmaa, Laura L Elo
Genome Biology , 2009, DOI: 10.1186/gb-2009-10-7-r77
Abstract: Alternative splicing is the process in which multiple mRNA isoforms are generated from a single gene by selectively joining together exons of a primary transcript in different patterns (see, for example, [1] for a review). Thus, instead of coding a single protein, the same genetic locus may produce a variety of different proteins with different properties and distinct functions in the system. Alternative splicing is emerging as a key mechanism for enabling the vast proteomic diversity of higher organisms from a relatively low number of genes. While genome sequencing projects have revealed that the number of protein-coding genes in an organism does not correlate with its overall cellular complexity (for example, mammalian species have similar numbers of genes to Arabidopsis thaliana), alternative splicing has turned out to be more the rule than the exception [2,3]. For instance, genome-wide studies have suggested that up to 92 to 94% of human genes undergo alternative splicing [4]. Tissue-specific gene isoforms are known to play a critical role in the development and proper function of diverse cell types, and disruptions of normal splicing patterns changing the isoform structure have been implicated in various cancer types and other human diseases [5,6]. In particular, a number of genetic point mutations associated with human hereditary diseases have been linked to disrupted splicing [6]. Hence, a comprehensive understanding of disease development requires detailed knowledge of the roles of alternatively spliced genes and their products. The early genome-wide attempts to detect alternative splicing were mainly based on sequence databases of expressed sequence tags and cDNA [3]. A major drawback of these approaches is that they are highly constrained by the available expressed sequence tag/cDNA sequences, with typically inadequate transcript coverage and only a limited number of cell or tissue sources [3].
Towards the genome-wide identification of functionally relevant
Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations
Tapio Pahikkala, Sebastian Okser, Antti Airola, Tapio Salakoski, Tero Aittokallio
Algorithms for Molecular Biology , 2012, DOI: 10.1186/1748-7188-7-11
Abstract: We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space-efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. Greedy RLS is the first known implementation of a machine learning based method with the capability to conduct a wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
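To make the idea of wrapper-based greedy forward selection with a regularized least-squares (RLS) learner concrete, here is a minimal sketch. It is not the greedy RLS algorithm from the paper and does not use the RLScore API: it is a naive version that refits the model for every candidate feature, so it lacks the linear-time matrix short-cuts the abstract describes. The function names, the `lam` regularization parameter, and the use of leave-one-out error as the selection criterion are all assumptions made for illustration.

```python
import numpy as np

def loo_error(X, y, lam=1.0):
    """Mean squared leave-one-out error of ridge regression (no intercept)."""
    n, d = X.shape
    # Hat matrix H = X (X^T X + lam I)^{-1} X^T maps y to fitted values.
    G = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(G, X.T)
    residuals = y - H @ y
    # Standard identity for linear smoothers: LOO residual = residual / (1 - H_ii).
    loo = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo ** 2))

def greedy_forward_selection(X, y, k, lam=1.0):
    """Pick k feature indices, each step adding the one that minimizes LOO error."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        # Wrapper step: evaluate the learner itself on every candidate subset.
        best = min(remaining,
                   key=lambda j: loo_error(X[:, selected + [j]], y, lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `greedy_forward_selection(X, y, k)` on an examples-by-features matrix returns the chosen column indices in the order they were added. Each step here costs a full refit per candidate, which is exactly the expense that a fast implementation would need to avoid on genome-wide data.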
Missing value imputation improves clustering and interpretation of gene expression microarray data
Johannes Tuikkala, Laura L Elo, Olli S Nevalainen, Tero Aittokallio
BMC Bioinformatics , 2008, DOI: 10.1186/1471-2105-9-202
Abstract: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can – up to a certain degree – be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA). During the past decade, microarray technology has become a major tool in functional genomics and biomedical research. It has been successfully used, for example, in genome-wide gene expression profiling [1], tumor classification [2], and construction of gene regulatory networks [3]. Gene expression data analysts currently have a wide range of computational tools available to them. Cluster analysis is typically one of the first exploratory tools used on a new gene expression microarray dataset [4]. It allows researchers to find natural groups of genes without any a priori information, providing computational predictions and hypotheses about functional roles of unknown genes for subsequent experimental testing.
Clusters of genes are often given a biological interpretation using the Gene Ontology (GO) annotations which are significantly enriched for the genes in a given cluster.
A statistical score for assessing the quality of multiple sequence alignments
Virpi Ahola, Tero Aittokallio, Mauno Vihinen, Esa Uusipaikka
BMC Bioinformatics , 2006, DOI: 10.1186/1471-2105-7-484
Abstract: To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment using an importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring the positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using a novel alignment quality score and a commonly used sum of pairs method. According to these results, the Mafft strategy L-INS-i outperforms the other methods, although the difference between Probcons, TCoffee and Muscle is mostly insignificant. The novel alignment quality score provides results similar to those of the sum of pairs method. The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments. A wealth of molecular data concerning the linear structure of proteins and nucleic acids is available in the form of DNA, RNA and protein sequences. Multiple sequence alignment has become an essential and widely used tool for understanding the structure and function of these molecules. The results of annotation of gene/protein sequences, prediction of protein structures or building of phylogenetic trees, for instance, are critically dependent on the quality of the given alignment. It has been recognized that the automatic construction of a multiple sequence alignment for a set of remotely related sequences can be a very demanding task.
Therefore, there is a need for an objective approach to evaluate
Quantitative maps of genetic interactions in yeast - Comparative evaluation and integrative analysis
Rolf O Lindén, Ville-Pekka Eronen, Tero Aittokallio
BMC Systems Biology , 2011, DOI: 10.1186/1752-0509-5-45
Abstract: Using large-scale data matrices from epistatic miniarray profiling (E-MAP), genetic interaction mapping (GIM), and synthetic genetic array (SGA) approaches, we carried out here a systematic comparative evaluation among these quantitative maps of genetic interactions in yeast. The relatively low association between the original interaction measurements or their customized scores could be improved using a matrix-based modelling framework, which enables the use of single- and double-mutant fitness estimates and measurements, respectively, when scoring genetic interactions. Toward an integrative analysis, we show how the detections from the different screening approaches can be combined to suggest novel positive and negative interactions which are complementary to those obtained using any single screening approach alone. The matrix approximation procedure has been made available to support the design and analysis of the future screening studies. We have shown here that even if the correlation between the currently available quantitative genetic interaction maps in yeast is relatively low, their comparability can be improved by means of our computational matrix approximation procedure, which will enable integrative analysis and detection of a wider spectrum of genetic interactions using data from the complementary screening approaches. The recent advances in experimental biotechnologies have made it possible to start screening genome-wide datasets of quantitative genetic interactions in model organisms such as yeast [1-3].
High-throughput genetic screening approaches, such as those based on epistatic miniarray profiling (E-MAP) [4-7], genetic interaction mapping (GIM) [8], and synthetic genetic array (SGA) [9-11], have provided systematic means to global investigation of quantitative relationship between genotype and phenotype, with potential implications for a wide range of biological phenomena, including, for instance, modularity, essentiality, redundancy, buffering, epi
Medroxyprogesterone improves nocturnal breathing in postmenopausal women with chronic obstructive pulmonary disease
Tarja Saaresranta, Tero Aittokallio, Karri Utriainen, Olli Polo
Respiratory Research , 2005, DOI: 10.1186/1465-9921-6-28
Abstract: A single-blind placebo-controlled trial was performed in 15 postmenopausal women with moderate to severe COPD. A 12-week trial included 2-week treatment periods with placebo and MPA (60 mg/d/14 days). All patients underwent a polysomnography with monitoring of SaO2 and transcutaneous PCO2 (tcCO2) at baseline, with placebo, with medroxyprogesterone acetate (MPA 60 mg/d/14 days), and three and six weeks after cessation of MPA. Thirteen patients completed the trial. At baseline, the average ± SD of SaO2 mean was 90.6 ± 3.2 % and the median of SaO2 nadir 84.8 % (interquartile range, IQR 6.1). MPA improved them by 1.7 ± 1.6 %-units (95 % confidence interval (CI) 0.56, 2.8) and by 3.9 %-units (IQR 4.9; 95% CI 0.24, 10.2), respectively. The average of tcCO2 median was 6.0 ± 0.9 kPa and decreased with MPA by 0.9 ± 0.5 kPa (95% CI -1.3, -0.54). MPA improved SaO2 nadir and tcCO2 median also during REM sleep. Three weeks after cessation of MPA, the SaO2 mean remained 1.4 ± 1.8 %-units higher than at baseline, the difference being not significant (95% CI -0.03, 2.8). SaO2 nadir was 2.7 %-units (IQR 4.9; 95% CI 0.06, 18.7) higher than at baseline. Increases in SaO2 mean and SaO2 nadir during sleep with MPA were inversely associated with baseline SaO2 mean (r = -0.70, p = 0.032) and baseline SaO2 nadir (r = -0.77, p = 0.008), respectively. Treatment response in SaO2 mean, SaO2 nadir and tcCO2 levels did not associate with pack-years smoked, age, BMI, spirometric results or sleep variables. MPA-induced respiratory improvement in postmenopausal women seems to be consistent and prolonged. The improvement was greater in patients with lower baseline SaO2 values. Long-term studies in females are warranted. Chronic obstructive pulmonary disease (COPD), consisting of variable degrees of pulmonary emphysema and chronic obstructive bronchitis, has a male predominance.
However, the prevalence of COPD is steadily increasing among women [1] as a consequence of increased rates of cigarette smokin
RPA: Probabilistic analysis of probe performance and robust summarization
Leo Lahti, Laura L. Elo, Tero Aittokallio, Samuel Kaski
Computer Science , 2011,
Abstract: Probe-level models have led to improved performance in microarray studies but the various sources of probe-level contamination are still poorly understood. Data-driven analysis of probe performance can be used to quantify the uncertainty in individual probes and to highlight the relative contribution of different noise sources. Improved understanding of the probe-level effects can lead to improved preprocessing techniques and microarray design. We have implemented probabilistic tools for probe performance analysis and summarization on short oligonucleotide arrays. In contrast to standard preprocessing approaches, the methods provide quantitative estimates of probe-specific noise and affinity terms and tools to investigate these parameters. Tools to incorporate prior information of the probes in the analysis are provided as well. Comparisons to known probe-level error sources and spike-in data sets validate the approach. Implementation is freely available in R/BioConductor: http://www.bioconductor.org/packages/release/bioc/html/RPA.html
Predicting Quantitative Genetic Interactions by Means of Sequential Matrix Approximation
Aki P. Järvinen, Jukka Hiissa, Laura L. Elo, Tero Aittokallio
PLOS ONE , 2008, DOI: 10.1371/journal.pone.0003284
Abstract: Despite the emerging experimental techniques for perturbing multiple genes and measuring their quantitative phenotypic effects, genetic interactions have remained extremely difficult to predict on a large scale. Using a recent high-resolution screen of genetic interactions in yeast as a case study, we investigated whether the extraction of pertinent information encoded in the quantitative phenotypic measurements could be improved by computational means. By taking advantage of the observation that most gene pairs in the genetic interaction screens have no significant interactions with each other, we developed a sequential approximation procedure which ranks the mutation pairs in order of evidence for a genetic interaction. The sequential approximations can efficiently remove background variation in the double-mutation screens and give increasingly accurate estimates of the single-mutant fitness measurements. Interestingly, these estimates not only provide predictions for genetic interactions which are consistent with those obtained using the measured fitness, but they can even significantly improve the accuracy with which one can distinguish functionally-related gene pairs from the non-interacting pairs. The computational approach, in general, enables an efficient exploration and classification of genetic interactions in other studies and systems as well.
Genome-Wide Scoring of Positive and Negative Epistasis through Decomposition of Quantitative Genetic Interaction Fitness Matrices
Ville-Pekka Eronen, Rolf O. Lindén, Anna Lindroos, Mirella Kanerva, Tero Aittokallio
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0011611
Abstract: Recent technological developments in genetic screening approaches have offered the means to start exploring quantitative genotype-phenotype relationships on a large scale. What remains unclear is the extent to which the quantitative genetic interaction datasets can distinguish the broad spectrum of interaction classes, as compared to existing information on mutation pairs associated with both positive and negative interactions, and whether the scoring of varying degrees of such epistatic effects could be improved by computational means. To address these questions, we introduce here a computational approach for improving the quantitative discrimination power encoded in the genetic interaction screening data. Our matrix approximation model decomposes the original double-mutant fitness matrix into separate components, representing variability across the array and query mutants, which can be utilized for estimating and correcting the single-mutant fitness effects, respectively. When applied to three large-scale quantitative interaction datasets in yeast, we could improve the accuracy of scoring various interaction classes beyond that obtained with the original fitness data, especially in synthetic genetic array (SGA) and in genetic interaction mapping (GIM) datasets. In addition to the known pairs of interactions used in the evaluation of the computational approach, a number of novel interaction pairs were also predicted, along with underlying biological mechanisms, which remained undetected by the original datasets. It was shown that the optimal choice of the scoring function depends heavily on the screening approach and on the interaction class under analysis. Moreover, a simple preprocessing of the fitness matrix could further enhance the discrimination power of the epistatic miniarray profiling (E-MAP) dataset. These systematic evaluation results provide in-depth information on the optimal analysis of future large-scale screening experiments.
In general, the modeling framework, enabling accurate identification and classification of genetic interactions, provides a solid basis for completing and mining the genetic interaction networks in yeast and other organisms.
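The decomposition of a double-mutant fitness matrix into query and array components can be illustrated with a generic rank-one approximation. This is only a sketch under stated assumptions, not the matrix approximation model from the paper: it assumes a multiplicative null model in which a non-interacting pair has double-mutant fitness equal to the product of the two single-mutant fitnesses, uses a plain SVD to estimate that background, and treats the residual as an interaction score. The function name and the input layout (rows as query mutants, columns as array mutants) are hypothetical.

```python
import numpy as np

def rank_one_interaction_scores(W):
    """Residual of W after removing its best rank-one approximation.

    W: 2-D array of double-mutant fitness values
       (rows: query mutants, columns: array mutants; assumed layout).
    Returns an array of the same shape; positive entries suggest fitness
    above the multiplicative background, negative entries below it.
    """
    W = np.asarray(W, dtype=float)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Leading singular triplet gives the outer product closest to W
    # in the least-squares sense (assumed background of single-mutant effects).
    background = s[0] * np.outer(U[:, 0], Vt[0, :])
    return W - background
```

For a matrix that is exactly an outer product of single-mutant fitness vectors, the returned scores are numerically zero, which matches the assumption that most gene pairs do not interact. A plain SVD is sensitive to strong outliers and cannot handle missing measurements, so real screening data would need a more careful fitting procedure.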
A multilevel layout algorithm for visualizing physical and genetic interaction networks, with emphasis on their modular organization
Johannes Tuikkala, Heidi Vähämaa, Pekka Salmela, Olli S Nevalainen, Tero Aittokallio
BioData Mining , 2012, DOI: 10.1186/1756-0381-5-2
Abstract: We implemented a modified layout plug-in, named Multilevel Layout, which applies the conventional layout algorithms within a multilevel optimization framework to better capture the hierarchical modularity of many biological networks. Using a wide variety of real-life biological networks, we carried out a systematic evaluation of the method in comparison with other layout algorithms in Cytoscape. The multilevel approach provided both biologically relevant and visually pleasant layout solutions in most network types, hence complementing the layout options available in Cytoscape. In particular, it could improve drawing of large-scale networks of yeast genetic interactions and human physical interactions. In more general terms, the biological evaluation framework developed here enables one to assess the layout solutions from any existing or future graph drawing algorithm as well as to optimize their performance for a given network type or structure. By making use of the multilevel modular organization when visualizing biological networks, together with the biological evaluation of the layout solutions, one can generate convenient visualizations for many network biology applications. Network graphs provide a valuable conceptual framework for representing and mining high-throughput experimental datasets, as well as for extracting and interpreting their biological information by means of graph-based analysis approaches [1-8]. In cellular systems, network nodes typically refer to biomolecules, such as genes or proteins, and the edge connections the type of relationships the network is encoding, including physical or functional information.
Network visualization aims to organize the complex network structures in a way that provides the user with readily apparent insights into the most interesting biological patterns and relationships within the data, such as components of biological pathways, processes or complexes, which can be further investigated by follow-up computation