oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Multi-locus stepwise regression: a haplotype-based algorithm for finding genetic associations applied to atopic dermatitis
Sven Knüppel, Jorge Esparza-Gordillo, Ingo Marenholz, Hermann-Georg Holzhütter, Anja Bauerfeind, Andreas Ruether, Stephan Weidinger, Young-Ae Lee, Klaus Rohde
BMC Medical Genetics, 2012, DOI: 10.1186/1471-2350-13-8
Abstract: Our proposed multi-locus stepwise regression starts with an evaluation of all pair-wise SNP combinations and then extends each SNP combination stepwise by one SNP from the region, carrying out haplotype regression in each step. The best associated haplotype patterns are kept for the next step and must be corrected for multiple testing at the end. These haplotypes should also be replicated in an independent data set. We applied the method to a region of 259 SNPs from the epidermal differentiation complex (EDC) on chromosome 1q21 of a German GWAS using a case control set (1,914 individuals) and to 268 families with at least two affected children as replication. A 4-SNP haplotype pattern with high statistical significance in the case control set (p = 4.13 × 10⁻⁷ after Bonferroni correction) could be identified which remained significant in the family set after Bonferroni correction (p = 0.0398). Further analysis revealed that this pattern reflects mainly the effect of the well-known FLG gene; however, a FLG-independent haplotype in case control set (OR = 1.71, 95% CI: 1.32-2.23, p = 5.6 × 10⁻⁵) and family set (OR = 1.68, 95% CI: 1.18-2.38, p = 2.19 × 10⁻³) could be found in addition. Our approach is a useful tool for finding allele combinations associated with diseases beyond single SNP analysis in chromosomal candidate regions. Single marker association analysis has been widely used to identify genetic risk factors involved in the genetics of complex diseases [1]. Previous studies have suggested that haplotypes, a collection of ordered markers along a chromosome, may be more appropriate as a unit for statistical analysis than single genetic markers [2,3]. As demonstrated by simulation studies, statistical approaches based on haplotypes can be a powerful method to characterize the genetic background of complex diseases [1,4-6].
However, since haplotypes are often not directly observable we have to use unphased genotypes to estimate haplotypes. The advent of the gene chip t
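The stepwise procedure described in this abstract (score all SNP pairs, keep the best patterns, extend each by one SNP, correct for multiple testing at the end) can be sketched as a small beam search. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `score` callable stands in for the haplotype regression and is assumed to return a p-value (lower is better), and the names `stepwise_search`, `beam_width`, `max_size`, and `bonferroni` are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Scored = Tuple[float, FrozenSet[int]]


def _top(items: List[Scored], k: int) -> List[Scored]:
    # Sort by p-value, breaking ties by SNP indices so the result is deterministic.
    return sorted(items, key=lambda t: (t[0], sorted(t[1])))[:k]


def stepwise_search(
    n_snps: int,
    score: Callable[[FrozenSet[int]], float],
    beam_width: int = 5,
    max_size: int = 4,
) -> Tuple[List[Scored], int]:
    """Return the best (p-value, SNP set) patterns and the number of tests run.

    `score` is a placeholder for a haplotype regression (assumption).
    """
    n_tests = 0
    # Step 1: evaluate all pairwise SNP combinations.
    pairs: List[Scored] = []
    for pair in combinations(range(n_snps), 2):
        snps = frozenset(pair)
        pairs.append((score(snps), snps))
        n_tests += 1
    beam = _top(pairs, beam_width)
    best = list(beam)
    # Step 2: extend each kept combination by one SNP from the region.
    for _ in range(max_size - 2):
        seen = set()
        extended: List[Scored] = []
        for _, snps in beam:
            for j in range(n_snps):
                new = snps | {j}
                if j in snps or new in seen:
                    continue  # skip duplicates so each pattern is tested once
                seen.add(new)
                extended.append((score(new), new))
                n_tests += 1
        if not extended:
            break
        beam = _top(extended, beam_width)
        best = _top(best + beam, beam_width)
    return best, n_tests


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

Counting every evaluated pattern in `n_tests` is one simple way to make the final correction reflect the whole search; whether the paper counts tests this way is not stated in the abstract.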
Genetic interaction motif finding by expectation maximization – a novel statistical model for inferring gene modules from synthetic lethality
Yan Qi, Ping Ye, Joel S Bader
BMC Bioinformatics, 2005, DOI: 10.1186/1471-2105-6-288
Abstract: We have developed Genetic Interaction Motif Finding (GIMF), an algorithm for unsupervised motif discovery from synthetic lethal interaction data. Interaction motifs are characterized by position weight matrices and optimized through expectation maximization. Given a seed gene, GIMF performs a nonlinear transform on the input genetic interaction data and automatically assigns genes to the motif or non-motif category. We demonstrate the capacity to extract known and novel pathways for Saccharomyces cerevisiae (budding yeast). Annotations suggested for several uncharacterized genes are supported by recent experimental evidence. GIMF is efficient in computation, requires no training and automatically down-weights promiscuous genes with high degrees. GIMF effectively identifies pathways from synthetic lethality data with several unique features. It is mostly suitable for building gene modules around seed genes. Optimal choice of one single model parameter allows construction of gene networks with different levels of confidence. The impact of hub genes the generic probabilistic framework of GIMF may be used to group other types of biological entities such as proteins based on stochastic motifs. Analysis of the strongest motifs discovered by the algorithm indicates that synthetic lethal interactions are depleted between genes within a motif, suggesting that synthetic lethality occurs between-pathway rather than within-pathway. Much recent research effort has been devoted to studying gene functions in the context of highly dynamic and modular cellular networks [1-4]. Valuable information about a gene's function can be obtained from its interaction with other genes [5]. Apart from the traditional hierarchical way of gene function annotation, functional genomics takes a bottom-up approach to assemble gene interaction networks based on all pair-wise gene interactions detected.
From such genetic interaction maps, functional modules representing various biological pathways and p
Finding undetected protein associations in cell signaling by belief propagation
M. Bailly-Bechet, C. Borgs, A. Braunstein, J. Chayes, A. Dagkessamanskaia, J.-M. François, R. Zecchina
Computer Science, 2011, DOI: 10.1073/pnas.1004751108
Abstract: External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.
Statistical data mining for symbol associations in genomic databases
Bernard Ycart, Frédéric Pont, Jean-Jacques Fournié
Quantitative Biology, 2013
Abstract: A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database. Applied to symbol pairs, the thresholded p-values of the test define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, linked to a set of genesets where they can be found. The method can be applied to any database, and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections corresponded to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations reveal new candidates for gene or protein interactions that need further investigation for biological evidence.
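The graph step in this abstract (threshold pairwise p-values, then read off cliques) can be sketched as follows. This is a minimal sketch under assumptions: the abstract does not specify the statistical test, so the pairwise p-values are taken as given input, and the basic Bron-Kerbosch recursion is swapped in as one standard way to enumerate maximal cliques. The names `threshold_graph`, `maximal_cliques`, `alpha`, and `min_size` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def threshold_graph(
    pvalues: Dict[Tuple[str, str], float], alpha: float
) -> Dict[str, Set[str]]:
    """Build an undirected graph with an edge for every pair with p <= alpha."""
    adj: Dict[str, Set[str]] = {}
    for (a, b), p in pvalues.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b and p <= alpha:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(
    adj: Dict[str, Set[str]], min_size: int = 2
) -> List[FrozenSet[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            # r cannot be extended by any vertex, so it is a maximal clique.
            if len(r) >= min_size:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is fully explored
            x = x | {v}  # exclude v from later cliques at this level

    expand(set(), set(adj), set())
    return cliques
```

For example, with placeholder symbols `S1` to `S4` and significant pairs S1-S2, S1-S3, S2-S3, and S3-S4, the maximal cliques are {S1, S2, S3} and {S3, S4}.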
Functional Relevance for Associations between Genetic Variants and Systemic Lupus Erythematosus
Fei-Yan Deng, Shu-Feng Lei, Yong-Hong Zhang, Zeng-Li Zhang, Yu-Fan Guo
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0053037
Abstract: Systemic lupus erythematosus (SLE) is a serious prototype autoimmune disease characterized by chronic inflammation, auto-antibody production and multi-organ damage. Recent association studies have identified a long list of loci that were associated with SLE with relatively high statistical power. However, most of them only established the statistical associations of genetic markers and SLE at the DNA level without supporting evidence of functional relevance. Here, using publicly available datasets, we performed integrative analyses (gene relationship across implicated loci analysis, differential gene expression analysis and functional annotation clustering analysis) and combined them with expression quantitative trait loci (eQTL) results to dissect functional mechanisms underlying the associations for SLE. We found that 14 SNPs, which were significantly associated with SLE in previous studies, have cis-regulation effects on four eQTL genes (HLA-DQA1, HLA-DQB1, HLA-DQB2, and IRF5) that were also differentially expressed in SLE-related cell groups. The functional evidence, taken together, suggested the functional mechanisms underlying the associations of 14 SNPs and SLE. The study may serve as an example of mining publicly available datasets and results in validation of significant disease-association results. Utilization of public data resources for integrative analyses may provide novel insights into the molecular genetic mechanisms underlying human diseases.
Genetic Variation in FADS Genes and Plasma Cholesterol Levels in 2-Year-Old Infants: KOALA Birth Cohort Study
Carolina Moltó-Puigmartí, Eugène Jansen, Joachim Heinrich, Marie Standl, Ronald P. Mensink, Jogchum Plat, John Penders, Monique Mommers, Gerard H. Koppelman, Dirkje S. Postma, Carel Thijs
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0061671
Abstract: Objective: Single nucleotide polymorphisms (SNPs) in genes involved in fatty acid metabolism (FADS1 FADS2 gene cluster) are associated with plasma lipid levels. We aimed to investigate whether these associations are already present early in life and compare the relative contribution of FADS SNPs vs traditional (non-genetic) factors as determinants of plasma lipid levels. Methods: Information on infants’ plasma total cholesterol levels, genotypes of five FADS SNPs (rs174545, rs174546, rs174556, rs174561, and rs3834458), anthropometric data, maternal characteristics, and breastfeeding history was available for 521 2-year-old children from the KOALA Birth Cohort Study. For 295 of these 521 children, plasma HDLc and non-HDLc levels were also known. Multivariable linear regression analysis was used to study the associations of genetic and non-genetic determinants with cholesterol levels. Results: All FADS SNPs were significantly associated with total cholesterol levels. Children heterozygous and homozygous for the minor allele had about 4% and 8% lower total cholesterol levels, respectively, than major allele homozygotes. In addition, children homozygous for the minor allele had about 7% lower HDLc levels. This difference reached significance for the SNPs rs174546 and rs3834458. The associations went in the same direction for non-HDLc, but statistical significance was not reached. The percentage of total variance of total cholesterol levels explained by FADS SNPs was relatively low (lower than 3%) but of the same order as that explained by gender and the non-genetic determinants together. Conclusions: FADS SNPs are associated with plasma total cholesterol and HDLc levels in preschool children. This brings a new piece of evidence to explain how blood lipid levels may track from childhood to adulthood. Moreover, the finding that these SNPs explain a similar amount of variance in total cholesterol levels as the non-genetic determinants studied reveals the potential importance of investigating the effects of genetic variations in early life.
Decision Support Methods for Finding Phenotype — Disorder Associations in the Bone Dysplasia Domain
Razan Paul, Tudor Groza, Jane Hunter, Andreas Zankl
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0050614
Abstract: A lack of mature domain knowledge and well established guidelines makes the medical diagnosis of skeletal dysplasias (a group of rare genetic disorders) a very complex process. Machine learning techniques can facilitate objective interpretation of medical observations for the purposes of decision support. However, building decision support models using such techniques is highly problematic in the context of rare genetic disorders, because it depends on access to mature domain knowledge. This paper describes an approach for developing a decision support model in medical domains that are underpinned by relatively sparse knowledge bases. We propose a solution that combines association rule mining with the Dempster-Shafer theory (DST) to compute probabilistic associations between sets of clinical features and disorders, which can then serve as support for medical decision making (e.g., diagnosis). We show, via experimental results, that our approach is able to provide meaningful outcomes even on small datasets with sparse distributions, in addition to outperforming other Machine Learning techniques and behaving slightly better than an initial diagnosis by a clinician.
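The Dempster-Shafer combination step mentioned in this abstract can be illustrated with Dempster's rule of combination, which merges two mass functions over the same set of candidate disorders. This is a hedged sketch: how the authors derive mass functions from association rules is not stated in the abstract, so the inputs here are simply assumed, and the names `Mass` and `combine` as well as the disorder labels are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps sets of hypotheses (here: disorder labels) to belief mass.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule of combination."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by 1 - K so the combined masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    frame = frozenset({"disorder_A", "disorder_B"})  # hypothetical labels
    feature_1: Mass = {frozenset({"disorder_A"}): 0.6, frame: 0.4}
    feature_2: Mass = {frozenset({"disorder_B"}): 0.5, frame: 0.5}
    for hypothesis, mass in combine(feature_1, feature_2).items():
        print(sorted(hypothesis), round(mass, 4))
```

Assigning some mass to the whole frame, as both inputs do here, is how the theory expresses ignorance, which is one reason it is often considered for sparse knowledge bases.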
Mate-Finding as an Overlooked Critical Determinant of Dispersal Variation in Sexually-Reproducing Animals
James J. Gilroy, Julie L. Lockwood
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0038091
Abstract: Dispersal is a critically important process in ecology, but robust predictive models of animal dispersal remain elusive. We identify a potentially ubiquitous component of variation in animal dispersal that has been largely overlooked until now: the influence of mate encounters on settlement probability. We use an individual-based model to simulate dispersal in sexually-reproducing organisms that follow a simple set of movement rules based on conspecific encounters, within an environment lacking spatial habitat heterogeneity. We show that dispersal distances vary dramatically with fluctuations in population density in such a model, even in the absence of variation in dispersive traits between individuals. In a simple random-walk model with promiscuous mating, dispersal distributions become increasingly ‘fat-tailed’ at low population densities due to the increasing scarcity of mates. Similar variation arises in models incorporating territoriality. In a model with polygynous mating, we show that patterns of sex-biased dispersal can even be reversed across a gradient of population density, despite underlying dispersal mechanisms remaining unchanged. We show that some widespread dispersal patterns found in nature (e.g. fat tailed distributions) can arise as a result of demographic variability in the absence of heterogeneity in dispersive traits across the population. This implies that models in which individual dispersal distances are considered to be fixed traits might be unrealistic, as dispersal distances vary widely under a single dispersal mechanism when settlement is influenced by mate encounters. Mechanistic models offer a promising means of advancing our understanding of dispersal in sexually-reproducing organisms.
Finding Associations and Computing Similarity via Biased Pair Sampling
Andrea Campagna, Rasmus Pagh
Computer Science, 2009
Abstract: This version is superseded by a full version that can be found at http://www.itu.dk/people/pagh/papers/mining-jour.pdf, which contains stronger theoretical results and fixes a mistake in the reporting of experiments. Sampling-based methods have previously been proposed for the problem of finding interesting associations in data, even for low-support items. While these methods do not guarantee precise results, they can be vastly more efficient than approaches that rely on exact counting. However, for many similarity measures no such methods have been known. In this paper we show how a wide variety of measures can be supported by a simple biased sampling method. The method also extends to find high-confidence association rules. We demonstrate theoretically that our method is superior to exact methods when the threshold for "interesting similarity/confidence" is above the average pairwise similarity/confidence, and the average support is not too low. Our method is particularly good when transactions contain many items. We confirm in experiments on standard association mining benchmarks that this gives a significant speedup on real data sets (sometimes much larger than the theoretical guarantees). Reductions in computation time of over an order of magnitude, and significant savings in space, are observed.
The Impact of Phenotypic and Genetic Heterogeneity on Results of Genome Wide Association Studies of Complex Diseases
Mirko Manchia, Jeffrey Cullis, Gustavo Turecki, Guy A. Rouleau, Rudolf Uher, Martin Alda
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0076295
Abstract: Phenotypic misclassification (between cases) has been shown to reduce the power to detect association in genetic studies. However, it is conceivable that complex traits are heterogeneous with respect to individual genetic susceptibility and disease pathophysiology, and that the effect of heterogeneity has a larger magnitude than the effect of phenotyping errors. Although an intuitively clear concept, the effect of heterogeneity on genetic studies of common diseases has received little attention. Here we investigate the impact of phenotypic and genetic heterogeneity on the statistical power of genome wide association studies (GWAS). We first performed a study of simulated genotypic and phenotypic data. Next, we analyzed the Wellcome Trust Case-Control Consortium (WTCCC) data for diabetes mellitus (DM) type 1 (T1D) and type 2 (T2D), using varying proportions of each type of diabetes in order to examine the impact of heterogeneity on the strength and statistical significance of association previously found in the WTCCC data. In both simulated and real data, heterogeneity (presence of “non-cases”) reduced the statistical power to detect genetic association and greatly decreased the estimates of risk attributed to genetic variation. This finding was also supported by the analysis of loci validated in subsequent large-scale meta-analyses. For example, heterogeneity of 50% increases the required sample size by approximately three times. These results suggest that accurate phenotype delineation may be more important for detecting true genetic associations than an increase in sample size.