Search Results: 1 - 10 of 156871 matches for "Jason H. Moore"
All listed articles are free for downloading (OA Articles)
Mining beyond the exome
Davnah Urbach, Jason H Moore
BioData Mining, 2011, DOI: 10.1186/1756-0381-4-14
Abstract: The premise of GWAS is the "common disease-common variant" hypothesis, which posits that common diseases are, at least partly, associated with DNA sequence variations or polymorphisms present in more than 1-5% of the population. It turns out that most allele frequencies struggle to reach the 5% detection threshold of commercial genotyping arrays, and the "common disease-rare variant" hypothesis is gradually taking precedence over its counterpart [2]. Hence, targeting rare variants, for example with whole-genome sequencing, is a first step in the right direction [3]. A further step is to deliberately include synonymous polymorphisms among the genetic variants considered in association studies. Although largely disregarded, synonymous polymorphisms are about twice as numerous as non-synonymous ones [4] and have often been found responsible for altered protein structure, function and expression level [5]. Accordingly, a considerable list of disease-associated synonymous polymorphisms is already available [5], and more remain to be found. Besides single nucleotide polymorphisms (SNPs), variation can also be structural: multi-kilobase genomic regions can be inserted or deleted (copy number variation, CNV), or they can be moved (copy-neutral variation), within (inversion) or between (translocation) chromosomes [6,7]. Structural variants have already been shown to contribute to disease phenotypes [8,9], but with the help of high-resolution GWAS purposely designed to detect them, there are undoubtedly more discoveries ahead [6,7]. Variants can adopt different forms, but they can also occur in different locations throughout the genome. When given the choice between (quasi-)random SNPs and SNPs located in coding regions (gene-centric approach), choosing the latter is the safer bet [10].
However, the fact that more than 80% of the risk-associated variants identified so far fall outside of the coding regions suggests that there is a third option, namely the non-coding regions of
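The common-versus-rare distinction above rests on how often a variant's minor allele occurs in the population, with 5% quoted as the detection threshold of commercial arrays. A minimal sketch of that calculation, assuming genotypes are coded as the number of alternate alleles each diploid individual carries (0, 1 or 2); the function names and toy samples are hypothetical, not taken from the article.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency (MAF) of one biallelic SNP.

    Assumes each genotype is the number of copies of the alternate
    allele carried by one diploid individual (0, 1 or 2).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever allele is rarer in this sample.
    return min(alt_freq, 1 - alt_freq)


def classify_variant(genotypes, threshold=0.05):
    """Label a SNP 'common' or 'rare' relative to a MAF threshold (default 5%)."""
    return "common" if minor_allele_frequency(genotypes) >= threshold else "rare"


# Hypothetical toy samples of 100 individuals each.
rare_snp = [0] * 95 + [1] * 5                 # 5 of 200 alleles -> MAF 0.025
common_snp = [0] * 60 + [1] * 30 + [2] * 10   # 50 of 200 alleles -> MAF 0.25
print(classify_variant(rare_snp), classify_variant(common_snp))  # rare common
```

In a real study the frequency would be estimated from a large reference sample; the threshold is a parameter because the abstract itself cites a 1-5% range.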
The spatial dimension in biological data mining
Davnah Urbach, Jason H Moore
BioData Mining, 2011, DOI: 10.1186/1756-0381-4-6
Abstract: Among its numerous applications, data mining plays an increasingly important role in epidemiology. In particular, it makes it possible to process the steadily increasing volume of genomic data and helps identify genetic risk factors. Despite ongoing progress, the mining methods currently developed for exploring such data still stumble over the data's characteristic features, in particular their considerable complexity and diversity. Genomic data range from DNA sequences and single nucleotide polymorphisms (SNPs) to gene and protein expression levels and protein-protein interaction patterns, and further encompass structural and functional genome annotation. Accordingly, the various types of data are generally treated independently, and patterns emerging from any set of analyses are stitched together to form a biological answer or to generate new hypotheses. Occasionally, such patterns are projected onto a geographical map, superimposed on migration patterns or correlated with environmental factors, placing crude numeric information into a spatio-temporal perspective [reviewed in [1]]. Integrating spatial, environmental and genetic data into models of geographic disease etiology (ecogeographic genetic epidemiology) has recently been proposed as a new interdisciplinary pathway to understand the distribution and the determinants of diseases [1]. Geographic Information Systems (GIS), used to integrate these multiple layers of information, are sets of powerful hardware and software for inputting, managing, displaying and analyzing geographically referenced information. GIS have only relatively recently been recognized as a useful tool for biomedical research, in particular for visualizing cancer distributions and estimating the contribution of various environmental risk factors to cancer prevalence [reviewed in [1]].
Accordingly, the American National Cancer Institute http://gis.cancer.gov/, with the Long Island Breast Cancer Study Project for instance http://li-gis.cancer
Mining the diseasome
Davnah Urbach, Jason H Moore
BioData Mining, 2011, DOI: 10.1186/1756-0381-4-25
Abstract: Diseases are traditionally considered discrete entities and classified accordingly. However, the networks of genes accountable for particular disease phenotypes most certainly overlap, with individual genes simultaneously contributing to multiple disorders [5,7]. Clinically distinct diseases have genes in common, just as nodes in a network have links in common, and disease networks (DNs) capture this analogy by representing diseases as nodes and the genes they share as links. In such a network representation, breast cancer and pancreatic cancer, for instance, are two nodes connected by TP53 [5]. What the concept of DN implies is that many susceptibility loci hitherto associated with distinct diseases are in fact likely to contribute to the genetic architecture of several disorders. Hence, rather than initiating genetic association studies with no a priori hypothesis about where in the genome to look for potential candidate risk loci, the information captured by human disease networks (HDNs) may serve to anchor the search for susceptibility loci in genomic regions known to harbor genetic variants predictive of other "linked" diseases. Subsequently, the human interactome [6], i.e., the compendium of molecular, phenotypic and genetic interactions, or genome-wide regulatory networks [8] can serve as maps to navigate the genome in search of further susceptibility loci. Additional clues about where to start exploring the genome for susceptibility loci can be inferred from general principles of human diseases and from clinical data. For example, a considerable fraction of diseases with onset early in life appear to result from defects in enzyme-encoding genes, whereas diseases with onset during adulthood appear to be caused by alterations in genes encoding modifiers of protein functions [9].
Thus, clinical information such as age at onset or severity can serve as valuable expert knowledge to narrow down the genomic search space to genes or genetic domains that are biologically and clinically meaningful.
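The disease-network idea, diseases as nodes linked by the genes they share and then used to point the search toward genes of "linked" diseases, can be sketched in a few lines. This is a toy illustration under stated assumptions: only the breast cancer / pancreatic cancer link through TP53 comes from the abstract; GENE_A, GENE_B, GENE_C and disease_Z are hypothetical placeholders, and the function names are invented for this sketch.

```python
from itertools import combinations


def build_disease_network(disease_genes):
    """Link every pair of diseases that share at least one gene.

    disease_genes maps a disease name to its set of associated genes.
    Returns {(disease_1, disease_2): shared_genes} for each linked pair.
    """
    edges = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:
            edges[(d1, d2)] = shared
    return edges


def candidate_genes(disease, disease_genes, edges):
    """Genes of network neighbors that are not yet associated with `disease`."""
    candidates = set()
    for d1, d2 in edges:
        if d1 == disease:
            candidates |= disease_genes[d2]
        elif d2 == disease:
            candidates |= disease_genes[d1]
    return candidates - disease_genes[disease]


# Toy input: the TP53 link is from the abstract; the other genes and
# disease_Z are hypothetical placeholders, not real associations.
disease_genes = {
    "breast cancer": {"TP53", "GENE_A"},
    "pancreatic cancer": {"TP53", "GENE_B"},
    "disease_Z": {"GENE_C"},
}
edges = build_disease_network(disease_genes)
print(edges)  # {('breast cancer', 'pancreatic cancer'): {'TP53'}}
print(candidate_genes("breast cancer", disease_genes, edges))  # {'GENE_B'}
```

Ranking candidates by, say, the number of shared genes or by onset-age information, as the abstract suggests, would be a natural extension; it is not shown here.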
Data mining and the evolution of biological complexity
Davnah Urbach, Jason H Moore
BioData Mining, 2011, DOI: 10.1186/1756-0381-4-7
Abstract: Canalization is broadly defined as the evolution of phenotypic robustness to genetic or environmental perturbations [see e.g. [1-3]]. Canalization buffers developmental pathways against the tendency of both new allelic variants and environmentally induced noise to generate suboptimal phenotypes, and thereby ensures the reliability of vital mechanisms such as cognition, glucose metabolism or immune response [4]. Canalization implies a reduction in trait variability [1,3], i.e. in the propensity to vary in response to mutations or environmental changes [5], whereas it leaves genetic variability unaffected, allowing cryptic genetic variation to accumulate [6]. By repressing the expression of existing genetic variation and of novel mutations, canalization reduces the responsiveness of traits to natural selection [5], and hence their potential to evolve [1,3]. However, if selection for canalization weakens, the build-up of hidden genetic variation likely increases the potential for evolutionary divergence [1,4]. Several molecular mechanisms contribute to canalization [see e.g. [3]], including redundancy [7] and regulatory genetic interactions [5,8,9]. In the present context, redundancy refers to the compensation for the loss of a gene's activity by one or several alternative genes derived from the same ancestor through gene duplication [7]. The notion of canalization through genetic interactions refers both to the modification of existing interactions and to the incorporation of new ones as additional genes enter existing genetic networks [9].
Hence, in the former case, robustness is achieved because diverse genetic modules, ranging from single haploid genes to complex genetic networks, can produce virtually identical phenotypes, whereas in the latter, it is achieved by ensuring the robust structure of the genetic networks underlying phenotypes. Canalization provides a theoretical basis for understanding how evolution shapes the genotype-phenotype relationship
Chapter 11: Genome-Wide Association Studies
William S. Bush, Jason H. Moore
PLOS Computational Biology, 2012, DOI: 10.1371/journal.pcbi.1002822
Abstract: Genome-wide association studies (GWAS) have evolved over the last ten years into a powerful tool for investigating the genetic architecture of human disease. In this work, we review the key concepts underlying GWAS, including the architecture of common diseases, the structure of common human genetic variation, technologies for capturing genetic information, study designs, and the statistical methods used for data analysis. We also look forward to the future beyond GWAS.
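The abstract does not name a particular statistical method, so the following is only one common choice, assumed here for illustration: an allelic chi-square test on a 2 × 2 table of allele counts in cases and controls, which a GWAS repeats independently for every SNP. The function name and the counts in the example are hypothetical.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Allelic chi-square test (1 degree of freedom) for one biallelic SNP.

    Each argument is a pair (count of allele A, count of allele a) over all
    chromosomes in that group, i.e. two alleles per diploid individual.
    Returns (chi-square statistic, p-value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty group or monomorphic SNP: test undefined")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 50 cases and 50 controls (100 alleles per group).
stat, p = allelic_chi_square((60, 40), (40, 60))
print(round(stat, 2), round(p, 4))  # 8.0 0.0047
```

Because the test is run once per marker, a genome-wide study must also correct for the very large number of tests before calling any single SNP significant.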
Principal component gene set enrichment (PCGSE)
H. Robert Frost, Zhigang Li, Jason H. Moore
Quantitative Biology, 2014, DOI: 10.1186/s13040-015-0059-z
Abstract: Motivation: Although principal component analysis (PCA) is widely used for the dimensional reduction of biomedical data, interpretation of PCA results remains daunting. Most existing methods attempt to explain each principal component (PC) in terms of a small number of variables by generating approximate PCs with few non-zero loadings. Although useful when just a few variables dominate the population PCs, these methods are often inadequate for characterizing the PCs of high-dimensional genomic data. For genomic data, reproducible and biologically meaningful PC interpretation requires methods based on the combined signal of functionally related sets of genes. While gene set testing methods have been widely used in supervised settings to quantify the association of groups of genes with clinical outcomes, these methods have seen only limited application for testing the enrichment of gene sets relative to sample PCs. Results: We describe a novel approach, principal component gene set enrichment (PCGSE), for computing the statistical association between gene sets and the PCs of genomic data. The PCGSE method performs a two-stage competitive gene set test using the correlation between each gene and each PC as the gene-level test statistic with flexible choice of both the gene set test statistic and the method used to compute the null distribution of the gene set statistic. Using simulated data with simulated gene sets and real gene expression data with curated gene sets, we demonstrate that biologically meaningful and computationally efficient results can be obtained from a simple parametric version of the PCGSE method that performs a correlation-adjusted two-sample t-test between the gene-level test statistics for gene set members and genes not in the set. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: rob.frost@dartmouth.edu or jason.h.moore@dartmouth.edu
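The two stages of PCGSE can be illustrated in plain Python. This is a minimal sketch under simplifying assumptions, not the CRAN package linked in the abstract: it uses only the first PC (found by power iteration), takes the Pearson correlation between each gene and that PC as the gene-level statistic, and replaces PCGSE's correlation-adjusted test with an ordinary pooled two-sample t statistic that ignores inter-gene correlation. The function names and simulated data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_pc_scores(data, iterations=100):
    """Sample scores on the first PC of a samples x genes matrix (power iteration)."""
    n, p = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # start vector, assumed not orthogonal to the first PC
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic comparing a with b."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Simulated data (hypothetical): 50 samples x 20 genes; genes 0-4 share a latent factor.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(50)]
data = [[(3 * z if g < 5 else 0.0) + random.gauss(0, 0.3) for g in range(20)]
        for z in latent]

scores = first_pc_scores(data)
# Stage 1: gene-level statistic = correlation of each gene with PC1.
gene_stats = [pearson([row[g] for row in data], scores) for g in range(20)]
# Stage 2: compare the statistics of set members (genes 0-4) with all other genes.
gene_set = set(range(5))
t = pooled_t([s for g, s in enumerate(gene_stats) if g in gene_set],
             [s for g, s in enumerate(gene_stats) if g not in gene_set])
print(f"|t| = {abs(t):.1f}")  # expect a large value: the set tracks PC1
```

The sign of a principal component is arbitrary, so the sign of t is too; only its magnitude is meaningful in this sketch.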
Spectral gene set enrichment (SGSE)
H. Robert Frost, Zhigang Li, Jason H. Moore
Quantitative Biology, 2014, DOI: 10.1186/s12859-015-0490-7
Abstract: Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. Results: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method, with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: rob.frost@dartmouth.edu or jason.h.moore@dartmouth.edu
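SGSE's final step combines the PC-level p-values of one gene set with the weighted Z-method. A minimal standard-library sketch of that combination step: the weights are taken as given inputs, because the abstract does not spell out exactly how PC variance is scaled by the Tracy-Widom p-values, and the p-values and weights in the example are hypothetical.

```python
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    z_i = Phi^-1(1 - p_i), Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)),
    combined p-value = 1 - Phi(Z). Each p_i must lie strictly in (0, 1).
    """
    if len(p_values) != len(weights):
        raise ValueError("need exactly one weight per p-value")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = [std_normal.inv_cdf(1 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sum(w * w for w in weights) ** 0.5
    return 1 - std_normal.cdf(numerator / denominator)


# Hypothetical PC-level p-values for one gene set, with hypothetical weights
# standing in for "PC variance scaled by Tracy-Widom p-values".
print(weighted_z_combine([0.01, 0.20, 0.60], [5.0, 2.0, 0.5]))
```

Larger weights let high-variance, statistically supported PCs dominate the combined score, while PCs with small weights contribute little even when their p-values are extreme.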
Gene expression signatures for autoimmune disease in peripheral blood mononuclear cells
Nancy J Olsen, Jason H Moore, Thomas M Aune
Arthritis Research & Therapy, 2004, DOI: 10.1186/ar1190
Abstract: The relatively new technology of DNA microarrays has made it feasible to measure the expression levels of thousands of genes in small biological samples [1]. It has been suggested that this methodology might be especially useful in analyzing the complex and parallel changes that occur within cells and tissues of the immune system in normal and pathologic states [2]. Much of the early work using DNA microarrays was in the field of oncology; other studies have examined host responses to infectious agents or drugs [3]. The gene array approach is especially well suited to the type of multifactorial analysis that is needed to unravel the causes of human autoimmune disorders that involve both complex genetics and environmental factors [4,5]. Studies in autoimmune disease have included the use of biopsy samples from affected patients, targeting tissues such as synovium, brain or skin [6-9]. While this approach can offer insights for some disease subsets, it does not permit study of all afflicted patients and cannot be applied to early phases of disease, when therapeutic interventions are most likely to be useful. As an alternative, we and others have hypothesized that, due to the systemic nature of autoimmune disease, clinically relevant changes in gene expression should be observable in peripheral blood mononuclear cells (PBMCs). Using peripheral blood as the source of gene expression material makes it possible to sample any individual at any time and may also reveal early pathogenetic and prognostic factors. This review examines studies in autoimmune disease, focusing on the utility of peripheral blood samples to identify genes of interest. The potential for this approach to provide insights into disease pathogenesis and to aid with diagnosis and management is also discussed. A relatively small number of microarray studies in autoimmunity have been reported [3].
Some of these have used animal models, such as for alopecia areata [7] and experimenta
Evolving hard problems: Generating human genetics datasets with a complex etiology
Daniel S Himmelstein, Casey S Greene, Jason H Moore
BioData Mining, 2011, DOI: 10.1186/1756-0381-4-21
Abstract: Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variants and pairs of genetic variants is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/. Advances in genotyping technologies are changing the way geneticists measure genetic variation. It is now technologically feasible to measure more than one million variations from across the human genome. Here we focus on SNPs, or single nucleotide polymorphisms. A SNP is a single point in a DNA sequence that differs between individuals. A major goal in human genetics is to link the state of these SNPs to disease risk. The standard approach to this problem is to measure the genotypes of people with and without a disease of interest across hundreds of thousands to millions of SNPs.
Each of these SNPs is then tested individually for an association with the d
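Each run of the evolution strategy keeps a Pareto front: the datasets that no other dataset beats on one objective without losing on the other. A minimal sketch of that non-dominance filter, assuming each candidate dataset has already been scored, that the single-SNP and pairwise predictiveness objectives are collapsed into one "low-order" score (an assumption; the article may treat them separately), and that the evolution strategy itself is out of scope. The names and scores are hypothetical, not results from the article.

```python
def dominates(a, b):
    """True if dataset a Pareto-dominates dataset b.

    Each dataset is (name, low_order, high_order): low_order (predictiveness
    of single SNPs and pairs) is minimized, high_order (predictiveness of the
    target higher-order interaction) is maximized.
    """
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(datasets):
    """Keep every dataset that no other dataset dominates."""
    return [d for d in datasets
            if not any(dominates(other, d) for other in datasets)]


# Hypothetical scores for four candidate datasets.
candidates = [
    ("d1", 0.10, 0.90),
    ("d2", 0.05, 0.70),
    ("d3", 0.20, 0.60),  # dominated by d1
    ("d4", 0.02, 0.50),
]
print([name for name, _, _ in pareto_front(candidates)])  # ['d1', 'd2', 'd4']
```

This quadratic filter is fine for small populations; larger ones would typically use a dedicated non-dominated sorting routine.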
Exploratory Visual Analysis of Statistical Results from Microarray Experiments Comparing High and Low Grade Glioma
David M. Reif, Mark A. Israel, Jason H. Moore
Cancer Informatics, 2007
Abstract: The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org.

Copyright © 2008-2017 Open Access Library. All rights reserved.