%0 Journal Article
%T Data mining of high density genomic variant data for prediction of Alzheimer's disease risk
%A Natalia Briones
%A Valentin Dinu
%J BMC Medical Genetics
%D 2012
%I BioMed Central
%R 10.1186/1471-2350-13-7
%X Two different approaches were devised to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was conducted to filter the data with a less conservative p-value than the Bonferroni threshold; this resulted in a subset of SNPs used next in multi-locus analysis (random forest (RF)). In the second approach, we took into account prior biological knowledge, and performed sample stratification and linkage disequilibrium (LD) in addition to logistic regression analysis to preselect loci to input into the RF classifier construction step. The first approach gave 199 SNPs mostly associated with genes in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. These SNPs together with APOE and GAB2 SNPs formed a predictive subset for LOAD status with an average error of 9.8% using 10-fold cross validation (CV) in RF modeling. Nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH respectively, genes linked directly or indirectly with neurobiology, were identified with the second approach. These variants were part of a model that included APOE and GAB2 SNPs to predict LOAD risk which produced a 10-fold CV average error of 17.5% in the classification modeling. With the two proposed approaches, we identified a large subset of SNPs in genes mostly clustered around specific pathways/functions and a smaller set of SNPs, within or in proximity to five genes not previously reported, that may be relevant for the prediction/understanding of AD. It is predicted the number of people who suffer from Alzheimer's disease (AD) will increase from 5 million to 13.4 million in the United States of America and will be 115.4 million worldwide by 2050 [1,2]. There is currently no treatment to stop or reverse the progress of this disease. This neurodegenerative disorder is believed to be caused by an inability to clear β-amyloid (increasing all it
%K Late-Onset Alzheimer's Disease
%K GWAS
%K SNPs
%K Random Forest
%U http://www.biomedcentral.com/1471-2350/13/7