BMC Medical Genetics 2011
Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers
Abstract: In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predict severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group.
Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 with 160-320 SNPs, compared with an AUC of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors.
Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.
Personalized medicine, the ability to predict an individual's predisposition to disease and response to therapy with genetic and phenotypic characteristics, promises to deliver more efficient health outcomes [1-4]. As a field, personalized medicine faces multiple issues when trying to predict complex diseases such as cardiovascular diseases, cancer, and asthma. This is largely because no single genotypic or phenotypic characteristic can explain more than a small portion of any complex disease. Instead, complex diseases are influenced by multiple genetic factors and environmental exposures.
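To make the AUC values reported in the abstract easier to interpret: an AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half, so 0.5 corresponds to chance. The following is a minimal sketch of that definition only. The function name `auc_from_scores` and the example scores are hypothetical and are not taken from the study, whose own evaluation code is not shown here.

```python
def auc_from_scores(case_scores, control_scores):
    """Return the AUC as the fraction of (case, control) pairs in which
    the case has the higher score, counting ties as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for illustration; not data from the study.
cases = [0.9, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
print(auc_from_scores(cases, controls))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred subjects; a rank-based computation would be the usual choice for larger sets.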
For instance, a person's height is considered to be strongly heritable, but the top 20 single nucleotide polymorphisms (SNPs) chosen by p-value explain only ~2-3% of the variability in adult height [5]. In addition to the multitude o