BMC Genetics 2011
A strategy analysis for genetic association studies with known inbreeding

Abstract: Statistical theory, simulations and two applications provide evidence that the procedure we built eliminates stratification between cases and controls and identifies the genetic variants responsible for a disease with sufficient precision. The procedure has been applied successfully to beta-thalassemia, a well-known Mendelian disease, and to common asthma, for which we identified candidate genes that underlie susceptibility to the disease. Some of these candidate genes have also been associated with common asthma in the current literature.

The data-analysis approach, which selects the most closely related cases and controls and combines them with a Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, the method also yields a prediction model that accurately estimates unknown disease status and that can be used to build test kits for a wide class of Mendelian diseases.

One of the main objectives in studying the genetics of complex diseases is not only the search for genetic variants associated with pathologies [1], but also the construction of predictive models that support both diagnosis and early treatment. This problem can be formalized by expressing the disease status of each subject as a Bernoulli random variable $Y \in \{0, 1\}$, where $Y = 1$ indicates an affected subject. The main quantity of interest, $F(x) = \Pr(Y = 1 \mid x)$, is the conditional probability of being affected given a set $x$ of genetic variants and environmental variables. These variables form a huge set of potential predictors, which we will refer to as the omic profile. Essentially, $x = (x_1, \ldots, x_P)$, where $P$ is the number of genetic variants and environmental variables considered. Specifically, we use a sample of size $N \ll P$, where $N$ is of the order of hundreds while $P$ is thousands of times larger.
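To make the formalization concrete, the following stdlib-only Python sketch simulates a toy omic profile and estimates $F(x) = \Pr(Y = 1 \mid x)$ with a bagged tree ensemble. It is not the authors' pipeline. The 0/1/2 genotype coding, the single causal SNP, the assumed risk values and the sizes N = 200, P = 1000 are all illustrative assumptions. The relatedness-based selection of cases and controls is not modelled, and the "forest" is deliberately simplified to depth-one trees (stumps), each grown on a bootstrap sample with about sqrt(P) randomly drawn candidate variants. All function names are hypothetical.

```python
import random


def simulate(n, p, causal, seed=0):
    """Hypothetical toy data: n subjects, p SNPs coded 0/1/2, one causal SNP."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g = [rng.choice((0, 1, 2)) for _ in range(p)]
        f = 0.1 + 0.35 * g[causal]  # assumed true F(x): 0.10, 0.45 or 0.80
        X.append(g)
        y.append(1 if rng.random() < f else 0)  # Y ~ Bernoulli(F(x))
    return X, y


def gini(labels):
    """Gini impurity 2q(1 - q) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    q = sum(labels) / len(labels)
    return 2 * q * (1 - q)


def fit_stump(X, y, features):
    """Best split x_j <= t over candidate features j and thresholds t in {0, 1}."""
    parent, n, best = gini(y), len(y), None
    for j in features:
        for t in (0, 1):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, j, t, sum(left) / len(left), sum(right) / len(right))
    return best


def fit_forest(X, y, n_trees=300, seed=1):
    """Bootstrap sample + random feature subset (mtry = sqrt(P)) + one stump per tree."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of subjects
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          rng.sample(range(p), mtry))
        if stump is not None:
            forest.append(stump)
    return forest


def predict(forest, x):
    """Estimate F(x) = Pr(Y = 1 | x) by averaging the leaf case frequencies."""
    return sum(pl if x[j] <= t else pr for _, j, t, pl, pr in forest) / len(forest)


def importance(forest, p):
    """Summed Gini decrease per SNP, a simplified mean-decrease-impurity score."""
    imp = [0.0] * p
    for gain, j, _, _, _ in forest:
        imp[j] += gain
    return imp


N, P, CAUSAL = 200, 1000, 7  # toy sizes with N much smaller than P
X, y = simulate(N, P, CAUSAL)
forest = fit_forest(X, y)
imp = importance(forest, P)
top = max(range(P), key=imp.__getitem__)
baseline = [0] * P  # hypothetical subject with genotype 0 at every SNP
carrier = baseline[:]
carrier[CAUSAL] = 2  # same subject, homozygous for the risk allele
print("top-ranked SNP:", top)
print("estimated F(baseline):", round(predict(forest, baseline), 3))
print("estimated F(carrier): ", round(predict(forest, carrier), 3))
```

With these settings the summed impurity decrease should rank the causal SNP first, and the estimate of F(x) should be higher for the carrier than for the baseline subject. Because each stump sees only about sqrt(P) of the P variants, most stumps ignore the causal SNP, so the averaged estimate is pulled toward the overall prevalence. Full-depth trees, as in a standard Random Forest, tend to reduce this shrinkage.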
This $N \ll P$ setting complicates the estimation of $F(x)$ because, in the absence of strong prior information [2] on the part of the