|
BMC Genetics 2011
Choice of population structure informative principal components for adjustment in a case-control study

Abstract: We found that when the SNP and phenotype frequencies do not vary over the sub-populations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates generally provide the most power. When there is a structured SNP and a non-structured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power.

Standard practice is to include a fixed number of PCs in genome-wide association studies. Based on our findings, we conclude that if power is not a concern, then selecting the same set of top PCs for adjustment for all SNPs in logistic regression is a strategy that achieves appropriate Type I error. However, standard practice is not optimal in all scenarios: to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model.

The principal components (PCs) of genome-wide genotype data can be used to detect and adjust for population structure in genetic association analyses [1,2]. The popularity of the PC method is evident from its wide use: it has been cited by over 400 publications. However, the choice of which PCs to use and the best way to adjust for the PCs in analyses of dichotomous traits are not yet clear.

Numerous methods have been proposed to adjust for structure once PCs are computed (Table 1). The simplest and most straightforward approach is to adjust for continuous PCs in a regression model.
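
To make the idea of adjusting for a continuous PC in a logistic model concrete, the following is a minimal, dependency-free sketch and not the simulation design used in this study. It assumes two hypothetical sub-populations, a SNP with no true effect whose allele frequency differs between them, a phenotype whose prevalence also differs, and a single noisy score `pc1` standing in for a top PC. The helper names `solve` and `fit_logistic` and all numeric settings are illustrative assumptions; in practice a statistics package would normally be used for the fit.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of logit P(y = 1) = x . beta (X includes an intercept column)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical simulation settings (assumptions, not taken from the paper).
random.seed(1)
n = 2000
rows = []
for i in range(n):
    pop = i % 2                          # two equally sized sub-populations
    freq = 0.7 if pop else 0.2           # structured SNP: allele frequency differs
    prev = 0.7 if pop else 0.2           # structured phenotype: prevalence differs
    g = sum(random.random() < freq for _ in range(2))  # genotype coded 0, 1, 2
    case = 1 if random.random() < prev else 0          # SNP has no true effect
    pc1 = (pop - 0.5) + random.gauss(0.0, 0.1)         # noisy stand-in for a top PC
    rows.append((g, pc1, case))

y = [case for _, _, case in rows]
X_unadj = [[1.0, float(g)] for g, _, _ in rows]        # intercept + SNP
X_adj = [[1.0, float(g), pc1] for g, pc1, _ in rows]   # intercept + SNP + PC

beta_unadj = fit_logistic(X_unadj, y)[1]  # per-allele log odds ratio, no adjustment
beta_adj = fit_logistic(X_adj, y)[1]      # per-allele log odds ratio, adjusted for pc1

print("unadjusted log OR:", round(beta_unadj, 3))
print("PC-adjusted log OR:", round(beta_adj, 3))
```

Under these assumed settings the unadjusted SNP coefficient reflects confounding by sub-population, while including `pc1` as a covariate should pull the estimate toward zero. The sketch fits one SNP with one PC; selecting which PCs to include per SNP, the question this paper studies, is not shown.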
Kimmel et al [3] note that principal component analysis (PCA) is sufficient for identifying population structure, but adjusting for PCs as covariates in a model may not always eliminate false positive associations since the PCs are only an estimate of the population structure. Furthermore, Yu
|