We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification, using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. Case-control samples were drawn from four geographically distinct North American sites. The strategies included a correction based on a panel of ancestry informative markers (AIMs) designed to capture European ancestral variation, and corrections using un-thinned genome-wide SNP data. The first principal components (PC1) from both the AIMs-only and the genome-wide analyses corresponded to the previously described North or Northwest-Southeast axis of European variation. The genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local linkage disequilibrium (LD), corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen (HLA) system on chromosome 6, and a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. We conclude that although AIM panels are a cost-effective way of capturing population structure, genome-wide data should be preferred when available.
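The basic mechanics of a PCA-based ancestry check can be sketched in plain Python. This is a minimal, simplified sketch in the spirit of EIGENSTRAT-style PCA, not the pipeline used in this study: genotypes are centered and scaled per SNP, an individual-by-individual matrix is formed, and its leading eigenvectors (found here by power iteration with deflation, a swapped-in technique chosen only to avoid external dependencies) give each individual's PC coordinates. The toy genotypes, the `SITE` and `CASE_STATUS` labels, and all function names are hypothetical and invented for illustration. Correlating PC1 with site and with case status mirrors, in miniature, the question of whether structure tracks geography rather than disease status.

```python
import math
import random

# Hypothetical toy data: 8 individuals x 6 SNPs, coded as 0/1/2 allele copies.
# Individuals 0-3 come from hypothetical site 0 and 4-7 from hypothetical
# site 1; the two sites differ in allele frequency at every SNP.
GENOTYPES = [
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 2],
    [0, 0, 0, 1, 2, 2],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 0],
    [2, 2, 2, 1, 0, 0],
]
SITE = [0, 0, 0, 0, 1, 1, 1, 1]
CASE_STATUS = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical, balanced within each site


def standardize_genotypes(genotypes):
    """Center each SNP and scale by sqrt(p(1-p)), p = mean count / 2.

    Simplified EIGENSTRAT-style normalization (assumption: plain frequency
    estimate). Monomorphic SNPs are dropped. Returns samples-by-SNPs.
    """
    n = len(genotypes)
    columns = []
    for snp in zip(*genotypes):
        mean = sum(snp) / n
        p = mean / 2.0
        scale = math.sqrt(p * (1.0 - p))
        if scale == 0.0:
            continue  # no variation, so no ancestry information
        columns.append([(g - mean) / scale for g in snp])
    return [list(row) for row in zip(*columns)]


def sample_covariance(x):
    """Return the n-by-n matrix X X^T / M over M standardized SNPs."""
    m = len(x[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in x] for ri in x]


def top_components(cov, k, iterations=500, seed=0):
    """Find the k leading (eigenvalue, eigenvector) pairs by power iteration.

    Each eigenvector holds one PC coordinate per individual (up to scaling);
    its sign is arbitrary. Deflation removes each found component first.
    """
    rng = random.Random(seed)
    n = len(cov)
    matrix = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            w = [sum(mij * vj for mij, vj in zip(row, v)) for row in matrix]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
        mv = [sum(mij * vj for mij, vj in zip(row, v)) for row in matrix]
        eigenvalue = sum(vi * mvi for vi, mvi in zip(v, mv))  # Rayleigh quotient
        components.append((eigenvalue, v))
        matrix = [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                  for i in range(n)]
    return components


def pearson_r(xs, ys):
    """Pearson correlation, used here to ask whether a PC tracks a label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = standardize_genotypes(GENOTYPES)
    pcs = top_components(sample_covariance(x), k=2)
    pc1 = pcs[0][1]
    print("PC1 vs site:", round(pearson_r(pc1, SITE), 3))
    print("PC1 vs case status:", round(pearson_r(pc1, CASE_STATUS), 3))
```

On this toy data PC1 separates the two hypothetical sites almost perfectly while remaining uncorrelated with case status. Real analyses run dedicated software over thousands of individuals and SNPs; they also commonly inspect SNP loadings for regions of extended LD (such as the LCT, HLA, and chromosome 8 inversion regions noted above), a step this sketch omits.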