The genome-wide association study (GWAS) is a powerful study design
used to detect genetic variants associated with disease susceptibility. The main
goal of these studies is to provide a better understanding of the biology of
disease, which in turn facilitates prevention and better treatment. The final
step of such a study is statistical inference, in which the association
between single-nucleotide polymorphisms (SNPs) and a trait is tested,
typically in a case-control setting. To identify disease-associated loci
correctly, the statistical association analysis must be conducted carefully,
together with the other necessary steps. This paper provides an introductory
guideline for conducting such statistical association tests using SNP
genotype data.
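
As a concrete illustration of the kind of association test referred to above, the following is a minimal sketch of a Pearson chi-squared test of independence on a 2 x 3 table of genotype counts (cases and controls by the three genotypes at one SNP). The function name `genotype_chi_square` and the counts are hypothetical and chosen only for illustration; they are not taken from this paper. The sketch assumes every genotype has a nonzero total count and uses the fact that, for 2 degrees of freedom, the chi-squared upper-tail probability is exp(-x/2).

```python
import math


def genotype_chi_square(cases, controls):
    """Pearson chi-squared test of independence on a 2 x 3 genotype table.

    `cases` and `controls` each hold counts for the three genotypes at one
    SNP (for example AA, Aa, aa). Returns (statistic, p_value).
    Assumes each genotype column has a nonzero total.
    """
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    # Degrees of freedom = (2 - 1) * (3 - 1) = 2, for which the
    # chi-squared survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical genotype counts (AA, Aa, aa), for illustration only
cases = (30, 50, 20)
controls = (50, 40, 10)
stat, p_value = genotype_chi_square(cases, controls)
print(f"chi-squared = {stat:.3f}, p = {p_value:.4g}")
```

In a genome-wide setting this test would be repeated for every SNP, so the resulting p-values would need a multiple-testing correction before any locus is declared associated.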