Genome-wide association studies (GWASs) have proved to be pioneering work in identifying disease-associated genetic variants. A two-stage design and analysis is often adopted in GWASs. To account for genetic model uncertainty, many robust procedures have been proposed and applied in GWASs. However, the existing approaches mostly focus on binary traits, and little work has been done on continuous (quantitative) traits, because the statistical significance of these robust tests is difficult to calculate. In this paper, we develop a powerful F-statistic-based robust joint analysis method for quantitative traits that uses the combined raw data from both stages in the framework of two-stage GWASs. Explicit expressions are obtained for calculating the statistical significance and power. Simulations show that the proposed method is substantially more robust than the F-test based on the additive model when the underlying genetic model is unknown. An example for rheumatoid arthritis (RA) is used for illustration.

1. Introduction

Genome-wide association studies (GWASs) have identified a large number of genomic regions (especially single-nucleotide polymorphisms (SNPs)) associated with a wide variety of complex traits/diseases. In a GWAS, the two most common types of data, qualitative (or binary) and quantitative (or continuous) traits, are analyzed, and two contentious points are often faced: one is how to construct the test statistic given the genetic model uncertainty, and the other is how to evaluate the statistical significance so as to control the false positive rate efficiently (e.g., [1, 2]). To address these issues, much work has been done on binary traits in the past 10 years (e.g., [3–7]). Computer algorithms have also been developed to calculate the significance level of robust tests in GWASs, taking into account the genetic model uncertainty [8].
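To make genetic model uncertainty concrete for a quantitative trait, the following is a minimal sketch, not the method developed in this paper. It assumes genotypes are coded as the number of risk alleles (0, 1, or 2), recodes them under the recessive, additive, and dominant models, computes a simple-regression F statistic for each coding, and takes the maximum of the three. The names `CODINGS`, `f_statistic`, and `max3_f` are illustrative only. The maximum does not follow an F distribution under the null hypothesis, which is why the significance of such robust tests needs separate treatment.

```python
import numpy as np

# Genotype g counts copies of the risk allele (0, 1, or 2).
# Each genetic model corresponds to a different numeric coding of g.
CODINGS = {
    "recessive": lambda g: (g == 2).astype(float),
    "additive": lambda g: g.astype(float),
    "dominant": lambda g: (g >= 1).astype(float),
}


def f_statistic(x, y):
    """F statistic (1 and n - 2 df) for the slope in a simple linear regression of y on x."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    if sxx == 0:
        return 0.0  # the coding is constant, so there is nothing to test
    beta = (xc @ yc) / sxx
    ss_reg = beta ** 2 * sxx          # regression sum of squares
    ss_res = yc @ yc - ss_reg         # residual sum of squares
    return float(ss_reg / (ss_res / (n - 2)))


def max3_f(genotypes, trait):
    """Return (max statistic, best-fitting coding, all three statistics)."""
    g = np.asarray(genotypes)
    y = np.asarray(trait, dtype=float)
    stats = {name: f_statistic(code(g), y) for name, code in CODINGS.items()}
    best = max(stats, key=stats.get)
    return stats[best], best, stats


# Toy data in which only carriers of two risk alleles have a shifted trait mean.
g = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 2.0, 2.05, 1.95])
stat, best, stats = max3_f(g, y)
print(best, round(stat, 1), {k: round(v, 1) for k, v in stats.items()})
```

On this toy data the recessive coding gives by far the largest statistic, whereas a test that fixes the additive coding in advance is much weaker. This is the loss of power that robust procedures aim to avoid.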
However, little work has been done on continuous traits. Only recently, So and Sham [9] proposed a MAX3 test based on score test statistics, and Li et al. [10] gave a MAX3 test based on F-test statistics. Note that these tests focus only on single-marker analysis in a one-stage design. Although the cost of whole-genome genotyping is decreasing with high-throughput biological technology, the total cost of a GWAS is still very high because of the thousands of sampling units and the huge number of single-nucleotide polymorphisms. To reduce costs, the two-stage design and the corresponding statistical analysis where all the SNPs are genotyped in Stage 1 on a portion of the samples and the promising
References
[1] B. Freidlin, G. Zheng, Z. Li, and J. L. Gastwirth, “Trend tests for case-control studies of genetic markers: power, sample size and robustness,” Human Heredity, vol. 53, no. 3, pp. 146–152, 2002.
[2] J. D. Storey and R. Tibshirani, “Statistical significance for genomewide studies,” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 16, pp. 9440–9445, 2003.
[3] K. Song and R. C. Elston, “A powerful method of combining measures of association and Hardy-Weinberg disequilibrium for fine-mapping in case-control studies,” Statistics in Medicine, vol. 25, no. 1, pp. 105–126, 2006.
[4] G. Zheng and J. L. Gastwirth, “On estimation of the variance in Cochran-Armitage trend tests for genetic association using case-control studies,” Statistics in Medicine, vol. 25, no. 18, pp. 3150–3159, 2006.
[5] Q. Li, K. Yu, Z. Li, and G. Zheng, “MAX-rank: a simple and robust genome-wide scan for case-control association studies,” Human Genetics, vol. 123, no. 6, pp. 617–623, 2008.
[6] Q. Li, G. Zheng, X. Liang, and K. Yu, “Robust tests for single-marker analysis in case-control genetic association studies,” Annals of Human Genetics, vol. 73, no. 2, pp. 245–252, 2009.
[7] Y. Zang and W. K. Fung, “Robust tests for matched case-control genetic association studies,” BMC Genetics, vol. 11, article 91, 2010.
[8] Y. Zang, W. K. Fung, and G. Zheng, “Simple algorithms to calculate asymptotic null distributions of robust tests in case-control genetic association studies in R,” Journal of Statistical Software, vol. 33, no. 8, pp. 1–24, 2010.
[9] H. C. So and P. C. Sham, “Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates,” Behavior Genetics, vol. 41, no. 5, pp. 768–775, 2011.
[10] Q. Li, W. Xiong, J. B. Chen et al., “A robust test for quantitative trait analysis with model uncertainty in genetic association studies,” Statistics and Its Interface. In press.
[11] J. M. Satagopan, E. S. Venkatraman, and C. B. Begg, “Two-stage designs for gene-disease association studies with sample size constraints,” Biometrics, vol. 60, no. 3, pp. 589–597, 2004.
[12] A. D. Skol, L. J. Scott, G. R. Abecasis, and M. Boehnke, “Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies,” Nature Genetics, vol. 38, no. 2, pp. 209–213, 2006.
[13] K. Yu, N. Chatterjee, W. Wheeler et al., “Flexible design for following up positive findings,” American Journal of Human Genetics, vol. 81, no. 3, pp. 540–551, 2007.
[14] R. Sladek, G. Rocheleau, J. Rung et al., “A genome-wide association study identifies novel risk loci for type 2 diabetes,” Nature, vol. 445, no. 7130, pp. 881–885, 2007.
[15] D. Pan, Q. Li, N. Jiang, A. Liu, and K. Yu, “Robust joint analysis allowing for model uncertainty in two-stage genetic association studies,” BMC Bioinformatics, vol. 12, article 9, 2011.
[16] A. J. Silman and J. E. Pearson, “Epidemiology and genetics of rheumatoid arthritis,” Arthritis Research & Therapy, vol. 4, supplement 3, pp. S265–S272, 2002.
[17] A. J. MacGregor, H. Snieder, A. S. Rigby et al., “Characterizing the quantitative genetic contribution to rheumatoid arthritis using data from twins,” Arthritis & Rheumatism, vol. 43, no. 1, pp. 30–37, 2000.
[18] C. I. Amos, W. V. Chen, M. F. Seldin et al., “Data for Genetic Analysis Workshop 16 Problem 1, association analysis of rheumatoid arthritis data,” BMC Proceedings, vol. 3, supplement 7, article S2, 2009.
[19] F. Xia, J. Y. Zhou, and W. K. Fung, “A powerful approach for association analysis incorporating imprinting effects,” Bioinformatics, vol. 27, no. 18, pp. 2571–2577, 2011.
[20] G. Zheng, C. O. Wu, M. Kwak, W. Jiang, J. Joo, and J. A. C. Lima, “Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling,” Genetic Epidemiology, vol. 36, no. 3, pp. 263–273, 2012.
[21] T. W. J. Huizinga, C. I. Amos, A. H. M. van der Helm-Van Mil et al., “Refining the complex rheumatoid arthritis phenotype based on specificity of the HLA-DRB1 shared epitope for antibodies to citrullinated proteins,” Arthritis & Rheumatism, vol. 52, no. 11, pp. 3433–3438, 2005.
[22] L. Chen, M. Zhong, W. V. Chen, C. I. Amos, and R. Fan, “A genome-wide association scan for rheumatoid arthritis data by Hotellings tests,” BMC Proceedings, vol. 3, supplement 7, article S6, 2009.