Motivation: The discovery that copy number variants (CNVs) are widespread in the human genome has motivated the development of numerous algorithms that attempt to detect CNVs from intensity data. However, all approaches are plagued by high false discovery rates. Further, because CNVs are characterized by two dimensions (length and intensity), it is unclear how to order called CNVs to prioritize experimental validation.

Results: We developed a univariate score that correlates with the likelihood that a CNV is true. This score can be used to order CNV calls so that calls with larger scores are more likely to overlap a true CNV. We developed cnv.beast, a computationally efficient algorithm for calling CNVs that uses robust backward elimination regression to keep CNV calls with scores that exceed a user-defined threshold. Using an independent dataset measured on a different platform, we validated our score and showed that our approach performed better than six other currently available methods.

Availability: cnv.beast is available at http://www.duke.edu/~asallen/Software.html.