|
Human Genomics 2010
A survey of analysis software for array-comparative genomic hybridisation studies to detect copy number variation
DOI: 10.1186/1479-7364-4-6-421
Keywords: copy number variants, CNV, deletion, insertion, duplication, aCGH
Abstract: Copy number variants (CNVs) are DNA sequences that are present in different amounts among individuals in a population. Copy number differences can confer a change in gene expression, phenotypic variation and disease susceptibility [1-5], and can contribute to gene and genome evolution [6,7]. Repetitive sequences that flank a specific genomic region can further facilitate a duplication or deletion of that region via the mechanism of non-allelic homologous recombination, which can occur when paralogous sequences in the genome mis-pair during meiosis [8-10]. A key method used to study CNVs across individuals is array-based comparative genomic hybridisation (aCGH). The goal of aCGH experiments is to detect and compare the copy numbers of DNA sequences at high resolution along the genome. Several informatics tools currently exist for accurate and efficient CNV detection and assessment. These tools assist in automated analysis of aCGH data and user-friendly copy number reporting for individual samples. The goal of the statistical algorithms used in these software programs is to call aberrations reliably, accurately and precisely.
The analysis of CNVs is broken down into several steps, including: (i) pre-processing and normalisation of the raw data; (ii) aligning the data with their genomic locations, conducting segmentation analysis and providing statistical analysis to ensure the reliability of detection; and (iii) post-processing to assign biological meaning to the different states.
(i) Normalisation of the log2 ratios is typically conducted in an attempt to adjust for sources of systematic variation.
Since these effects are often not known or measured, most aCGH methodologies incorporate global normalisation techniques, centring the data about the sample mean or median for a given hybridisation [11]. Normalisation remains imperfect, and an accurate estimation of the copy number is unlikely. It is assumed, however, that changes in the observed, normalised log2 ratios correspond directly to
|
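The global normalisation described under step (i) can be illustrated with a minimal sketch, assuming each probe has one test and one reference intensity for a single hybridisation. The function names `log2_ratios` and `centre_log2_ratios` and the intensity values are hypothetical and are not taken from any of the surveyed software packages; the sketch shows only centring about the sample median or mean, not any package-specific correction.

```python
import math
from statistics import mean, median


def log2_ratios(test, reference):
    """Return the per-probe log2(test / reference) intensity ratios."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def centre_log2_ratios(ratios, use_median=True):
    """Global normalisation: subtract the sample median (or mean)
    so that the ratios of one hybridisation are centred about zero."""
    centre = median(ratios) if use_median else mean(ratios)
    return [x - centre for x in ratios]


# Hypothetical intensities for five probes on one array.
test_intensity = [200.0, 400.0, 100.0, 200.0, 200.0]
reference_intensity = [100.0, 100.0, 100.0, 100.0, 100.0]

raw = log2_ratios(test_intensity, reference_intensity)
normalised = centre_log2_ratios(raw)
print(normalised)
```

The median is often preferred over the mean in this kind of centring because it is less affected by a small number of probes with extreme ratios, although either choice fits the description in the abstract.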