Comparison of Merging and Meta-Analysis as Alternative Approaches for Integrative Gene Expression Analysis

DOI: 10.1155/2014/345106

Abstract:

An increasing number of microarray gene expression data sets are available through public repositories. Their huge potential for new findings has yet to be unlocked by making them available for large-scale analysis. To do so, it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets, because the small number of biological samples used per experiment is a well-known bottleneck in genomic analysis. Increasing the number of samples increases the statistical power and allows more general and reliable conclusions to be drawn. In this work, two different approaches for conducting large-scale analysis of microarray gene expression data, meta-analysis and data merging, are compared in the context of the identification of cancer-related biomarkers, by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing large cohorts of samples obtained by merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach.

1. Introduction

Nowadays, an increasing number of gene expression data sets are available through public repositories (e.g., NCBI GEO [1], ArrayExpress [2]). These data might contain the necessary clues for new findings, leading to the development of new treatments or therapies. Unlocking their hidden potential by using them in large-scale analysis pipelines is one of the most recent challenges. Integrating this vast amount of data originating from different but independent studies could be beneficial for the discovery of new biological insights by increasing the statistical power of gene expression analysis [3, 4].
By integrative analysis we mean combining the information of multiple independent studies, designed to study the same biological problem, in order to extract more general and more reliable conclusions. For this purpose, two approaches exist: meta-analysis and analysis by data merging. In the meta-analysis approach, the results of the individual studies (e.g., p-values, ranks, or classification accuracies) are combined at the interpretative level. In contrast, the merging approach integrates microarray data at the expression value level, after transforming the expression values to numerically comparable measures. Both approaches are illustrated in Figure 1.
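The difference between the two levels of integration can be made concrete with a minimal sketch for a single gene. The text above does not say which combination rule or which transformation is used, so both choices here are assumptions made only for illustration: Fisher's method stands in for the meta-analysis step, and per-study mean-centering stands in for the transformation to comparable measures. The function names `fisher_combine` and `merge_by_mean_centering` are hypothetical.

```python
import math
from statistics import mean


def fisher_combine(p_values):
    """Meta-analysis sketch: combine one gene's per-study p-values.

    Assumption: Fisher's method is used as the combination rule.
    Every p-value must lie in (0, 1].
    """
    k = len(p_values)
    # X = -2 * sum(ln p) follows a chi-square distribution with 2k
    # degrees of freedom under the null hypothesis; we keep X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def merge_by_mean_centering(studies):
    """Merging sketch: make one gene's values comparable, then pool them.

    Assumption: subtracting each study's own mean is used as the
    transformation; it removes a constant per-study offset only.
    """
    merged = []
    for values in studies:
        study_mean = mean(values)
        merged.extend(v - study_mean for v in values)
    return merged


# Meta-analysis: each study is analyzed separately, only results are combined.
combined_p = fisher_combine([0.05, 0.05])

# Merging: expression values are pooled into one larger data set
# that can then be analyzed as a single cohort.
pooled = merge_by_mean_centering([[8.0, 9.0, 10.0], [2.0, 3.0, 4.0]])
```

In the meta-analysis path the raw expression values never meet; only the per-study p-values do. In the merging path the samples themselves are pooled, which is why the study-specific offset has to be removed first.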

References

[1]  T. Barrett, D. B. Troup, S. E. Wilhite et al., “NCBI GEO: archive for functional genomics data sets-10 years on,” Nucleic Acids Research, vol. 39, no. 1, pp. D1005–D1010, 2011.
[2]  H. Parkinson, U. Sarkans, N. Kolesnikov et al., “ArrayExpress update-an archive of microarray and high-throughput sequencing-based functional genomics experiments,” Nucleic Acids Research, vol. 39, no. 1, pp. D1002–D1004, 2011.
[3]  Y. Moreau, S. Aerts, B. De Moor, B. De Strooper, and M. Dabrowski, “Comparison and meta-analysis of microarray data: from the bench to the computer desk,” Trends in Genetics, vol. 19, no. 10, pp. 570–577, 2003.
[4]  O. Larsson and R. Sandberg, “Lack of correct data format and comparability limits future integrative microarray research,” Nature Biotechnology, vol. 24, no. 11, pp. 1322–1323, 2006.
[5]  A. Ramasamy, A. Mondry, C. C. Holmes, and D. G. Altman, “Key issues in conducting a meta-analysis of gene expression microarray datasets,” PLoS Medicine, vol. 5, no. 9, Article ID e184, 2008.
[6]  C. K. Sarmah and S. Samarasinghe, “Microarray data integration: frameworks and a list of underlying issues,” Current Bioinformatics, vol. 5, no. 4, pp. 280–289, 2010.
[7]  J. T. Leek, R. B. Scharpf, H. C. Bravo et al., “Tackling the widespread and critical impact of batch effects in high-throughput data,” Nature Reviews Genetics, vol. 11, no. 10, pp. 733–739, 2010.
[8]  A. Scherer, Ed., Batch Effects and Noise in Microarray Experiments: Sources and Solutions, John Wiley and Sons, New York, NY, USA, 2009.
[9]  L. Xu, A. C. Tan, R. L. Winslow, and D. Geman, “Merging microarray data from separate breast cancer studies provides a robust prognostic test,” BMC Bioinformatics, vol. 9, p. 125, 2008.
[10]  A. Coletta, C. Molter, R. Duque et al., “InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor,” Genome Biology, vol. 13, no. 11, p. R104, 2012.
[11]  J. Taminau, D. Steenhoff, A. Coletta et al., “inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO,” Bioinformatics, vol. 27, no. 22, Article ID btr529, pp. 3204–3205, 2011.
[12]  M. N. McCall, B. M. Bolstad, and R. A. Irizarry, “Frozen robust multiarray analysis (fRMA),” Biostatistics, vol. 11, no. 2, pp. 242–253, 2010.
[13]  M. T. Landi, T. Dracheva, M. Rotunno et al., “Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival,” PLoS ONE, vol. 3, no. 2, Article ID e1651, 2008.
[14]  L.-J. Su, C.-W. Chang, Y.-C. Wu et al., “Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme,” BMC Genomics, vol. 8, p. 140, 2007.
[15]  T.-P. Lu, M.-H. Tsai, J.-M. Lee et al., “Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women,” Cancer Epidemiology Biomarkers and Prevention, vol. 19, no. 10, pp. 2590–2597, 2010.
[16]  J. Hou, J. Aerts, B. den Hamer et al., “Gene expression-based classification of non-small cell lung carcinomas and survival prediction,” PLoS ONE, vol. 5, no. 4, Article ID e10312, 2010.
[17]  A. Sanchez-Palencia, M. Gomez-Morales, J. A. Gomez-Capilla et al., “Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer,” International Journal of Cancer, vol. 129, no. 2, pp. 355–364, 2011.
[18]  G. K. Smyth, “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, article 3, 2004.
[19]  Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
[20]  C. Lazar, J. Taminau, S. Meganck et al., “A survey on filter techniques for feature selection in gene expression microarray analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1106–1119, 2012.
[21]  A. H. Sims, G. J. Smethurst, Y. Hey et al., “The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets-improving meta-analysis and prediction of prognosis,” BMC Medical Genomics, vol. 1, p. 42, 2008.
[22]  W. E. Johnson, C. Li, and A. Rabinovic, “Adjusting batch effects in microarray expression data using empirical Bayes methods,” Biostatistics, vol. 8, no. 1, pp. 118–127, 2007.
[23]  M. Benito, J. Parker, Q. Du et al., “Adjustment of systematic microarray data biases,” Bioinformatics, vol. 20, no. 1, pp. 105–114, 2004.
[24]  A. A. Shabalin, H. Tjelmeland, C. Fan, C. M. Perou, and A. B. Nobel, “Merging two gene-expression studies via cross-platform normalization,” Bioinformatics, vol. 24, no. 9, pp. 1154–1160, 2008.
[25]  J. Taminau, S. Meganck, C. Lazar et al., “Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages,” BMC Bioinformatics, vol. 13, p. 335, 2011.
[26]  C. Lazar, S. Meganck, J. Taminau et al., “Batch effect removal methods for microarray gene expression data integration: a survey,” Briefings in Bioinformatics, vol. 14, no. 4, pp. 469–490, 2013.