Dynamic Clustering of Gene Expression

DOI: 10.5402/2012/537217

Abstract:

It is well accepted that genes are simultaneously involved in multiple biological processes and that genes are coordinated over the duration of such events. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery fail to acknowledge the dynamic nature of biological processes and instead produce static clusters, even when gene expression is assessed across time or developmental stages. Taking advantage of techniques and theories from time-frequency analysis, we dynamically cluster periodic gene expression profiles, based on the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed both to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal coordinated, coexpressed genes. This novel dynamic clustering approach is broadly applicable to a wide range of sequential data in which the order of the series is of interest.

1. Introduction

Microarray and next-generation sequencing (RNA-seq) technologies enable researchers to study the genome-wide transcriptome at coordinated and varying stages. Since biological processes are time varying [1], they may be better described by time series gene expression than by a static gene expression analysis. Accounting for the behavior of genes involved in dynamic biological processes (e.g., developmental processes and the mechanisms of cell cycle regulation) has the potential to provide insight into the complex associations among these genes. Functional discovery is a common goal of clustering gene expression data: the function of a gene can be inferred when its expression pattern, or profile, is similar to those of genes of known function.
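The central idea of the abstract, clustering genes separately at each stage according to which spectral frequencies dominate their expression at that stage, can be illustrated with a minimal sketch. It is not the authors' method. The sliding, Hann-tapered Fourier window (a simple short-time Fourier transform standing in for whatever time-frequency transform the paper actually uses), plain k-means with a hand-picked number of clusters `k` (the paper instead estimates the number of clusters statistically), and the synthetic genes `A` to `D` are all assumptions made for illustration. The helper names `window_power`, `kmeans`, `dynamic_clusters`, and `demo_profiles` are likewise hypothetical.

```python
import math

# Hypothetical illustration of window-by-window (dynamic) clustering on
# spectral features; not the paper's algorithm.


def window_power(x, start, width, freqs):
    """Relative spectral power of x[start:start + width] at each frequency
    in `freqs` (cycles per sample), from a Hann-tapered Fourier sum."""
    power = []
    for f in freqs:
        re = im = 0.0
        for n in range(width):
            taper = 0.5 - 0.5 * math.cos(2 * math.pi * n / (width - 1))
            v = taper * x[start + n]
            re += v * math.cos(2 * math.pi * f * n)
            im -= v * math.sin(2 * math.pi * f * n)
        power.append(re * re + im * im)
    total = sum(power) or 1.0
    # Normalising removes overall amplitude, so genes are compared by which
    # frequencies dominate in this window rather than by expression level.
    return [p / total for p in power]


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=25):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def partition(names, labels):
    """Turn arbitrary integer labels into a label-free, sorted grouping."""
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return sorted(sorted(g) for g in groups.values())


def dynamic_clusters(profiles, width, step, freqs, k):
    """Cluster genes separately in each time window; membership may change."""
    names = sorted(profiles)
    length = len(profiles[names[0]])
    result = []
    for start in range(0, length - width + 1, step):
        features = [window_power(profiles[g], start, width, freqs) for g in names]
        result.append((start, partition(names, kmeans(features, k))))
    return result


def demo_profiles(length=64):
    """Hypothetical synthetic profiles: A stays slow (period 16), C and D stay
    fast (period 4), and B switches from slow to fast halfway through."""
    slow = [math.cos(2 * math.pi * t / 16) for t in range(length)]
    fast = [math.cos(2 * math.pi * t / 4) for t in range(length)]
    half = length // 2
    return {
        "A": slow,
        "B": slow[:half] + fast[half:],
        "C": fast,
        "D": [3 * math.sin(2 * math.pi * t / 4) for t in range(length)],
    }


if __name__ == "__main__":
    for start, groups in dynamic_clusters(demo_profiles(), width=32, step=32,
                                          freqs=[1 / 16, 1 / 4], k=2):
        print(f"window starting at t={start}: {groups}")
```

With two non-overlapping 32-point windows, genes A and B share a cluster in the first window, while in the second window B joins the cluster containing C and D because its dominant frequency has changed; a static clustering of the full series would have to place B in a single group. Normalising each window's power spectrum is a deliberate choice: genes are grouped by the shape of their local spectrum rather than by expression level, so D, with three times the amplitude of C, still clusters with C.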
Some published clustering methods incorporate the duration of the experimental stages, or the staged dependence structure of gene expression, into the analysis. Their results are more informative and realistic than groupings obtained from static clustering methods (i.e., clustering at a single experimental stage), but they remain limited in interpretation. The seminal work of Luan and Li [2], which clusters time-course profiles with a mixed-effects model using B-splines, is a good example of a clustering application that takes the time-dependent nature of gene expression into account. More realistic, though, is the fact that some biological processes typically start and end at identifiable stages, or time points, and that the genes in a process may be dynamically regulated at

References

[1]  H. Yu, N. M. Luscombe, J. Qian, and M. Gerstein, “Genomic analysis of gene expression relationships in transcriptional regulatory networks,” Trends in Genetics, vol. 19, no. 8, pp. 422–427, 2003.
[2]  Y. Luan and H. Li, “Clustering of time-course gene expression data using a mixed-effects model with B-splines,” Bioinformatics, vol. 19, no. 4, pp. 474–482, 2003.
[3]  J. Qian, M. Dolled-Filhart, J. Lin, H. Yu, and M. Gerstein, “Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions,” Journal of Molecular Biology, vol. 314, no. 5, pp. 1053–1066, 2001.
[4]  Y. Cheng and G. Church, “Biclustering of expression data,” in Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 93–103, 2000.
[5]  L. Lazzeroni and A. Owen, “Plaid models for gene expression data,” Statistica Sinica, vol. 12, no. 1, pp. 61–86, 2002.
[6]  L. Ji and K. L. Tan, “Identifying time-lagged gene clusters using gene expression data,” Bioinformatics, vol. 21, no. 4, pp. 509–516, 2005.
[7]  J. Z. Song, K. M. Duan, T. Ware, and M. Surette, “The wavelet-based cluster analysis for temporal gene expression data,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2007, Article ID 39382, 2007.
[8]  S. Madeira and A. Oliveira, “An efficient biclustering algorithm for finding genes with similar patterns in time-series expression data,” in Proceedings of the 5th Asia-Pacific Bioinformatics Conference, pp. 67–80, 2007.
[9]  Y. Zhang, H. Zha, and C. H. Chu, “A time-series biclustering algorithm for revealing co-regulated genes,” in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), pp. 32–37, April 2005.
[10]  G. Palla, A. L. Barabási, and T. Vicsek, “Quantifying social group evolution,” Nature, vol. 446, no. 7136, pp. 664–667, 2007.
[11]  E. R. Dougherty, I. Shmulevich, and M. L. Bittner, “Genomic signal processing: the salient issues,” EURASIP Journal on Applied Signal Processing, vol. 2004, no. 1, pp. 146–153, 2004.
[12]  R. Carmona, W. Hwang, and B. Torresani, Practical Time-Frequency Analysis: Gabor and Wavelet Transforms with an Implementation in S, Wavelet Analysis and Its Applications, Academic Press, 1998.
[13]  S. Qian, Introduction to Time-Frequency and Wavelet Transforms, Prentice-Hall, 2002.
[14]  P. Addison, The Illustrated Wavelet Transform Handbook, Taylor & Francis, 2002.
[15]  C. Furlanello, S. Merler, and G. Jurman, “Combining feature selection and DTW for time-varying functional genomics,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2436–2443, 2006.
[16]  P. Goupillaud, A. Grossmann, and J. Morlet, “Cycle-octave and related transforms in seismic signal analysis,” Geoexploration, vol. 23, no. 1, pp. 85–102, 1984.
[17]  A. J. Butte, L. Bao, B. Y. Reis, T. W. Watkins, and I. S. Kohane, “Comparing the similarity of time-series gene expression using signal processing metrics,” Journal of Biomedical Informatics, vol. 34, no. 6, pp. 396–405, 2001.
[18]  U. Grenander, “The Nyquist frequency is that frequency whose period is two sampling intervals,” in Probability and Statistics: The Harald Cramér Volume, Wiley, 1959.
[19]  G. J. Szekely and M. L. Rizzo, “Brownian distance variance,” The Annals of Applied Statistics, vol. 3, no. 4, pp. 1236–1265, 2009.
[20]  L. An and R. W. Doerge, “Dynamic clustering of cell-cycle gene expression data,” in Proceedings of the Kansas State University Conference on Applied Statistics in Agriculture, pp. 18–36, Manhattan, Kan, USA, 2008.
[21]  J. Ward, “Hierarchical grouping to optimize an objective function,” Journal of the American Statistical Association, vol. 58, pp. 236–244, 1963.
[22]  M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” Journal of Intelligent Information Systems, vol. 17, no. 2–3, pp. 107–145, 2001.
[23]  G. W. Milligan and M. C. Cooper, “An examination of procedures for determining the number of clusters in a data set,” Psychometrika, vol. 50, no. 2, pp. 159–179, 1985.
[24]  K. Y. Yeung, D. R. Haynor, and W. L. Ruzzo, “Validating clustering for gene expression data,” Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.
[25]  S. Salvador and P. Chan, “Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms,” in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '04), pp. 576–584, November 2004.
[26]  H. Chipman, T. Hastie, and R. Tibshirani, “Clustering microarray data,” in Statistical Analysis of Gene Expression Microarray Data, T. Speed, Ed., pp. 159–201, Chapman & Hall/CRC Press, 2003.
[27]  R. Tibshirani, G. Walther, and T. Hastie, “Estimating the number of clusters in a data set via the gap statistic,” Journal of the Royal Statistical Society. Series B, vol. 63, no. 2, pp. 411–423, 2001.
[28]  A. Brondsted, Introduction to Convex Polytopes, Springer, New York, NY, USA, 1983.
[29]  B. Munneke, K. A. Schlauch, K. L. Simonsen, W. D. Beavis, and R. W. Doerge, “Adding confidence to gene expression clustering,” Genetics, vol. 170, no. 4, pp. 2003–2011, 2005.
[30]  G. C. Tseng and W. H. Wong, “Tight clustering: a resampling-based approach for identifying stable and tight patterns in data,” Biometrics, vol. 61, no. 1, pp. 10–16, 2005.
[31]  P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
[32]  L. Collins and C. Dent, “Omega: a general formulation of the Rand index of cluster recovery suitable for non-disjoint solutions,” Multivariate Behavioral Research, vol. 23, pp. 231–242, 1988.
[33]  R. J. G. B. Campello, “A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment,” Pattern Recognition Letters, vol. 28, no. 7, pp. 833–841, 2007.
[34]  L. An, Dynamic clustering of time series gene expression [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 2008.
[35]  Z. Bozdech, M. Llinás, B. L. Pulliam, E. D. Wong, J. Zhu, and J. L. DeRisi, “The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum,” PLoS Biology, vol. 1, no. 1, article e5, 2003.
[36]  R. S. Istepanian, A. Sungoor, and J. C. Nebel, “Comparative analysis of genomic signal processing for microarray data clustering,” IEEE Transactions on Nanobioscience, vol. 10, no. 4, pp. 225–238, 2011.
[37]  M. L. Whitfield, G. Sherlock, A. J. Saldanha et al., “Identification of genes periodically expressed in the human cell cycle and their expression in tumors,” Molecular Biology of the Cell, vol. 13, no. 6, pp. 1977–2000, 2002.
[38]  P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.
[39]  R. J. Cho, M. J. Campbell, E. A. Winzeler et al., “A genome-wide transcriptional analysis of the mitotic cell cycle,” Molecular Cell, vol. 2, no. 1, pp. 65–73, 1998.
[40]  M. J. Gardner, N. Hall, E. Fung et al., “Genome sequence of the human malaria parasite Plasmodium falciparum,” Nature, vol. 419, no. 6906, pp. 498–511, 2002.
[41]  L. Du, S. Wu, A. W. C. Liew, D. K. Smith, and H. Yan, “Spectral analysis of microarray gene expression time series data of Plasmodium falciparum,” International Journal of Bioinformatics Research and Applications, vol. 4, no. 3, pp. 337–349, 2008.
[42]  A. Kallio, N. Vuokko, M. Ojala, N. Haiminen, and H. Mannila, “Randomization techniques for assessing the significance of gene periodicity results,” BMC Bioinformatics, vol. 12, article 330, 2011.
[43]  R. Jurgelenaite, T. M. H. Dijkstra, C. H. M. Kocken, and T. Heskes, “Gene regulation in the intraerythrocytic cycle of Plasmodium falciparum,” Bioinformatics, vol. 25, no. 12, pp. 1484–1491, 2009.
[44]  M. Hirsch, S. Swift, and X. Liu, “Optimal search space for clustering gene expression data via consensus,” Journal of Computational Biology, vol. 14, no. 10, pp. 1327–1341, 2007.
[45]  O. Troyanskaya, M. Cantor, G. Sherlock et al., “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
[46]  B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, NY, USA, 1993.
[47]  D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources,” Nature Protocols, vol. 4, no. 1, pp. 44–57, 2009.
[48]  D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists,” Nucleic Acids Research, vol. 37, no. 1, pp. 1–13, 2009.
[49]  C. Aurrecoechea, J. Brestelli, B. P. Brunk et al., “PlasmoDB: a functional genomic database for malaria parasites,” Nucleic Acids Research, vol. 37, no. 1, pp. D539–D543, 2009.
[50]  http://david.abcc.ncifcrf.gov.
[51]  L. C. Wu, J. L. Huang, J. T. Horng, and H. D. Huang, “An expert system to identify co-regulated gene groups from time-lagged gene clusters using cell cycle expression data,” Expert Systems with Applications, vol. 37, no. 3, pp. 2202–2213, 2010.
[52]  J. Proakis and D. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Macmillan, 1992.
[53]  Y. Shiraishi, S. Kimura, and M. Okada, “Inferring cluster-based networks from differently stimulated multiple time-course gene expression data,” Bioinformatics, vol. 26, no. 8, Article ID btq094, pp. 1073–1081, 2010.