%0 Journal Article
%T Text Mining Perspectives in Microarray Data Mining
%A Jeyakumar Natarajan
%J ISRN Computational Biology
%D 2013
%R 10.1155/2013/159135
%X Current microarray data mining methods such as clustering, classification, and association analysis rely heavily on statistical and machine learning algorithms to analyze large sets of gene expression data. In recent years, there has been growing interest in methods that attempt to discover patterns across multiple related data sources. Gene expression data and the corresponding literature data are one such example. This paper proposes a new approach to microarray data mining that combines text mining (TM) and information extraction (IE). TM is concerned with identifying patterns in natural language text, and IE is concerned with locating specific entities, relations, and facts in text. The present paper surveys the state of the art of data mining methods for microarray data analysis. We show the limitations of current microarray data mining methods and outline how text mining could address these limitations.
1. Introduction
DNA microarrays facilitate the simultaneous measurement of the expression levels of thousands of genes [1, 2]. As a result, this high-throughput technology has led to a growing volume of gene expression data. Microarrays have been used for a variety of studies, including gene coregulation, gene function identification, identification of pathways and gene regulatory networks, predictive toxicology, clinical diagnosis, and sequence variance studies. For a complete description of microarrays and their analytical tasks, refer to the books [3–5]. Current microarray data mining methods such as clustering, classification, and association analysis are based on statistical and machine learning algorithms. Most of these techniques are purely data driven and do not incorporate significant amounts of biological knowledge.
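To make the TM/IE distinction concrete, here is a minimal Python sketch under stated assumptions. It is not the author's method. Instead, it substitutes two deliberately simple techniques: exact dictionary matching for the IE step, which locates gene-name entities in text, and same-abstract co-mention counting for the TM step, which finds patterns across the text collection. It then scores a gene cluster, such as one produced by clustering expression data, by the fraction of its gene pairs that are co-mentioned in the literature. The gene dictionary, the three abstracts, the cluster, and the function names (`extract_genes`, `co_mention_counts`, `literature_support`) are all hypothetical toy inputs invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene dictionary and toy abstracts (illustration only, not real data).
GENE_DICTIONARY = {"BRCA1", "TP53", "MDM2", "EGFR"}
ABSTRACTS = [
    "TP53 is negatively regulated by MDM2 in many tumours.",
    "Loss of BRCA1 function alters TP53 signalling.",
    "EGFR amplification was observed; MDM2 levels were unchanged.",
]

def extract_genes(text, dictionary):
    """IE step (assumed approach): find gene names by exact token match against a dictionary."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return {tok for tok in tokens if tok in dictionary}

def co_mention_counts(abstracts, dictionary):
    """TM step (assumed approach): count gene pairs mentioned together in the same abstract."""
    counts = Counter()
    for text in abstracts:
        genes = sorted(extract_genes(text, dictionary))
        for pair in combinations(genes, 2):  # pairs are sorted tuples, e.g. ("MDM2", "TP53")
            counts[pair] += 1
    return counts

def literature_support(cluster, counts):
    """Fraction of gene pairs in a (hypothetical) expression cluster co-mentioned at least once."""
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        return 0.0
    supported = sum(1 for pair in pairs if counts[pair] > 0)
    return supported / len(pairs)

counts = co_mention_counts(ABSTRACTS, GENE_DICTIONARY)
cluster = {"TP53", "MDM2", "BRCA1"}  # hypothetical cluster obtained from expression data
print(literature_support(cluster, counts))  # prints 0.6666666666666666
```

Here two of the cluster's three gene pairs (BRCA1/TP53 and MDM2/TP53) appear together in some abstract, while BRCA1/MDM2 does not. This is the kind of literature-derived evidence a purely data-driven clustering cannot supply on its own. Real systems would need proper gene-name normalization and richer relation extraction than token matching.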
Considering the statistically ill-defined nature of microarray data (many more variables than observations) and the massive body of existing biological knowledge, it is imperative that we exploit that knowledge for the analysis and interpretation of microarray data. Text mining techniques constitute a promising technology for automating the incorporation of scientific knowledge into the microarray data mining process. Applying domain knowledge is fundamental to any scientific discovery process. In biology, domain knowledge is available in vast collections of literature in natural language form, such as abstracts [6] and full-text journal articles [7, 8], and also as textual annotations in databases such as SwissProt [9] and GenBank [10]. For example, the biological abstract database PubMed
%U http://www.hindawi.com/journals/isrn.computational.biology/2013/159135/