%0 Journal Article %T A Robust Topology-Based Algorithm for Gene Expression Profiling %A Lars Seemann %A Jason Shulman %A Gemunu H. Gunaratne %J ISRN Bioinformatics %D 2012 %R 10.5402/2012/381023 %X Early and accurate diagnoses of cancer can significantly improve the design of personalized therapy and enhance the success of therapeutic interventions. Histopathological approaches, which rely on microscopic examinations of malignant tissue, are not conducive to timely diagnoses. High throughput genomics offers a possible new classification of cancer subtypes. Unfortunately, most clustering algorithms have not been proven sufficiently robust. We propose a novel approach that relies on the use of statistical invariants and persistent homology, one of the most exciting recent developments in topology. It identifies a sufficient but compact set of genes for the analysis as well as a core group of tightly correlated patient samples for each subtype. Partitioning occurs hierarchically and allows for the identification of genetically similar subtypes. We analyzed the gene expression profiles of 202 tumors of the brain cancer glioblastoma multiforme (GBM) given at the Cancer Genome Atlas (TCGA) site. We identify core patient groups associated with the classical, mesenchymal, and proneural subtypes of GBM. In our analysis, the neural subtype consists of several small groups rather than a single component. A subtype prediction model is introduced which partitions tumors in a manner consistent with clustering algorithms but requires the genetic signature of only 59 genes. 1. Introduction Cancers in many tissues are heterogeneous, and the efficacy of therapeutic interventions depends on the specific subtype of the malignancy. Hence, early and accurate identification of the cancer subtype is critical in designing an effective personalized therapy. Current methods for assessment rely on microscopic examinations of the malignant tissue for previously established histopathological abnormalities. Unfortunately, such features may not be apparent during early stages of the disease and moreover, differentiating between abnormalities in distinct cancer subtypes can be challenging. Recent advances in high-throughput genomics offer an exciting new alternative for early and reliable cancer prognosis. Mutations that underlie a malignancy modify the levels of many genes within a cell; the goal of gene expression profiling is to define a signature for each cancer subtype through statistically significant up-/downregulation of a panel of genes. The National Institutes for Health, through the Cancer Genome Atlas (TCGA) [1, 2], will aid this effort by establishing large sets of genomic data on human cancers in at least 20 tissues [3¨C8]. The premise behind TCGA is that %U http://www.hindawi.com/journals/isrn.bioinformatics/2012/381023/