PLOS ONE  2012 

Hierarchical Information Clustering by Means of Topologically Embedded Graphs

DOI: 10.1371/journal.pone.0031929


Abstract:

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial data sets, finding that it can significantly outperform other established approaches. We then show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data sets. In particular, we find that its application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in the diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.
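
The idea of retaining only the most significant links under a topological constraint can be illustrated with a deliberately simplified stand-in. The sketch below is not the method of the paper: it keeps the strongest links under an acyclicity constraint, which yields a maximum spanning tree (a graph that is planar by construction), whereas a planar embedding would retain more links and would need a planarity test in place of the cycle check. The function name `strongest_links_tree` and the toy similarity values are assumptions made for illustration only.

```python
from itertools import combinations


def strongest_links_tree(similarity):
    """Greedy link filter (illustrative stand-in, not the paper's method).

    Scans item pairs from most to least similar and keeps a link only if it
    joins two components that are still disconnected. The result is a
    maximum spanning tree, which is trivially planar.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: similarity[p[0]][p[1]], reverse=True)
    kept = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # constraint check; a planarity test would go here
            parent[ri] = rj
            kept.append((i, j))
    return kept


# Toy similarity matrix with made-up values: items 0-1 and 2-3 are tight pairs.
toy = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
links = strongest_links_tree(toy)
```

On the toy matrix, the two tight pairs are linked first and the strongest remaining cross-pair link then joins them, so even this crude filter exposes a two-level grouping from the similarity values alone.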

