%0 Journal Article %T Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds %A Juan C Triviño %A Florencio Pazos %J BMC Systems Biology %D 2010 %I BioMed Central %R 10.1186/1752-0509-4-46 %X In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli. We show that this unsupervised clustering results in groups of enzymes that are more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups, which we analyzed for their biological meaning. The "genomic era" is characterized by the massive determination of the molecular components of living systems. Genome sequencing projects are yielding complete genome sequences for hundreds of organisms [1]. Although not as massive as envisaged, structural genomics projects are speeding up the rate of protein structure determination [2]. Many other initiatives also address large-scale repertoires of molecular components and their relationships, seeking to decipher the "transcriptome" [3], the "interactome" [4,5], etc. These massive datasets contain a great deal of information about living organisms. Studied as a whole, from a systemic perspective, they provide global pictures of different aspects of biology, which can help to answer very basic questions about how life evolved and how organisms do what they do with their "molecular toolkits".
For example, the repertoire of protein sequences (on the order of 10^7 known so far) and their evolutionary relationships (represented by amino acid sequence similarities) can be represented in a "sequence space" [6]. Studying this global landscape of protein sequences as a whole has produced important information on the estimated total number of protein families ("Natur %U http://www.biomedcentral.com/1752-0509/4/46