“Big” molecules such as proteins and genes continue to capture the imagination of most biologists, biochemists and bioinformaticians. “Small” molecules, on the other hand, are the molecules that most of these researchers prefer to ignore. However, it is becoming increasingly apparent that small molecules such as amino acids, lipids and sugars play a far more important role in all aspects of disease etiology and disease treatment than previously realized. This chapter focuses on an emerging field of bioinformatics called “chemical bioinformatics”, a discipline that has evolved to help address the blended chemical and molecular biological needs of toxicogenomics, pharmacogenomics, metabolomics and systems biology. The chapter covers three topics. First, it gives a brief overview of some of the most important or useful chemical bioinformatic resources. Second, it gives a more detailed overview of those resources that allow researchers to connect small molecules to diseases, focusing on a number of recently developed databases or knowledgebases that explicitly relate small molecules (as the treatment, symptom or cause) to disease. Finally, it briefly discusses newly emerging software tools that exploit these databases as a means to discover new biomarkers or even new treatments for disease.