Metabolite identification is a major bottleneck in metabolomics because of the large number and structural diversity of the molecules involved. To alleviate this bottleneck, computational methods and tools are needed that reliably filter the set of candidate molecules before further analysis by human experts. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door to a new genre of metabolite identification methods that rely on machine learning as the primary vehicle for identification. In this paper we describe the machine learning approach used in FingerID, its application to the CASMI challenges, and some results that were not part of our challenge submission. In short, FingerID learns to predict molecular fingerprints from a large collection of MS/MS spectra and uses the predicted fingerprints to retrieve and rank candidate molecules from a given large molecular database. Furthermore, we introduce a web server for FingerID, which was used for the first time in the CASMI challenges. The challenge results show that the new machine learning framework produces competitive results for those challenge molecules that are contained in the relatively restricted KEGG compound database. Additional experiments on the much larger PubChem database confirm that the approach remains feasible at that scale, although there is still room for improvement.
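The retrieve-and-rank step can be illustrated with a minimal Python sketch. It rests on three assumptions: the fingerprint predictor outputs, for each fingerprint bit, an estimated probability that the bit is set; candidates are first retrieved from the database by monoisotopic mass within a ppm tolerance; and bits are treated as independent, so a candidate's score is the sum of its per-bit log-probabilities. The names `Candidate`, `retrieve`, `log_likelihood` and `rank`, the 10 ppm default and the toy data are hypothetical illustrations, not FingerID's actual interface or scoring function.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A database molecule (hypothetical record layout)."""
    ident: str
    mass: float                   # monoisotopic mass in Da
    fingerprint: tuple[int, ...]  # binary fingerprint, one 0/1 entry per bit


def retrieve(candidates, query_mass, ppm=10.0):
    """Keep candidates whose mass is within `ppm` parts per million of the query."""
    tol = query_mass * ppm * 1e-6
    return [c for c in candidates if abs(c.mass - query_mass) <= tol]


def log_likelihood(fingerprint, bit_probs, eps=1e-6):
    """Log-probability of a candidate fingerprint under the predicted per-bit
    probabilities, assuming (for this sketch) that the bits are independent."""
    if len(fingerprint) != len(bit_probs):
        raise ValueError("fingerprint and prediction lengths differ")
    score = 0.0
    for bit, p in zip(fingerprint, bit_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        score += math.log(p if bit else 1.0 - p)
    return score


def rank(candidates, query_mass, bit_probs, ppm=10.0):
    """Retrieve mass-matching candidates, then sort them best-first by score."""
    hits = retrieve(candidates, query_mass, ppm)
    return sorted(hits, key=lambda c: log_likelihood(c.fingerprint, bit_probs),
                  reverse=True)


if __name__ == "__main__":
    db = [
        Candidate("A", 180.0634, (1, 0, 1, 1)),
        Candidate("B", 180.0634, (0, 1, 0, 0)),
        Candidate("C", 180.0420, (1, 0, 1, 1)),  # outside the 10 ppm window
    ]
    predicted = [0.9, 0.2, 0.8, 0.7]  # toy per-bit probabilities
    print([c.ident for c in rank(db, 180.0634, predicted)])  # ['A', 'B']
```

Working in log space keeps scores numerically stable for fingerprints with hundreds of bits, and the `eps` clamp stops a single overconfident bit probability from assigning minus infinity to an otherwise well-matching candidate.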