This article introduces the nature of data integration to life scientists. The subject is rarely discussed outside computational science, and in the training of life scientists it is covered in little detail or neglected altogether. End users (here defined as wet-lab trainees, clinicians, and lab researchers) mostly interact with bioinformatics resources and tools through web interfaces that hide the underlying data integration processes. However, a lack of formal training in, or acquaintance with, even simple database concepts and terminology often becomes a real obstacle to fully understanding the resources and tools these users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for teaching data integration that trainers and educators can adopt and adapt for their classrooms. In particular, we provide an example using the Distributed Annotation System (DAS) as a method for data integration.