BioBroker: Knowledge Discovery Framework for Heterogeneous Biomedical Ontologies and Data

DOI: 10.4236/jilsa.2018.101001, PP. 1-20

Keywords: Knowledge Discovery, Ontology, Linked Data


Abstract:

A large number of ontologies have been introduced by the biomedical community in recent years. Knowledge discovery for entity identification from ontologies has become an important research area, and it is of particular interest to discover how associations are established to connect concepts within a single ontology or across multiple ontologies. However, due to the exponential growth of biomedical big data and their complicated associations, it has become very challenging to detect key associations among entities in an efficient, dynamic manner. There is therefore a gap between the increasing need for association detection and the large volume of biomedical ontologies. In this paper, to bridge this gap, we present a knowledge discovery framework, BioBroker, that groups entities to facilitate biomedical knowledge discovery in an intelligent way. Specifically, we developed a knowledge discovery algorithm that combines a graph clustering method with an indexing technique to discover knowledge patterns efficiently over a set of interlinked data sources. We demonstrate the capabilities of BioBroker for query execution with a use case study on a subset of the Bio2RDF life science linked data.
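The abstract does not say which graph clustering method or index structure BioBroker uses. Purely as an illustration of the general "cluster, then index" idea, the Python sketch below substitutes the simplest possible graph clustering (connected components computed with union-find) and two dictionary-based inverted indexes (entity to cluster, predicate to clusters). The triples, identifiers, and helper names (`build_index`, `associated`) are hypothetical toy data and functions, not real Bio2RDF URIs and not BioBroker's API. What it shows: once entities are grouped and indexed, a query can rule out an entity pair in different clusters with a dictionary lookup instead of searching the whole graph.

```python
from collections import defaultdict

# Hypothetical sketch: connected components (via union-find) stand in for
# BioBroker's unspecified graph clustering method.


class UnionFind:
    """Disjoint-set structure that groups entities linked by any triple."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_index(triples):
    """Cluster entities, then build two inverted indexes.

    Returns (entity -> cluster id, predicate -> set of cluster ids).
    """
    uf = UnionFind()
    for s, _, o in triples:
        uf.union(s, o)
    # Assign compact cluster ids in order of first appearance.
    cluster_of, ids = {}, {}
    for s, _, o in triples:
        for node in (s, o):
            root = uf.find(node)
            ids.setdefault(root, len(ids))
            cluster_of[node] = ids[root]
    # Hypothetical predicate index: which clusters use each predicate.
    predicate_index = defaultdict(set)
    for s, p, _ in triples:
        predicate_index[p].add(cluster_of[s])
    return cluster_of, dict(predicate_index)


def associated(entity_a, entity_b, cluster_of):
    """True if both entities are known and fall in the same cluster."""
    ca, cb = cluster_of.get(entity_a), cluster_of.get(entity_b)
    return ca is not None and ca == cb


# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("drug:D1", "hasTarget", "protein:P1"),
    ("protein:P1", "encodedBy", "gene:G1"),
    ("gene:G1", "associatedWith", "disease:X1"),
    ("gene:G2", "orthologOf", "gene:G3"),
]

cluster_of, predicate_index = build_index(TRIPLES)
print(associated("drug:D1", "disease:X1", cluster_of))  # True
print(associated("drug:D1", "gene:G3", cluster_of))     # False
print(predicate_index["orthologOf"])                    # {1}
```

With connected components, sharing a cluster is equivalent to an undirected path existing, so the lookup is exact but prunes little when a dataset is densely interlinked. A finer-grained clustering method would make the pre-filter more selective, at the cost of still needing a path search inside each cluster; the indexing step would stay the same.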
