Constructing a knowledge graph is time-consuming, and a knowledge graph in the
scientific domain incurs especially high labor costs because extracting
knowledge from the source material requires substantial prior knowledge. When
building a scientific research knowledge graph, most of the input consists of
papers, patents, project descriptions, and national programs (such as the
National High Technology Research and Development Program of China, the Major
State Basic Research Development Program of China, the General Program, the Key
Program, and the Major Program). All of these are unstructured data, so human
participation is usually necessary to assess quality. In this paper, we design
and propose a framework based on active learning that extracts entities and
relations from unstructured science and technology research data. The framework
combines human effort with machine learning, that is, active learning, to help
users extract entities from such unstructured data at a lower time cost. By
using the data to construct a CKG as annotation labels, it further implements
active learning tools that help experts annotate the data rapidly and with high
accuracy. Knowledge graphs constructed with this framework can be used to find
similar research areas, similar researchers, popular research areas, and so on.
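
To make the active learning idea concrete, the following is a minimal sketch of one annotation round using entropy-based uncertainty sampling. It is an illustration under stated assumptions, not the framework described in this paper: the function names (`entropy`, `select_for_annotation`, `active_learning_round`), the `predict` and `ask_expert` callbacks, and the choice of entropy as the selection criterion are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a label distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(
    pool: List[str],
    predict: Callable[[str], Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the batch_size pool items the model is least sure about.

    `predict` is a hypothetical callback returning label probabilities
    for one text; ties are broken by the original pool order.
    """
    scored = [(entropy(predict(text)), i) for i, text in enumerate(pool)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]


def active_learning_round(
    pool: List[str],
    labeled: Dict[str, str],
    predict: Callable[[str], Sequence[float]],
    ask_expert: Callable[[str], str],
    batch_size: int,
) -> List[str]:
    """Send the most uncertain texts to the expert and return the remaining pool.

    `ask_expert` stands in for the human annotator (an assumption of this
    sketch); its answers are stored in `labeled` for later retraining.
    """
    picked = set(select_for_annotation(pool, predict, batch_size))
    for i in sorted(picked):
        labeled[pool[i]] = ask_expert(pool[i])
    return [text for j, text in enumerate(pool) if j not in picked]


# Toy data: fixed probabilities stand in for a trained entity classifier.
toy_probs = {
    "deep active learning": [0.5, 0.5],
    "the results are shown": [0.9, 0.1],
    "graph convolution": [0.6, 0.4],
}
toy_pool = list(toy_probs)
toy_labels: Dict[str, str] = {}
toy_remaining = active_learning_round(
    toy_pool, toy_labels, lambda t: toy_probs[t], lambda t: "ENTITY", batch_size=2
)
```

In this sketch the expert only sees the texts whose predicted label distribution is closest to uniform, which is one common way to reduce annotation time; a real system would retrain the model on `labeled` after each round before selecting the next batch.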