Implementation of Bidding Information Extraction Based on BERT-BiLSTM-CRF
Abstract:
Bidding information is massive in scale and complex in data structure, so efficiently mining its latent data value is key to realizing big data applications in the tendering and bidding field. Drawing on a large volume of annotated data, this paper applies the BERT-BiLSTM-CRF machine learning algorithm to automatically extract the key fields of bidding information, turning that information into structured, usable data.
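To make the idea of key-field extraction concrete, the following is a minimal sketch of the final step of a sequence-labeling pipeline: grouping per-character tags into fields. It assumes the model emits BIO tags, which is a common convention for BERT-BiLSTM-CRF taggers but is not stated in the abstract. The function name `decode_bio`, the labels `BUYER` and `BUDGET`, and the sample sentence are hypothetical and used for illustration only.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (field_label, text) pairs.

    Assumption: tags follow the BIO scheme ("B-X", "I-X", "O").
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    fields: List[Tuple[str, str]] = []
    label: Optional[str] = None   # label of the field currently being built
    chars: List[str] = []         # tokens collected for that field

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-")
        continues = tag.startswith("I-") and label == tag[2:]

        # Any tag that does not continue the open field closes it.
        if not continues and label is not None:
            fields.append((label, "".join(chars)))
            label, chars = None, []

        if starts_new:
            label, chars = tag[2:], [token]
        elif continues:
            chars.append(token)
        # "O" tags and stray "I-" tags with no open field are skipped.

    if label is not None:
        fields.append((label, "".join(chars)))
    return fields


# Hypothetical character-level example (labels are illustrative only).
tokens = list("某大学采购预算500万元")
tags = ["B-BUYER", "I-BUYER", "I-BUYER", "O", "O", "O", "O",
        "B-BUDGET", "I-BUDGET", "I-BUDGET", "I-BUDGET", "I-BUDGET"]
print(decode_bio(tokens, tags))
# [('BUYER', '某大学'), ('BUDGET', '500万元')]
```

In a full pipeline the tag sequence would come from the CRF layer's decoding rather than being written by hand; only the grouping step is shown here.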