Research on Data Processing and Word Segmentation of Bids Information

DOI: 10.12677/SEA.2022.116145, PP. 1415-1422

Keywords: Data Value, Data Cleaning, LTP, Jieba


Abstract:

Data cleaning and word segmentation are crucial to mining the latent value of massive volumes of bid information. This paper preprocesses the different types of "dirty data" found in bid records, applying a separate strategy to each type. It also studies the application of the LTP and Jieba word segmentation tools in the bidding domain. The aim is to improve the efficiency of identifying and processing key bid information and the quality of the final data, and to explore how far these techniques raise the accuracy and recall of information search.
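
The abstract does not give the cleaning rules or segmentation settings, so the following is only a minimal sketch of what such a pipeline might look like. The sample strings, the `clean_record`, `clean_all`, `forward_max_match`, and `segment` helpers, and the tiny vocabulary are all hypothetical. The segmenter is a plain forward maximum matching routine used as a dependency-free stand-in for dictionary-based segmentation; it is not the algorithm used by LTP or Jieba.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_record(text):
    """Normalize one raw text field (hypothetical cleaning rules)."""
    text = html.unescape(text)                  # decode entities such as &nbsp;
    text = TAG_RE.sub("", text)                 # drop leftover HTML tags
    text = unicodedata.normalize("NFKC", text)  # full-width forms -> half-width
    return SPACE_RE.sub(" ", text).strip()      # collapse runs of whitespace


def clean_all(records):
    """Clean every record, dropping empties and exact duplicates (order kept)."""
    seen, out = set(), []
    for raw in records:
        cleaned = clean_record(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out


def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation against a fixed vocabulary.

    Characters not covered by any vocabulary entry become single tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment(text, vocab, max_len=4):
    """Segment each whitespace-separated chunk independently."""
    return [tok for chunk in text.split()
            for tok in forward_max_match(chunk, vocab, max_len)]


if __name__ == "__main__":
    # Hypothetical raw titles: HTML residue, a duplicate, and a blank row.
    raw = ["<b>设备采购</b>&nbsp;招标公告", "设备采购 招标公告", "  "]
    vocab = {"设备", "采购", "招标", "公告", "招标公告"}
    for title in clean_all(raw):
        print(segment(title, vocab))
```

In practice the `segment` step would be replaced by a call into a real segmenter, and domain terms would be supplied through that tool's own user-dictionary mechanism rather than a hand-built set.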
