As Chinese comes into wider use around the world, the number of Chinese learners keeps growing, and the writing of beginners contains a wide variety of grammatical errors. At the same time, as domestic industrial informatization advances, the volume of electronic documents has grown rapidly. Both learner texts and manually drafted electronic documents often contain grammatical errors that are hard to spot, which poses a significant challenge to traditional manual proofreading. Correcting these errors is essential for fluency and readability, and in certain types of text a grammatical or logical error can have serious consequences; yet proofreading large numbers of texts by hand, one at a time, is clearly impractical. Consequently, text error correction techniques have attracted significant research attention in recent years, and the development of deep learning has led to sequence-to-sequence learning methods being widely applied to the task. This paper presents a comprehensive analysis of Chinese grammatical error correction technology, reviews its current research status, discusses existing problems, proposes preliminary solutions, and reports experiments that use judicial documents as an example. The aim is to provide a feasible research approach for Chinese text error correction.