Computer Science (计算机科学), 2006
Multiple Language Identification Based on Character-level Markov Models
Abstract:
Language identification is a necessary preprocessing step in machine translation and other multi-language applications, but no experiments have yet been reported on double-byte encoded languages such as Chinese and Japanese. An efficient EM-based training algorithm for Markov language models is proposed and evaluated. A performance analysis and a comparison with other algorithms are also presented.
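
To make the idea of a character-level Markov model concrete, the following is a minimal sketch of language identification with first-order (bigram) character models. It is not the EM-based training algorithm the paper proposes: plain maximum-likelihood counts with add-alpha smoothing stand in for it, and the names `CharBigramModel`, `train_models`, and `identify`, the start symbol, and the `alpha` parameter are all assumptions made for illustration. Because Python strings are sequences of Unicode code points, Chinese and Japanese characters are handled as single symbols regardless of their byte encoding.

```python
import math
from collections import Counter

START = "\x02"  # hypothetical start-of-text symbol, assumed absent from real text


class CharBigramModel:
    """First-order character Markov model with add-alpha smoothing (illustrative)."""

    def __init__(self, text, vocab_size, alpha=1.0):
        padded = START + text
        # Count each (previous char, current char) pair and each context char.
        self.bigrams = Counter(zip(padded, padded[1:]))
        self.contexts = Counter(padded[:-1])
        self.vocab_size = vocab_size
        self.alpha = alpha

    def log_prob(self, text):
        """Return the smoothed log-probability of `text` under this model."""
        padded = START + text
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def train_models(samples, alpha=1.0):
    """Build one model per language from a dict {language: training text}."""
    # A shared vocabulary size keeps the scores comparable across languages;
    # the extra 1 reserves probability mass for characters never seen in training.
    vocab_size = len(set("".join(samples.values()))) + 1
    return {lang: CharBigramModel(text, vocab_size, alpha)
            for lang, text in samples.items()}


def identify(text, models):
    """Return the language whose model assigns `text` the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(text))
```

A caller would pass `train_models` a dictionary mapping language labels to training text and then call `identify` on each unknown string. Higher-order models or a different smoothing scheme would change only the counting and the `log_prob` computation.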