Journal of Computer Applications (计算机应用), 2008
Study on Chinese-English corpus construction toward multiple-domain resources
Abstract:
Bilingual resources on the Web are open, span multiple domains, and follow regular page layouts. Taking these features into account, a mixture probabilistic alignment model was proposed to capture the domain-specific and position-specific characteristics of the texts being aligned. In extensive experiments, the model improved precision and recall by 37% and 40.4%, respectively, over the traditional length-based alignment model.
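The abstract does not define its baseline. A "traditional length-based alignment model" most likely refers to the classic Gale and Church approach. That approach scores each candidate pairing of source and target sentences (a "bead") by how closely their character lengths agree, then uses dynamic programming to find the cheapest sequence of beads. The sketch below illustrates that baseline only. It is not the mixture model proposed in this paper. The bead priors and the variance `s2 = 6.8` are the published Gale and Church values for European language pairs. The length ratio `c = 1.0` is a placeholder assumption that would have to be re-estimated for Chinese-English text. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Bead priors (source sentences, target sentences) from Gale and Church,
# estimated on European-language data; assumed here, not taken from this paper.
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def _normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bead_cost(len_src: int, len_tgt: int, prior: float,
              c: float = 1.0, s2: float = 6.8) -> float:
    """Negative log-probability of one bead given its total character lengths.

    c  -- expected target characters per source character (placeholder
          assumption; must be estimated for Chinese-English).
    s2 -- variance of that ratio per source character.
    """
    mean = (len_src + len_tgt / c) / 2.0
    if mean == 0:
        p_delta = 1.0  # two empty sides: no length evidence either way
    else:
        z = (c * mean - len_tgt) / math.sqrt(s2 * mean)
        # Two-tailed probability of a length mismatch at least this large.
        p_delta = 2.0 * (1.0 - _normal_cdf(abs(z)))
    return -math.log(prior) - math.log(max(p_delta, 1e-300))


def align(src_lens: List[int], tgt_lens: List[int],
          c: float = 1.0, s2: float = 6.8) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost bead sequence for two sentence-length lists (dynamic programming)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in BEAD_PRIORS.items():
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or cost[pi][pj] == inf:
                    continue
                total = cost[pi][pj] + bead_cost(
                    sum(src_lens[pi:i]), sum(tgt_lens[pj:j]), prior, c, s2)
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Walk back from (n, m) to recover the chosen beads as index lists.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


print(align([10, 10, 30], [20, 30]))
```

In this toy input, the two short source sentences merge into a single 2-1 bead, which yields `[([0, 1], [0]), ([2], [1])]`. The baseline relies only on sentence length. It has no access to the domain or page-position cues that the abstract says its mixture model exploits.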