Acta Automatica Sinica (自动化学报) 2009
Synchronous Tree Sequence Substitution Grammar for Statistical Machine Translation
Abstract:
Phrase-based models are the state-of-the-art statistical machine translation models. However, because they lack structural information, they cannot effectively handle global reordering or discontiguous phrases. Syntax-based models have the potential to address these problems, but they suffer from strict syntactic constraints. To overcome these constraints and integrate the advantages of phrase-based models into syntax-based models, this paper presents a statistical machine translation (SMT) model based on a synchronous tree sequence substitution grammar (STSSG). The model uses the tree sequence as its basic translation unit, so translation can exploit both syntactic translation equivalences and non-syntactic translation equivalences equipped with syntactic information. Experimental results on the NIST 2005 Chinese-English machine translation data set show that the proposed method achieves significant improvements over baseline methods, including the phrase-based model Moses and a tree-based syntax model.
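The central idea, that a tree sequence can attach syntactic structure to a phrase that is not a single constituent, can be illustrated with a minimal sketch. It assumes a toy English parse tree and uses hypothetical names (`Node`, `annotate_spans`, `tree_sequence`, `bracket`) that do not come from the paper. It only computes the sequence of maximal subtrees that exactly covers a contiguous word span. This is a simplified, one-sided view: it does not extract the synchronous rules that pair source and target tree sequences, and it does not decode.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy parse-tree node (hypothetical); a preterminal stores its word in `word`."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""                    # non-empty only for preterminals (POS nodes)
    span: Tuple[int, int] = (0, 0)    # half-open word span, filled by annotate_spans


def annotate_spans(node: Node, start: int = 0) -> int:
    """Record each node's half-open word span; return the end index."""
    if node.word:
        node.span = (start, start + 1)
        return start + 1
    pos = start
    for child in node.children:
        pos = annotate_spans(child, pos)
    node.span = (start, pos)
    return pos


def tree_sequence(node: Node, i: int, j: int) -> List[Node]:
    """Return, left to right, the maximal subtrees whose spans lie inside [i, j).

    Together they cover exactly words i..j-1. A constituent yields one subtree;
    a non-constituent phrase yields a sequence of several subtrees.
    """
    s, e = node.span
    if e <= i or s >= j:      # disjoint from the phrase: contributes nothing
        return []
    if i <= s and e <= j:     # fully inside the phrase: keep the whole subtree
        return [node]
    result: List[Node] = []   # partial overlap: descend into the children
    for child in node.children:
        result.extend(tree_sequence(child, i, j))
    return result


def bracket(node: Node) -> str:
    """Render a subtree in Penn-Treebank-style bracket notation."""
    if node.word:
        return f"({node.label} {node.word})"
    return "(" + node.label + " " + " ".join(bracket(c) for c in node.children) + ")"


# Toy parse of "the red car runs" (an assumed example, not data from the paper).
example = Node("S", [
    Node("NP", [Node("DT", word="the"), Node("JJ", word="red"), Node("NN", word="car")]),
    Node("VP", [Node("VBZ", word="runs")]),
])
annotate_spans(example)

if __name__ == "__main__":
    # "car runs" (words 2..3) is not a constituent, so it maps to two subtrees.
    print([bracket(t) for t in tree_sequence(example, 2, 4)])
    # → ['(NN car)', '(VP (VBZ runs))']
```

A single-subtree unit, as in a strictly syntax-based model, would have no way to represent "car runs". The sequence `(NN car) (VP (VBZ runs))` keeps that non-syntactic phrase usable while still carrying its internal structure. Choosing maximal subtrees is only one canonical option made for this illustration. A grammar may also allow smaller tree fragments.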