Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (5): 785-793. DOI: 10.3778/j.issn.1673-9418.1603056


Research of Deep Filtering Lexical Reordering Table

KONG Jinying1,2,3, LI Xiao1,2, WANG Lei1,2, YANG Yating1,2+, LUO Yangen1,3   

  • Online: 2017-05-01 Published: 2017-05-04

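  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  2. Xinjiang Key Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China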

Abstract: In statistical machine translation systems, the lexical reordering table and the phrase table are usually very large. Tuning and filtering the phrase table has long been a research focus, whereas filtering the lexical reordering table has received almost no attention. This paper treats the filtering of the lexical reordering table as a short-text classification problem and proposes a filtering model based on an Autoencoder. The model first scores the reordering rules with an Autoencoder-based classifier, then filters the lexical reordering table using a minimal difference strategy, and finally recomputes the lexical reordering score table from the filtered rules for use in machine translation decoding. Experiments on public English-Chinese and Uyghur-Chinese corpora show that the proposed model reduces the size of the lexical reordering table by 40% while increasing BLEU (bilingual evaluation understudy) by 0.19 and 0.26, respectively.

Key words: Autoencoder, filtering model, lexical reordering table, machine translation
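
The filtering stage of the pipeline described in the abstract (score the rules, filter the table, recompute the scores) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `Rule`, `filter_rules`, `toy_score` and `keep_fraction` are hypothetical names, the scorer is a placeholder rather than the paper's Autoencoder-based classifier, and plain top-fraction ranking stands in for the minimal difference strategy, whose details the abstract does not give. The default `keep_fraction=0.6` simply mirrors the reported 40% reduction in table size.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a reordering rule as a (source phrase, target phrase) pair.
Rule = Tuple[str, str]


def filter_rules(
    rules: List[Rule],
    score: Callable[[Rule], float],
    keep_fraction: float = 0.6,
) -> List[Rule]:
    """Keep the highest-scoring fraction of rules, preserving their original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not rules:
        return []
    n_keep = max(1, round(len(rules) * keep_fraction))
    # Rank rule indices by descending score; ties are broken by original position.
    ranked = sorted(range(len(rules)), key=lambda i: (-score(rules[i]), i))
    kept = set(ranked[:n_keep])
    return [rule for i, rule in enumerate(rules) if i in kept]


def toy_score(rule: Rule) -> float:
    """Placeholder scorer (NOT an autoencoder): prefers pairs of similar token length."""
    src, tgt = rule
    return -abs(len(src.split()) - len(tgt.split()))
```

Any callable that maps a rule to a number can be passed as `score`, so a trained classifier could replace `toy_score` without changing the filter. Recomputing the reordering score table from the surviving rules is left out, since the abstract does not say how that step is done.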
