Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (7): 1183-1194. DOI: 10.3778/j.issn.1673-9418.2012006

• Surveys and Frontiers •

Advance Research on Neural Machine Translation Integrating Linguistic Knowledge

GUO Wanghao, FAN Jiangwei, ZHANG Keliang   

  1. Luoyang Campus, Strategic Support Force Information Engineering University, Luoyang, Henan 471003, China
  2. College of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Online: 2021-07-01 Published: 2021-07-09

Abstract:

Although neural machine translation (NMT) has become the mainstream method and paradigm in current machine translation research and applications, it still suffers from several problems: translations that are fluent but not sufficiently faithful, difficulty handling rare words, poor performance on low-resource languages, poor cross-domain adaptability, and low utilization of prior knowledge. Inspired by research on statistical machine translation, incorporating linguistic information into NMT models, so that existing linguistic knowledge can alleviate the inherent difficulties of NMT and improve translation quality, has become a hot topic in the field. According to the classification system of grammatical units, research in this area can be divided into three categories: NMT incorporating character or word structure information, NMT incorporating phrase structure information, and NMT incorporating syntactic structure information. Current research focuses mainly on these three aspects. After sorting out the main challenges faced by NMT and their causes, this paper introduces, for each category, its core ideas and functions, its current status and main results, and its open problems and development trends. Finally, it summarizes the main challenges that remain in this field and looks ahead to future research directions.

Key words: neural machine translation, linguistic knowledge, character or word structure information, phrase structure information, syntactic structure information
