Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (4): 599-607. DOI: 10.3778/j.issn.1673-9418.1603057

• Artificial Intelligence and Pattern Recognition •

Research on Converting Vietnamese Phrase Structure Trees into Dependency Trees

李  英1,郭剑毅1,2+,余正涛1,2,毛存礼1,2,线岩团1,2   

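  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
  2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500
  • Publication date: 2017-04-12 Release date: 2017-04-12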

Constituent-to-Dependency Conversion for Vietnamese

LI Ying1, GUO Jianyi1,2+, YU Zhengtao1,2, MAO Cunli1,2, XIAN Yantuan1,2   

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2017-04-12 Published: 2017-04-12

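Abstract: Dependency parsing is a key step in natural language processing. Vietnamese phrase structure trees have been studied fairly extensively, whereas research on dependency structure trees remains sparse. This paper proposes a new method that combines the linguistic characteristics and grammatical features of Vietnamese, using the idea of a head percolation table together with statistical methods to convert Vietnamese phrase structure trees into dependency structure trees. First, a list of dependency relations is drawn up based on the Chinese dependency annotation scheme and Vietnamese grammar rules. Next, a head percolation table is constructed in light of the characteristics of Vietnamese and used to perform a preliminary conversion. Finally, a dependency relation tagger assigns the dependency labels. MSTParser is then trained on the converted dependency trees to obtain more Vietnamese dependency trees. A sampled evaluation of the experimental results shows that the treebank conversion reaches an accuracy of 89.4%, so the method handles the conversion of Vietnamese phrase trees to dependency trees well.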

Keywords: syntactic analysis, head percolation table, phrase structure, dependency structure, treebank

Abstract: Dependency parsing is a key component of natural language processing. There has been considerable research on Vietnamese phrase structure trees, but little on dependency structure treebanks. This paper proposes a method that draws on the linguistic and grammatical characteristics of Vietnamese and uses a head percolation table together with a statistical machine learning method to convert a Vietnamese phrase structure treebank into a dependency treebank. First, a list of dependency relations is drawn up according to the Chinese dependency annotation scheme and Vietnamese grammar rules. Second, a head percolation table is constructed based on the characteristics of Vietnamese. Third, the head percolation table is used to carry out a preliminary conversion. Finally, a dependency tagger labels the dependency relations. The converted treebank is then used to train MSTParser, which produces additional Vietnamese dependency trees. In a sampled evaluation, the conversion accuracy reaches 89.4%. The experimental results show that the proposed method handles constituent-to-dependency treebank conversion for Vietnamese well.

Key words: syntactic analysis, head percolation table, phrase structure, dependency structure, treebank
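
The preliminary conversion step described in the abstract can be illustrated with a minimal sketch of head percolation in general. It is not the authors' implementation: the table entries, the phrase and part-of-speech labels, and the names `Node`, `HEAD_TABLE`, `head_child` and `convert` are all assumptions made for illustration, and the sketch produces unlabeled dependencies only (the dependency tagger and the MSTParser training step are not shown).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaves
    index: int = 0                               # 1-based token position (leaves)


# Hypothetical head percolation table (NOT the paper's table):
# phrase label -> (search direction, head labels in priority order).
HEAD_TABLE: Dict[str, Tuple[str, List[str]]] = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("left", ["N", "NP", "P"]),
    "PP": ("left", ["E"]),
}


def number_leaves(root: Node) -> None:
    """Assign 1-based token positions to the leaves, left to right."""
    counter = 0

    def walk(node: Node) -> None:
        nonlocal counter
        if node.word is not None:
            counter += 1
            node.index = counter
        for child in node.children:
            walk(child)

    walk(root)


def head_child(node: Node) -> Node:
    """Pick the head child of a phrase node using the table."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for wanted in priorities:
        for kid in kids:
            if kid.label == wanted:
                return kid
    return kids[0]  # fallback: first child in the search direction


def convert(root: Node) -> Dict[int, int]:
    """Return {dependent position: head position}; the root's head is 0."""
    number_leaves(root)
    heads: Dict[int, int] = {}

    def lexical_head(node: Node) -> int:
        if node.word is not None:
            return node.index
        chosen = head_child(node)
        head = lexical_head(chosen)
        # Every non-head sibling attaches its lexical head to the phrase head.
        for kid in node.children:
            if kid is not chosen:
                heads[lexical_head(kid)] = head
        return head

    heads[lexical_head(root)] = 0
    return heads


# Toy sentence "Tôi đọc sách" ("I read books") with illustrative labels.
tree = Node("S", [
    Node("NP", [Node("P", word="Tôi")]),
    Node("VP", [
        Node("V", word="đọc"),
        Node("NP", [Node("N", word="sách")]),
    ]),
])
print(sorted(convert(tree).items()))  # [(1, 2), (2, 0), (3, 2)]
```

In this toy example the verb at position 2 becomes the root and both nouns attach to it. A real table would need entries for every phrase label in the source treebank, with priorities chosen to reflect Vietnamese grammar.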