计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2015, Vol. 9 ›› Issue (12): 1420-1429. DOI: 10.3778/j.issn.1673-9418.1509005

• Survey and Exploration •

维吾尔语短语自动抽取研究进展

张海军1,2+   

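  1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
  2. College of Elementary Education, Xinjiang Normal University, Urumqi 830054
  • Publication date: 2015-12-01 Release date: 2015-12-04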

Progress of Automatic Extraction of Uyghur Phrases

ZHANG Haijun1,2+   

  • Online:2015-12-01 Published:2015-12-04

Abstract: Phrase identification is a technical foundation of machine translation and information retrieval and has important research value. Focusing on research progress in Uyghur phrase identification, this paper describes the linguistic characteristics of Uyghur, analyzes how these characteristics affect Uyghur phrase identification, summarizes recent linguistic research relevant to Uyghur phrase identification, and reviews in detail the methods used for automatic extraction of Uyghur phrases. The review finds that research on automatic Uyghur phrase extraction has made considerable progress in both theory and implementation techniques, but that much work on phrase annotation standards, research corpora, and research domains has yet to be carried out effectively and deserves attention. It is hoped that this paper can serve as a reference for research on Uyghur phrase extraction.

Key words: Uyghur, phrase, rules, statistics, term, named entity