Journal of Frontiers of Computer Science and Technology, 2016, Vol. 10, Issue (1): 122-129. DOI: 10.3778/j.issn.1673-9418.1502017

Semantic Model with Thesaurus for Forestry Information Retrieval

HAN Qichen1,2, LI Dongmei1+   

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
  2. School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2016-01-01  Published: 2016-01-07

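Thesaurus-Based Semantic Retrieval Model for Forestry Information (Chinese title: 基于叙词表的林业信息语义检索模型)

HAN Qichen (韩其琛)1,2, LI Dongmei (李冬梅)1+

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083
  2. School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049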

Abstract: With the rapid development of the Internet, retrieval based on literal keyword matching no longer meets users' needs. The semantic relationships recorded in a thesaurus are an important means of improving recall and precision. Introducing thesaurus control into current web information retrieval tools can therefore be expected to improve retrieval to some extent. This paper proposes a method for computing the semantic similarity between thesaurus terms from the inter-term relationships in the thesaurus. Building on query expansion, it then designs a semantic model with thesaurus for forestry information retrieval (SMTFIR). Finally, using two subject categories of the forestry thesaurus as test data, SMTFIR is compared with the Baidu search engine and with the retrieval method used for the agricultural thesaurus. The results show that SMTFIR makes more effective use of the thesaurus to improve keyword-based retrieval. In addition, the model is general enough to be applied to other domains and offers a new approach to applying thesauri in network information systems.

Key words: forestry thesaurus, semantic retrieval, similarity computation, query expansion, web page crawling

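Abstract (translated from the Chinese version): With the rapid development of the Internet, information retrieval based on literal keyword matching can no longer meet people's needs. The semantic relationships contained in a thesaurus are an important way to improve recall and precision; if the thesaurus control mechanism is introduced into current web information retrieval tools, the efficiency of information retrieval will certainly improve to some extent. Using the inter-term relationships in the thesaurus, a method for computing the semantic similarity between thesaurus terms is proposed and, drawing on the idea of query expansion, a thesaurus-based semantic retrieval model for forestry information is designed. Finally, with two subject categories of the Forestry Chinese-English-Latin Thesaurus as experimental objects, the model is compared with the Baidu search engine and with the retrieval method used in the agricultural thesaurus. The experimental results show that the proposed model makes better use of the thesaurus to improve traditional keyword-based retrieval. In addition, the proposed model is general and offers a new approach to applying thesauri in network information systems.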

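Key words (translated from the Chinese version): forestry thesaurus, semantic retrieval, similarity computation, query expansion, web page crawling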
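
The abstract describes the approach only in general terms: score how closely thesaurus terms are related, then expand the query with the closest terms. A minimal Python sketch of that idea might look like the following. The toy thesaurus, the relationship weights, the hop limit, the threshold and the function names `similarity` and `expand_query` are all illustrative assumptions; the paper's actual similarity formula is not stated in the abstract and may differ.

```python
# Assumed weights per thesaurus relationship type (not taken from the paper).
# USE/UF: equivalence, BT/NT: broader/narrower term, RT: related term.
REL_WEIGHT = {"USE": 1.0, "UF": 1.0, "BT": 0.8, "NT": 0.8, "RT": 0.6}

# Hypothetical toy thesaurus: term -> list of (relationship, linked term).
THESAURUS = {
    "pine": [("BT", "conifer"), ("RT", "resin")],
    "conifer": [("NT", "pine"), ("NT", "spruce"), ("BT", "tree")],
    "spruce": [("BT", "conifer")],
    "resin": [("RT", "pine")],
    "tree": [("NT", "conifer")],
}


def similarity(source, max_hops=2):
    """Score every term reachable from `source` within `max_hops` links.

    A term's score is the largest product of relationship weights along
    any path from `source` to it; the source itself scores 1.0.
    """
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for term, score in frontier.items():
            for rel, other in THESAURUS.get(term, []):
                candidate = score * REL_WEIGHT[rel]
                # Keep only improvements, so each term holds its best score.
                if candidate > best.get(other, 0.0):
                    best[other] = candidate
                    next_frontier[other] = candidate
        frontier = next_frontier
    return best


def expand_query(term, threshold=0.7, max_hops=2):
    """Return the query term plus thesaurus terms scoring >= threshold,
    ordered from most to least similar (ties broken alphabetically)."""
    scores = similarity(term, max_hops)
    kept = [(t, s) for t, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [t for t, _ in kept]
```

With these assumed weights, `expand_query("pine", threshold=0.7)` returns `["pine", "conifer"]`. Lowering the threshold pulls in more distant terms, which would tend to raise recall at the cost of precision.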