Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2018, Vol. 12, Issue (6): 972-980. DOI: 10.3778/j.issn.1673-9418.1708028

• Artificial Intelligence and Pattern Recognition •

Research on Patent Query Expansion Methods Using Word Embedding

XU Kan (许侃)1, LIN Yuan (林原)2, QU Chen (曲忱)1, XU Bo (徐博)1, LIN Hongfei (林鸿飞)1+

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  2. Institute of Science of Science and Science & Technology Management, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2018-06-01 Published: 2018-06-06

Abstract: Query expansion is widely used in information retrieval systems. To improve patent retrieval results, this paper applies query expansion: word embedding models are trained on relevant patent documents, and candidate words with high similarity to the original query are selected as expansion terms and added to it. The paper proposes four methods that use word embeddings to obtain expansion terms and two methods that rank those terms by relevance to the query, improving on existing expansion term selection methods. Experiments on the TREC dataset indicate that combining word-embedding-based term selection with the traditional TF-IDF term selection method effectively improves the performance of query expansion models and helps capture the user's query intent.

Key words: information retrieval, query expansion, learning to rank, patent retrieval
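
The abstract does not spell out the four selection methods or the two ranking methods, so the following is only a minimal sketch of one plausible variant, not the authors' implementation. It assumes candidate terms are scored by cosine similarity to the centroid of the query term vectors, and that this score is fused with a max-normalized TF-IDF weight by linear interpolation. The function names, the `alpha` weight, and the toy vectors and weights are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_centroid(query_terms, embeddings):
    """Average the vectors of the query terms that have an embedding."""
    vectors = [embeddings[t] for t in query_terms if t in embeddings]
    if not vectors:
        raise ValueError("no query term has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_expansion_terms(query_terms, embeddings, tfidf, k=2, alpha=0.5):
    """Rank candidates by alpha * similarity + (1 - alpha) * normalized TF-IDF.

    The linear fusion is an assumption made for this sketch.
    """
    centroid = query_centroid(query_terms, embeddings)
    max_tfidf = max(tfidf.values(), default=0.0) or 1.0
    scored = []
    for term, vec in embeddings.items():
        if term in query_terms:
            continue  # never re-add a term already in the query
        sim = cosine(centroid, vec)
        weight = tfidf.get(term, 0.0) / max_tfidf
        scored.append((alpha * sim + (1 - alpha) * weight, term))
    # Highest fused score first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy, hand-made vectors and TF-IDF weights (illustrative only, not real data).
embeddings = {
    "battery": [0.9, 0.1, 0.0],
    "electrode": [0.8, 0.2, 0.1],
    "lithium": [0.85, 0.15, 0.05],
    "anode": [0.75, 0.25, 0.1],
    "gear": [0.0, 0.1, 0.9],
}
tfidf = {"lithium": 0.6, "anode": 0.9, "gear": 0.3}

query = ["battery", "electrode"]
expanded = query + select_expansion_terms(query, embeddings, tfidf, k=2)
print(expanded)
```

Setting `alpha=1.0` ranks candidates by embedding similarity alone, while `alpha=0.0` falls back to TF-IDF only; intermediate values correspond to the kind of fusion the abstract reports as beneficial.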