Using Topic Content Ranking for Pseudo Relevance Feedback

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (5): 814-821. DOI: 10.3778/j.issn.1673-9418.1603068

YAN Rong+, GAO Guanglai

College of Computer Science, Inner Mongolia University, Hohhot 010021, China

Online: 2017-05-01  Published: 2017-05-04

Abstract

Abstract: Traditional pseudo relevance feedback (PRF) methods use the whole document as the basic unit from which expansion terms are extracted. Because this extraction granularity is coarse, it increases the amount of noise in the expansion source. This paper uses topic analysis to alleviate the low quality of the expansion source. It obtains the semantic information hidden in the content of each document in the pseudo-relevant set and, from that information, extracts the abstract topic content that is relevant to the user query, which then serves as the basic extraction unit for query expansion. Experiments on the NTCIR 8 Chinese corpus, comparing against traditional PRF methods and topic-model-based PRF methods, show that the proposed method extracts expansion terms that better match the user query. The results also show that performing query expansion at the finer granularity of topic content can effectively improve retrieval performance.

Key words: topic model, topic content, pseudo relevance feedback (PRF)
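The abstract describes the idea only at a high level, so the following is a minimal illustrative sketch rather than the authors' method. It assumes that topic-word distributions have already been estimated from the pseudo-relevant set (for example by a topic model such as LDA), ranks topics by the log-likelihood of the query terms, and takes expansion terms from the top-ranked topics instead of from whole documents. The function names `rank_topics` and `expansion_terms`, the smoothing floor, and the toy probabilities are all hypothetical.

```python
from math import log

def rank_topics(query_terms, topic_word_probs, floor=1e-6):
    """Rank topics by the log-likelihood of the query terms under each topic.

    `floor` is an assumed smoothing value for query terms a topic never emits.
    """
    scores = {
        topic_id: sum(log(word_probs.get(term, floor)) for term in query_terms)
        for topic_id, word_probs in topic_word_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def expansion_terms(query_terms, topic_word_probs, top_topics=1, terms_per_topic=2):
    """Collect the most probable non-query words from the top-ranked topics."""
    chosen = []
    for topic_id in rank_topics(query_terms, topic_word_probs)[:top_topics]:
        word_probs = topic_word_probs[topic_id]
        taken = 0
        for word in sorted(word_probs, key=word_probs.get, reverse=True):
            if taken == terms_per_topic:
                break
            if word in query_terms or word in chosen:
                continue
            chosen.append(word)
            taken += 1
    return chosen

# Toy topic-word distributions (made-up numbers, for illustration only).
topics = {
    "t0": {"engine": 0.4, "search": 0.3, "index": 0.2, "ranking": 0.1},
    "t1": {"car": 0.4, "engine": 0.3, "fuel": 0.2, "road": 0.1},
}
query = ["search", "engine"]

print(rank_topics(query, topics))
print(expansion_terms(query, topics))
```

In this toy setting the topic that covers both query terms is ranked first, so the expansion terms come only from that topic. The second topic shares the word "engine" with the query but would contribute off-topic words if whole documents were used as the extraction unit.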

YAN Rong, GAO Guanglai. Using Topic Content Ranking for Pseudo Relevance Feedback[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(5): 814-821.

