Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (1): 217-227. DOI: 10.3778/j.issn.1673-9418.2104065

• Artificial Intelligence · Pattern Recognition •

Collaborative Filtering Algorithm Combining Similarity Measure and Pre-filtering Mode

ZHAO Wentao, TIAN Huanhuan, FENG Tingting, CUI Ziheng

  1. College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan 454000, China
  • Online: 2023-01-01 Published: 2023-01-01

Abstract: In neighborhood-based collaborative filtering, the computation of user (or item) similarity strongly affects prediction and recommendation results. Traditional similarity measures rely on co-rated items and are fast to compute, but their recommendation accuracy is low on sparse data. Most improved collaborative filtering algorithms raise accuracy by designing more complex similarity formulas, yet they often ignore the computation time of the model. To deliver effective recommendations at a lower time cost, a collaborative filtering algorithm that combines a similarity measure with a pre-filtering mode is proposed. First, the similarity model defines a relative rating difference and enumerates the qualitative conditions it should satisfy, which yields an optimized similarity. Two weight factors, a rating preference based on improved information entropy and the number of each user's global ratings, are also incorporated to better distinguish users and to alleviate inaccurate similarity computation under sparse data. Second, by analyzing the implicit constraints of the similarity model and the rating prediction formula, a pre-filtering mode is proposed that filters out a large number of invalid users and their corresponding ratings, further improving computational efficiency. Finally, the similarity measure and the pre-filtering mode are combined to obtain the collaborative filtering algorithm. Experiments on benchmark datasets show that, compared with eight other algorithms, the proposed algorithm achieves good recommendation quality and high time efficiency.

Key words: collaborative filtering, recommendation algorithm, similarity measure, pre-filtering mode
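
The abstract does not state the proposed similarity formula or the exact pre-filtering rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain cosine similarity over co-rated items (the "traditional" kind the abstract refers to) and a hypothetical pre-filter that discards users who cannot contribute to a prediction: those who have not rated the target item or who share no rated item with the active user. The toy data and the names `ratings`, `cosine_on_co_rated`, `prefilter`, and `predict` are all illustrative assumptions.

```python
from math import sqrt

# Toy user -> {item: rating} data; all names and values are hypothetical.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "d": 5},
    "u3": {"c": 1, "d": 2},
    "u4": {"e": 3},
}

def cosine_on_co_rated(r_u, r_v):
    """Cosine similarity restricted to items both users rated."""
    common = r_u.keys() & r_v.keys()
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def prefilter(ratings, user, item):
    """Assumed pre-filter: keep only users who rated `item`
    and share at least one rated item with `user`."""
    r_u = ratings[user]
    return [v for v, r_v in ratings.items()
            if v != user and item in r_v and r_u.keys() & r_v.keys()]

def predict(ratings, user, item):
    """Similarity-weighted mean of the surviving neighbors' ratings."""
    neighbors = prefilter(ratings, user, item)
    sims = {v: cosine_on_co_rated(ratings[user], ratings[v]) for v in neighbors}
    total = sum(abs(s) for s in sims.values())
    if total == 0:
        return None  # no usable neighbor for this (user, item) pair
    return sum(s * ratings[v][item] for v, s in sims.items()) / total

print(prefilter(ratings, "u1", "d"))
print(predict(ratings, "u1", "d"))
```

In this toy data, `u1` and `u3` share only item `c`, so their co-rated cosine similarity is exactly 1.0 even though their ratings of `c` differ sharply (4 versus 1). This is one way the sparse-data weakness of co-rated similarity described in the abstract can show up.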