Collaborative Filtering Algorithm Combining Similarity Measure and Pre-filtering Mode

doi:10.3778/j.issn.1673-9418.2104065

Abstract

In neighborhood-based collaborative filtering algorithms, the calculation of user (or item) similarity has an important effect on prediction and recommendation results. Most traditional similarity measures consider only co-rated items, so they can be computed quickly, but their recommendation accuracy drops on sparse datasets. Most recent collaborative filtering algorithms improve recommendation accuracy by designing more complex similarity measures, yet they often ignore the computation cost of the model. To generate effective recommendations at a lower time cost, a collaborative filtering algorithm combining a similarity measure and a pre-filtering mode is proposed. Firstly, an optimized similarity is obtained by defining the relative rating difference and enumerating the qualitative conditions that the similarity should satisfy. In addition, a rating preference based on improved information entropy and the number of each user's global ratings are used as two weight factors, which better distinguish the differences between users and alleviate the problem of inaccurate similarity calculation under sparse data. Secondly, by analyzing the inherent characteristics of the similarity measure and the rating prediction formula, a pre-filtering mode is proposed to filter out a large number of unnecessary users and their corresponding ratings, which further improves computational efficiency. Finally, the similarity measure and the pre-filtering mode are combined to obtain the collaborative filtering algorithm. Experimental results on benchmark datasets indicate that the proposed algorithm achieves better recommendation quality and higher time efficiency than the eight comparison algorithms.

Key words: collaborative filtering, recommendation algorithm, similarity measure, pre-filtering mode
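
The abstract does not state the proposed formulas, so the following Python sketch only illustrates the general pipeline it describes and is not the authors' method. It assumes plain cosine similarity on co-rated items as a stand-in for the optimized similarity, ordinary Shannon entropy as a stand-in for the improved information entropy, and a simple pre-filter that keeps only users who rated the target item and share at least one co-rated item with the active user. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2, sqrt

# Toy ratings: user -> {item: rating}. Hypothetical data for illustration only.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "d": 5},
    "u3": {"b": 5, "c": 1},
    "u4": {"c": 2, "d": 4},
}

def rating_entropy(user_ratings):
    """Shannon entropy (bits) of a user's rating-value distribution.

    Stand-in for the paper's improved entropy, whose form is not given here.
    """
    counts = Counter(user_ratings.values())
    n = len(user_ratings)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def cosine_on_corated(ru, rv):
    """Cosine similarity restricted to items that both users rated."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def candidate_neighbors(ratings, user, item):
    """Assumed pre-filter: drop users who cannot contribute to the prediction.

    A user is kept only if they rated `item` and share a co-rated item with `user`.
    """
    ru = ratings[user]
    return [v for v, rv in ratings.items()
            if v != user and item in rv and ru.keys() & rv.keys()]

def predict(ratings, user, item):
    """Mean-centered, similarity-weighted prediction over pre-filtered users."""
    ru = ratings[user]
    mean_u = sum(ru.values()) / len(ru)
    num = den = 0.0
    for v in candidate_neighbors(ratings, user, item):
        rv = ratings[v]
        sim = cosine_on_corated(ru, rv)
        mean_v = sum(rv.values()) / len(rv)
        num += sim * (rv[item] - mean_v)
        den += abs(sim)
    # Fall back to the user's own mean when no neighbor survives the filter.
    return mean_u if den == 0 else mean_u + num / den
```

In this sketch the similarity is computed only for users that pass the filter, which is where the time saving would come from; the entropy function is shown separately because the abstract does not say how the weight factors enter the similarity.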

ZHAO Wentao, TIAN Huanhuan, FENG Tingting, CUI Ziheng. Collaborative Filtering Algorithm Combining Similarity Measure and Pre-filtering Mode[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(1): 217-227.
