计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (5): 1043-1052.DOI: 10.3778/j.issn.1673-9418.2011062
收稿日期:
2020-11-23
修回日期:
2021-03-15
出版日期:
2022-05-01
发布日期:
2022-05-19
通讯作者:
+ E-mail: weigy@manage.ustb.edu.cn作者简介:
武森(1971—),女,辽宁开原人,博士,教授,主要研究方向为数据挖掘、个性化推荐等。基金资助:
WU Sen, DONG Yaxian, WEI Guiying(), GAO Xiaonan
Received:
2020-11-23
Revised:
2021-03-15
Online:
2022-05-01
Published:
2022-05-19
About author:
WU Sen, born in 1971, Ph.D., professor. Her research interests include data mining, personalized recommendation, etc.Supported by:
摘要:
基于用户的协同过滤通过获取最近邻的偏好实现对目标用户偏好的预测推荐,相似度计算为其核心步骤。传统数值相似度计算依赖于用户共同评分项的评分数值,用户-项目评分矩阵稀疏程度的加剧导致数值相似度计算准确性降低,难以为目标用户选取可靠的最近邻,影响推荐效果;现有结构相似度大多利用用户共同评分项占比度量,计算简单,受数据稀疏影响较小但区分度低。针对上述协同过滤任务中数据稀疏带来的相似度计算问题,提出一种稀疏余弦相似度。首先定义新的结构相似度——稀疏集合相似度,将用户区分为高相关用户与低相关用户,并进一步针对不同类型用户设计差异化的数值相似度计算方式,以缓解传统数值相似度在面临数据稀疏时的不足,最终综合数值相似度与结构相似度形成稀疏余弦相似度。实验结果表明,与七种相似度计算方法相比,稀疏余弦相似度解决了传统数值相似度受数据稀疏影响严重和结构相似度计算结果区分度低的问题,可更准确计算用户相似度,提升推荐效果。
中图分类号:
武森, 董雅贤, 魏桂英, 高晓楠. 面向稀疏数据的协同过滤用户相似度计算研究[J]. 计算机科学与探索, 2022, 16(5): 1043-1052.
WU Sen, DONG Yaxian, WEI Guiying, GAO Xiaonan. Research on User Similarity Calculation of Collaborative Filtering for Sparse Data[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1043-1052.
用户 | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10 |
---|---|---|---|---|---|---|---|---|---|---|
u1 | 5 | 3 | 5 | 4 | — | — | — | — | — | — |
u2 | 3 | 5 | — | — | — | 2 | — | — | — | — |
u3 | 3 | 5 | 5 | 2 | — | 2 | — | — | — | — |
u4 | 2 | — | — | — | — | 1 | 2 | 1 | 1 | 1 |
u5 | 4 | 3 | — | 2 | — | — | — | 1 | — | — |
表1 用户-项目评分矩阵算例
Table 1 Example of user-item rating matrix
用户 | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10 |
---|---|---|---|---|---|---|---|---|---|---|
u1 | 5 | 3 | 5 | 4 | — | — | — | — | — | — |
u2 | 3 | 5 | — | — | — | 2 | — | — | — | — |
u3 | 3 | 5 | 5 | 2 | — | 2 | — | — | — | — |
u4 | 2 | — | — | — | — | 1 | 2 | 1 | 1 | 1 |
u5 | 4 | 3 | — | 2 | — | — | — | 1 | — | — |
用户对 | 共同评分项数 | 共同评分项 | 共同评分项占总评分项比重 | 共同评分项评分分析 |
---|---|---|---|---|
{u2,u3} | 3 | I1、I2、I6 | 3/5 | 评分均相等 |
{u2,u5} | 2 | I1、I2 | 2/5 | u2更喜欢I2,u5更喜欢I1 |
{u2,u1} | 2 | I1、I2 | 2/5 | u2更喜欢I2,u1更喜欢I1 |
{u4,u1} | 1 | I1 | 1/9 | 评分差距较大 |
{u4,u3} | 2 | I1、I6 | 2/9 | 评分差距较小 |
表2 部分用户评分分析
Table 2 Analysis of partial user ratings
用户对 | 共同评分项数 | 共同评分项 | 共同评分项占总评分项比重 | 共同评分项评分分析 |
---|---|---|---|---|
{u2,u3} | 3 | I1、I2、I6 | 3/5 | 评分均相等 |
{u2,u5} | 2 | I1、I2 | 2/5 | u2更喜欢I2,u5更喜欢I1 |
{u2,u1} | 2 | I1、I2 | 2/5 | u2更喜欢I2,u1更喜欢I1 |
{u4,u1} | 1 | I1 | 1/9 | 评分差距较大 |
{u4,u3} | 2 | I1、I6 | 2/9 | 评分差距较小 |
相似度 | {u2,u3} | {u2,u5} | {u2,u1} | {u4,u1} | {u4,u3} |
---|---|---|---|---|---|
PCC | 0.643 | 0.718 | 0.362 | -0.224 | -0.383 |
COS | 0.753 | 0.800 | 0.562 | 0.333 | 0.282 |
ACOS | 0.999 | 0.124 | -0.942 | 1.000 | 0.184 |
表3 部分用户相似度计算结果
Table 3 Similarity measure of partial users
相似度 | {u2,u3} | {u2,u5} | {u2,u1} | {u4,u1} | {u4,u3} |
---|---|---|---|---|---|
PCC | 0.643 | 0.718 | 0.362 | -0.224 | -0.383 |
COS | 0.753 | 0.800 | 0.562 | 0.333 | 0.282 |
ACOS | 0.999 | 0.124 | -0.942 | 1.000 | 0.184 |
数据集 | 用户M | 项目N | 评分R | 稀疏度 |
---|---|---|---|---|
MovieLens-100K | 943 | 1 682 | 100 000 | 93.7 |
MovieLens-latest-small | 610 | 9 742 | 100 836 | 98.3 |
FilmTrust | 1 508 | 2 071 | 35 497 | 98.9 |
表4 数据集描述
Table 4 Description of datasets
数据集 | 用户M | 项目N | 评分R | 稀疏度 |
---|---|---|---|---|
MovieLens-100K | 943 | 1 682 | 100 000 | 93.7 |
MovieLens-latest-small | 610 | 9 742 | 100 836 | 98.3 |
FilmTrust | 1 508 | 2 071 | 35 497 | 98.9 |
相似度 | 类型 | 是否适用于稀疏数据 | 作用 |
---|---|---|---|
COS | 数值相似度 | 否 | 对比方法[ |
ACOS | 数值相似度 | 否 | 对比方法[ |
PCC | 数值相似度 | 否 | 对比方法[ |
Jaccard | 结构相似度 | 否 | 对比方法[ |
JMSD | 综合相似度 | 否 | 对比方法[ |
BCF | 综合相似度 | 是 | 对比方法[ |
RJMSD | 综合相似度 | 是 | 对比方法[ |
SCS | 综合相似度 | 是 | 提出方法 |
表5 对比方法描述
Table 5 Description of comparison methods
相似度 | 类型 | 是否适用于稀疏数据 | 作用 |
---|---|---|---|
COS | 数值相似度 | 否 | 对比方法[ |
ACOS | 数值相似度 | 否 | 对比方法[ |
PCC | 数值相似度 | 否 | 对比方法[ |
Jaccard | 结构相似度 | 否 | 对比方法[ |
JMSD | 综合相似度 | 否 | 对比方法[ |
BCF | 综合相似度 | 是 | 对比方法[ |
RJMSD | 综合相似度 | 是 | 对比方法[ |
SCS | 综合相似度 | 是 | 提出方法 |
指标 | 方法 | MovieLens-100K | MovieLens-latest-small | FilmTrust | |||
---|---|---|---|---|---|---|---|
最优值 | 均值 | 最优值 | 均值 | 最优值 | 均值 | ||
MAE | SCS | 0.689 | 0.695 | 0.652 | 0.657 | 0.603 | 0.617 |
NMF | 0.749 | 0.758 | 0.697 | 0.707 | 0.653 | 0.656 | |
RMSE | SCS | 0.970 | 0.981 | 0.883 | 0.889 | 0.841 | 0.860 |
NMF | 0.954 | 0.965 | 0.908 | 0.903 | 0.843 | 0.857 |
表6 SCS与NMF推荐效果对比
Table 6 Recommended performance comparison of SCS and NMF
指标 | 方法 | MovieLens-100K | MovieLens-latest-small | FilmTrust | |||
---|---|---|---|---|---|---|---|
最优值 | 均值 | 最优值 | 均值 | 最优值 | 均值 | ||
MAE | SCS | 0.689 | 0.695 | 0.652 | 0.657 | 0.603 | 0.617 |
NMF | 0.749 | 0.758 | 0.697 | 0.707 | 0.653 | 0.656 | |
RMSE | SCS | 0.970 | 0.981 | 0.883 | 0.889 | 0.841 | 0.860 |
NMF | 0.954 | 0.965 | 0.908 | 0.903 | 0.843 | 0.857 |
[1] | DESROSIERS C, KARYPIS G. Recommender systems hand- book: a comprehensive survey of neigborhood-based reco-mmendation methods[M]. Berlin, Heidelberg: Springer, 2011. |
[2] |
GAZDAR A. A new similarity measure for collaborative filtering based recommender systems[J]. Knowledge-Based Systems, 2020, 188: 105058.
DOI URL |
[3] |
FENG C, LIANG J, SONG P, et al. A fusion collaborative filtering method for sparse data in recommender systems[J]. Information Sciences, 2020, 521: 365-379.
DOI URL |
[4] | SU X, KHOSHGOFTAAR T M. A survey of collaborative filtering techniques[J]. Advances in Artificial Intelligence, 2009, 12: 421425. |
[5] | AHN H J. A new similarity measure for collaborative filte-ring to alleviate the new user cold-starting problem[J]. Infor-mation Sciences, 2008, 178(1): 37-51. |
[6] |
SURYAKANT, MAHARA T. A new similarity measure based on mean measure of divergence for collaborative filtering in sparse environment[J]. Procedia Computer Science, 2016, 89: 450-456.
DOI URL |
[7] |
WANG D, YIH Y, VENTRESCA M. Improving neighbor-based collaborative filtering by using a hybrid similarity measurement[J]. Expert Systems with Applications, 2020, 160: 113651.
DOI URL |
[8] |
BOBADILLA J, SERRADILLA F, BERNAL J. A new coll-aborative filtering metric that improves the behavior of re-commender systems[J]. Knowledge-Based Systems, 2010, 23(6): 520-528.
DOI URL |
[9] | BREESE J S, HECKERMAN D, KADIE C. Empirical anal-ysis of predictive algorithms for collaborative filtering[J]. Uncertainty in Artificial Intelligence, 2013, 98(7): 43-52. |
[10] | HAN S, CHEE S, HAN J, et al. RecTree: an efficient coll-aborative filtering method[C]// LNCS 2114:Proceedings of the 3rd International Conference on Data Warehousing and Knowledge Discovery, Munich, Sep 5-7, 2001. Berlin, Hei-delberg: Springer, 2001: 141-151. |
[11] | SARWAR B M, KARYPIS G, KONSTAN J A, et al. App-lication of dimensionality reduction in recommender sys-tem-a case study[C]// Proceedings of the ACM WebKDD Web Mining for E-Commerce Workshop, Boston, Aug 1,2000. New York: ACM, 2000: 82-90. |
[12] | MNIH A, SALAKHUTDINOV R R. Probabilistic matrix factorization[C]// Advances in Neural Information Processing Systems 20: Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, Dec 3-6, 2007. Red Hook: Curran Associates, 2008: 1257-1264. |
[13] |
BERRY M W, BROWNE M, LANGVILLE A N, et al. Al-gorithms and applications for approximate nonnegative matrix factorization[J]. Computational Statistics & Data Analysis, 2007, 52(1): 155-173.
DOI URL |
[14] | 李乐, 章毓晋. 非负矩阵分解算法综述[J]. 电子学报, 2008(4): 737-743. |
LI L, ZHANG Y J. A survey on algorithms of non-negative matrix factorization[J]. Acta Electronica Sinica, 2008(4):737-743. | |
[15] |
ZHANG F, QI S, LIU Q, et al. Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks[J]. Expert Systems with Applications, 2020, 149: 113346.
DOI URL |
[16] |
WANG Y, WANG P Y, LIU Z, et al. A new item similarity based on α-divergence for collaborative filtering in sparse data[J]. Expert Systems with Applications, 2020, 166: 114074.
DOI URL |
[17] |
POLATIDIS N, GEORGIADIS C K. A multi-level collabo-rative filtering method that improves recommendations[J]. Expert Systems with Applications, 2016, 48: 100-110.
DOI URL |
[18] |
RIYAHI M, SOHRABI M K. Providing effective recommen-dations in discussion groups using a new hybrid recom-mender system based on implicit ratings and semantic simi-larity[J]. Electronic Commerce Research and Applications, 2020, 40: 100938.
DOI URL |
[19] |
YU S, YANG M, QU Q. Contextual-boosted deep neural co-llaborative filtering model for interpretable recommendation[J]. Expert Systems with Applications, 2019, 136: 365-375.
DOI URL |
[20] | 荣辉桂, 火生旭, 胡春华, 等. 基于用户相似度的协同过滤推荐算法[J]. 通信学报, 2014, 35(2): 16-24. |
RONG H G, HUO S X, HU C H, et al. User similarity-based collaborative filtering recommendation algorithm[J]. Journal on Communications, 2014, 35(2): 16-24. | |
[21] | 陈洁敏, 李建国, 汤非易, 等. 融合“用户-项目-用户兴趣标签图”的协同好友推荐算法[J]. 计算机科学与探索, 2018, 12(1): 92-100. |
CHEN J M, LI J G, TANG F Y, et al. Combining user-item-tag tripartite graph and users’ personal interests for friends recommendation[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(1): 92-100. | |
[22] | AIOLLI F. Efficient top-n recommendation for very large scale binary rated datasets[C]// Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China,Oct 12-16, 2013. New York: ACM, 2013: 273-280. |
[23] |
JIANG S, FANG S C, AN Q, et al. A sub-one quasi-norm-based similarity measure for collaborative filtering in recom-mender systems[J]. Information Sciences, 2019, 487: 142-155.
DOI URL |
[24] |
LIU H, HU Z, MIAN A, et al. A new user similarity model to improve the accuracy of collaborative filtering[J]. Knowledge-Based Systems, 2014, 56: 156-166.
DOI URL |
[25] | WANG Y, DENG J, GAO J, et al. A hybrid user similarity model for collaborative filtering[J]. Information Sciences, 2017, 418: 102-118. |
[26] |
PATRA B K, LAUNONEN R, OLLIKAINEN V, et al. A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data[J]. Knowledge Based Systems, 2015, 82: 163-177.
DOI URL |
[27] |
BAG S, KUMAR S, AWASTHI A, et al. A noise correction-based approach to support a recommender system in a highly sparse rating environment[J]. Decision Support Systems, 2019, 118: 46-57.
DOI URL |
[28] |
MU Y, XIAO N, TANG R, et al. An efficient similarity measure for collaborative filtering[J]. Procedia Computer Science, 2019, 147: 416-421.
DOI URL |
[29] |
BAG S, KUMAR S K, TIWARI M K. An efficient recom-mendation generation using relevant Jaccard similarity[J]. Information Sciences, 2019, 483: 53-64.
DOI URL |
[30] | MAXWELL H F, JOSEPH A K. The MovieLens datasets: history and context[J]. ACM Transactions on Interactive Inte-lligent Systems, 2016, 5(4): 19. |
[31] | GUO G, ZHANG J, YORKE S N. A novel bayesian simi-larity measure for recommender systems[C]// Proceedings of the 23rd International Joint Conference on Artificial Inte-lligence, Beijing, Aug 3-9, 2013. Menlo Park: AAAI, 2013: 2619-2625. |
[1] | 王雪纯, 吕晟凯, 吴浩, 何鹏, 曾诚. 多网络混合嵌入学习的服务推荐方法研究[J]. 计算机科学与探索, 2022, 16(7): 1529-1542. |
[2] | 陈江美, 张文德. 基于位置社交网络的兴趣点推荐系统研究综述[J]. 计算机科学与探索, 2022, 16(7): 1462-1478. |
[3] | 郭晓旺, 夏鸿斌, 刘渊. 融合知识图谱与图卷积网络的混合推荐模型[J]. 计算机科学与探索, 2022, 16(6): 1343-1353. |
[4] | 张全贵, 胡嘉燕, 王丽. 耦合用户公共特征的单类协同过滤推荐算法[J]. 计算机科学与探索, 2022, 16(3): 637-648. |
[5] | 李想, 杨兴耀, 于炯, 钱育蓉, 郑捷. 基于知识图谱卷积网络的双端推荐算法[J]. 计算机科学与探索, 2022, 16(1): 176-184. |
[6] | 蔡明昕, 孙晶, 王斌. 多角度语义轨迹相似度计算模型[J]. 计算机科学与探索, 2021, 15(9): 1632-1640. |
[7] | 武家伟, 孙艳春. 融合知识图谱和深度学习方法的问诊推荐系统[J]. 计算机科学与探索, 2021, 15(8): 1432-1440. |
[8] | 高仰, 刘渊. 融合知识图谱和短期偏好的推荐算法[J]. 计算机科学与探索, 2021, 15(6): 1133-1144. |
[9] | 邢长征,郭亚兰,张全贵,赵宏宝. 融合短文本层级注意力和时间信息的推荐方法[J]. 计算机科学与探索, 2021, 15(11): 2222-2232. |
[10] | 邢长征,赵宏宝,张全贵,郭亚兰. 融合评论文本层级注意力和外积的推荐方法[J]. 计算机科学与探索, 2020, 14(6): 947-957. |
[11] | 李广丽,滑瑾,袁天,朱涛,邬任重,姬东鸿,张红斌. 基于用户偏好挖掘生成对抗网络的推荐系统[J]. 计算机科学与探索, 2020, 14(5): 803-814. |
[12] | 王玮皓,陈松灿. 双曲因子分解机[J]. 计算机科学与探索, 2020, 14(4): 590-597. |
[13] | 刘忠慧,邹璐,杨梅,闵帆. 启发式概念构造的组推荐方法[J]. 计算机科学与探索, 2020, 14(4): 703-711. |
[14] | 王绍卿,李鑫鑫,孙福振,方春. 个性化新闻推荐技术研究综述[J]. 计算机科学与探索, 2020, 14(1): 18-29. |
[15] | 李幸幸,刘华锋,景丽萍. 混合秩矩阵分解模型[J]. 计算机科学与探索, 2019, 13(7): 1114-1122. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||