Research on User Similarity Calculation of Collaborative Filtering for Sparse Data

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (5): 1043-1052. DOI: 10.3778/j.issn.1673-9418.2011062

WU Sen, DONG Yaxian, WEI Guiying, GAO Xiaonan

School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

Received: 2020-11-23 Revised: 2021-03-15 Online: 2022-05-01 Published: 2022-05-19
Corresponding author e-mail: weigy@manage.ustb.edu.cn
About the authors: WU Sen, born in 1971 in Kaiyuan, Liaoning, Ph.D., professor. Her research interests include data mining and personalized recommendation.
DONG Yaxian, born in 1996 in Tianjin, M.S. candidate. Her research interests include data processing and analysis and personalized recommendation.
WEI Guiying, born in 1969 in Chengde, Hebei, Ph.D., associate professor. Her research interests include data mining and personalized recommendation.
GAO Xiaonan, born in 1996 in Changzhi, Shanxi, Ph.D. candidate. Her research interests include data processing and analysis and personalized recommendation.
Supported by:
National Natural Science Foundation of China (71971025)

Abstract:

User-based collaborative filtering predicts a target user's preferences from the preferences of that user's nearest neighbors, and similarity calculation is its core step. Traditional rating similarity relies on the ratings that two users have given to the items they have both rated. As the user-item rating matrix becomes sparser, rating similarity becomes less accurate, reliable nearest neighbors become hard to select for the target user, and recommendation performance suffers. Existing structural similarity measures are mostly based on the proportion of co-rated items. They are simple to compute and less affected by data sparsity, but their outputs tend to be close to one another, so different user pairs are poorly distinguished. To address the similarity calculation problems that data sparsity causes in collaborative filtering, this paper proposes a sparse cosine similarity. First, a new structural similarity, the sparse set similarity, is defined and used to divide users into high-correlation users and low-correlation users. Then, different rating similarity calculations are designed for the different kinds of users, to alleviate the shortcomings of traditional rating similarity on sparse data. Finally, the rating similarity and the structural similarity are combined to form the sparse cosine similarity. Experimental results show that, compared with seven similarity calculation methods, the sparse cosine similarity overcomes both the severe sensitivity of traditional rating similarity to data sparsity and the low discrimination of structural similarity, yielding more accurate user similarity and better recommendation performance.

Keywords: similarity calculation, collaborative filtering, sparse data, recommender system

CLC number: TP391

WU Sen, DONG Yaxian, WEI Guiying, GAO Xiaonan. Research on User Similarity Calculation of Collaborative Filtering for Sparse Data[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(5): 1043-1052.

Figures and tables (12)

Table 1 Example of user-item rating matrix

User	I₁	I₂	I₃	I₄	I₅	I₆	I₇	I₈	I₉	I₁₀
u₁	5	3	5	4	—	—	—	—	—	—
u₂	3	5	—	—	—	2	—	—	—	—
u₃	3	5	5	2	—	2	—	—	—	—
u₄	2	—	—	—	—	1	2	1	1	1
u₅	4	3	—	2	—	—	—	1	—	—
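
To make the neighbor-based prediction step described in the abstract concrete, the following is a minimal sketch that stores the Table 1 matrix as a dictionary and predicts a missing rating with a generic mean-centered weighted average over the most similar users. This is a textbook user-based formulation, not the paper's sparse cosine similarity; the names `RATINGS`, `cosine`, `mean`, and `predict`, the choice of `k=2`, and the zero-filled cosine used as the similarity are all assumptions made for illustration.

```python
from math import sqrt

# Table 1 as a dict of dicts; unrated items are simply absent.
RATINGS = {
    "u1": {"I1": 5, "I2": 3, "I3": 5, "I4": 4},
    "u2": {"I1": 3, "I2": 5, "I6": 2},
    "u3": {"I1": 3, "I2": 5, "I3": 5, "I4": 2, "I6": 2},
    "u4": {"I1": 2, "I6": 1, "I7": 2, "I8": 1, "I9": 1, "I10": 1},
    "u5": {"I1": 4, "I2": 3, "I4": 2, "I8": 1},
}


def cosine(a, b):
    """Cosine similarity of two rating dicts, treating unrated items as 0."""
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    norm = sqrt(sum(r * r for r in a.values())) * sqrt(sum(r * r for r in b.values()))
    return dot / norm if norm else 0.0


def mean(user_ratings):
    """Average of the ratings a user has actually given."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(ratings, target, item, k=2, sim=cosine):
    """Mean-centered weighted average over the k most similar users who rated `item`."""
    candidates = [
        (sim(ratings[target], r), u)
        for u, r in ratings.items()
        if u != target and item in r
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    base = mean(ratings[target])
    denom = sum(abs(s) for s, _ in neighbours)
    if denom == 0:
        return base  # no usable neighbour: fall back to the target's own mean
    offset = sum(s * (ratings[u][item] - mean(ratings[u])) for s, u in neighbours)
    return base + offset / denom


print(round(predict(RATINGS, "u2", "I3"), 2))  # about 4.57
```

Only u₁ and u₃ have rated I₃, so both act as neighbors of u₂ here. Passing a different function as `sim` changes which neighbors dominate the prediction, which is why the choice of similarity measure matters on sparse data.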

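Table 2 Analysis of partial user ratings

User pair	Number of co-rated items	Co-rated items	Share of co-rated items among all rated items	Analysis of ratings on co-rated items
{u₂,u₃}	3	I₁, I₂, I₆	3/5	Ratings are all equal
{u₂,u₅}	2	I₁, I₂	2/5	u₂ prefers I₂, u₅ prefers I₁
{u₂,u₁}	2	I₁, I₂	2/5	u₂ prefers I₂, u₁ prefers I₁
{u₄,u₁}	1	I₁	1/9	Large rating gap
{u₄,u₃}	2	I₁, I₆	2/9	Small rating gap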
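
The share column of Table 2 can be reproduced from Table 1 by dividing the number of co-rated items by the number of items rated by either user, which has the form of the Jaccard index. The sketch below is illustrative only; the names `RATED` and `co_rated` are assumptions, and the paper's own sparse set similarity is not reproduced here.

```python
from fractions import Fraction

# Items each user has rated in Table 1 (rating values are not needed here).
RATED = {
    "u1": {"I1", "I2", "I3", "I4"},
    "u2": {"I1", "I2", "I6"},
    "u3": {"I1", "I2", "I3", "I4", "I6"},
    "u4": {"I1", "I6", "I7", "I8", "I9", "I10"},
    "u5": {"I1", "I2", "I4", "I8"},
}


def co_rated(a, b):
    """Return (co-rated items, share of co-rated items among all items rated by a or b)."""
    common = RATED[a] & RATED[b]
    union = RATED[a] | RATED[b]
    return common, Fraction(len(common), len(union))


for pair in [("u2", "u3"), ("u2", "u5"), ("u2", "u1"), ("u4", "u1"), ("u4", "u3")]:
    items, share = co_rated(*pair)
    print(pair, len(items), sorted(items), share)
```

Note that {u₂,u₅} and {u₂,u₁} receive the same share of 2/5, which illustrates the low discrimination of purely structural measures mentioned in the abstract.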

Table 3 Similarity measure of partial users

Similarity	{u₂,u₃}	{u₂,u₅}	{u₂,u₁}	{u₄,u₁}	{u₄,u₃}
PCC	0.643	0.718	0.362	-0.224	-0.383
COS	0.753	0.800	0.562	0.333	0.282
ACOS	0.999	0.124	-0.942	1.000	0.184
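
The values in Table 3 can be checked against the Table 1 ratings. The sketch below is an assumption inferred from the numbers rather than a definition stated on this page: the COS and PCC rows appear consistent with cosine and Pearson correlation computed over all ten items with unrated items filled with 0, and the ACOS row with a cosine over co-rated items after subtracting each user's mean over the items that user rated. The function names `dense`, `cos`, `pcc`, and `adjusted_cos` are hypothetical.

```python
from math import sqrt

ITEMS = [f"I{j}" for j in range(1, 11)]
RATINGS = {
    "u1": {"I1": 5, "I2": 3, "I3": 5, "I4": 4},
    "u2": {"I1": 3, "I2": 5, "I6": 2},
    "u3": {"I1": 3, "I2": 5, "I3": 5, "I4": 2, "I6": 2},
    "u4": {"I1": 2, "I6": 1, "I7": 2, "I8": 1, "I9": 1, "I10": 1},
    "u5": {"I1": 4, "I2": 3, "I4": 2, "I8": 1},
}


def dense(u):
    """Rating vector over all ten items, with 0 for unrated items (assumption)."""
    return [RATINGS[u].get(i, 0) for i in ITEMS]


def _cosine(x, y):
    """Plain cosine of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def cos(u, v):
    """Cosine similarity of the zero-filled rating vectors."""
    return _cosine(dense(u), dense(v))


def pcc(u, v):
    """Pearson correlation of the zero-filled rating vectors."""
    x, y = dense(u), dense(v)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return _cosine([a - mx for a in x], [b - my for b in y])


def adjusted_cos(u, v):
    """Cosine over co-rated items after subtracting each user's own mean rating."""
    ru, rv = RATINGS[u], RATINGS[v]
    mu, mv = sum(ru.values()) / len(ru), sum(rv.values()) / len(rv)
    common = [i for i in ru if i in rv]
    return _cosine([ru[i] - mu for i in common], [rv[i] - mv for i in common])


for name, f in [("PCC", pcc), ("COS", cos), ("ACOS", adjusted_cos)]:
    print(name, [round(f(a, b), 3) for a, b in
                 [("u2", "u3"), ("u2", "u5"), ("u2", "u1"), ("u4", "u1"), ("u4", "u3")]])
```

Under these assumptions the printed rows agree with Table 3 to three decimals. The ACOS value of 1.000 for {u₄,u₁} comes from a single co-rated item, which shows how unstable rating-based measures become when few items are shared.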

Table 4 Description of datasets

Dataset	Users M	Items N	Ratings R	Sparsity $k$ /%
MovieLens-100K	943	1 682	100 000	93.7
MovieLens-latest-small	610	9 742	100 836	98.3
FilmTrust	1 508	2 071	35 497	98.9
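
The sparsity column is consistent with the usual definition $k = 1 - R/(M \times N)$, that is, the fraction of empty cells in the user-item matrix. The short sketch below recomputes it from the counts in Table 4; the names `DATASETS` and `sparsity_percent` are assumptions used for illustration.

```python
# (users M, items N, ratings R) taken from Table 4.
DATASETS = {
    "MovieLens-100K": (943, 1682, 100000),
    "MovieLens-latest-small": (610, 9742, 100836),
    "FilmTrust": (1508, 2071, 35497),
}


def sparsity_percent(users, items, ratings):
    """Percentage of user-item cells that carry no rating."""
    return 100.0 * (1 - ratings / (users * items))


for name, (m, n, r) in DATASETS.items():
    print(f"{name}: {sparsity_percent(m, n, r):.1f}%")
```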


Fig.1 Comparison of MAE with different similarities

Fig.2 Comparison of RMSE with different similarities
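
For readers unfamiliar with the two metrics in Fig.1 and Fig.2, the sketch below gives the standard definitions of mean absolute error (MAE) and root mean squared error (RMSE) over predicted and actual ratings. The sample values are made up for illustration and are not taken from the paper's experiments.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


predicted = [4, 3, 5]  # hypothetical predictions
actual = [5, 3, 3]     # hypothetical true ratings
print(mae(predicted, actual), rmse(predicted, actual))
```

Lower values are better for both metrics. Because RMSE squares the errors, a method can rank differently on the two metrics.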

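Table 5 Description of comparison methods

Similarity	Type	Suitable for sparse data	Role
COS	Rating similarity	No	Comparison method [4]
ACOS	Rating similarity	No	Comparison method [5]
PCC	Rating similarity	No	Comparison method [4]
Jaccard	Structural similarity	No	Comparison method [8]
JMSD	Hybrid similarity	No	Comparison method [8]
BCF	Hybrid similarity	Yes	Comparison method [26]
RJMSD	Hybrid similarity	Yes	Comparison method [29]
SCS	Hybrid similarity	Yes	Proposed method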

Table 6 Recommendation performance comparison of SCS and NMF

Metric	Method	MovieLens-100K (best)	MovieLens-100K (mean)	MovieLens-latest-small (best)	MovieLens-latest-small (mean)	FilmTrust (best)	FilmTrust (mean)
MAE	SCS	0.689	0.695	0.652	0.657	0.603	0.617
MAE	NMF	0.749	0.758	0.697	0.707	0.653	0.656
RMSE	SCS	0.970	0.981	0.883	0.889	0.841	0.860
RMSE	NMF	0.954	0.965	0.908	0.903	0.843	0.857

Fig.3 Influence of β on sparse cosine similarity MAE

Fig.4 Influence of β on sparse cosine similarity RMSE

Fig.5 Influence of b on sparse cosine similarity MAE

Fig.6 Influence of b on sparse cosine similarity RMSE

References (31)

[1]	DESROSIERS C, KARYPIS G. Recommender systems handbook: a comprehensive survey of neighborhood-based recommendation methods[M]. Berlin, Heidelberg: Springer, 2011.
[2]	GAZDAR A. A new similarity measure for collaborative filtering based recommender systems[J]. Knowledge-Based Systems, 2020, 188: 105058.
[3]	FENG C, LIANG J, SONG P, et al. A fusion collaborative filtering method for sparse data in recommender systems[J]. Information Sciences, 2020, 521: 365-379.
[4]	SU X, KHOSHGOFTAAR T M. A survey of collaborative filtering techniques[J]. Advances in Artificial Intelligence, 2009, 12: 421425.
[5]	AHN H J. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem[J]. Information Sciences, 2008, 178(1): 37-51.
[6]	SURYAKANT, MAHARA T. A new similarity measure based on mean measure of divergence for collaborative filtering in sparse environment[J]. Procedia Computer Science, 2016, 89: 450-456.
[7]	WANG D, YIH Y, VENTRESCA M. Improving neighbor-based collaborative filtering by using a hybrid similarity measurement[J]. Expert Systems with Applications, 2020, 160: 113651.
[8]	BOBADILLA J, SERRADILLA F, BERNAL J. A new collaborative filtering metric that improves the behavior of recommender systems[J]. Knowledge-Based Systems, 2010, 23(6): 520-528.
[9]	BREESE J S, HECKERMAN D, KADIE C. Empirical analysis of predictive algorithms for collaborative filtering[J]. Uncertainty in Artificial Intelligence, 2013, 98(7): 43-52.
[10]	HAN S, CHEE S, HAN J, et al. RecTree: an efficient collaborative filtering method[C]// LNCS 2114: Proceedings of the 3rd International Conference on Data Warehousing and Knowledge Discovery, Munich, Sep 5-7, 2001. Berlin, Heidelberg: Springer, 2001: 141-151.
[11]	SARWAR B M, KARYPIS G, KONSTAN J A, et al. Application of dimensionality reduction in recommender system - a case study[C]// Proceedings of the ACM WebKDD Web Mining for E-Commerce Workshop, Boston, Aug 1, 2000. New York: ACM, 2000: 82-90.
[12]	MNIH A, SALAKHUTDINOV R R. Probabilistic matrix factorization[C]// Advances in Neural Information Processing Systems 20: Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, Dec 3-6, 2007. Red Hook: Curran Associates, 2008: 1257-1264.
[13]	BERRY M W, BROWNE M, LANGVILLE A N, et al. Algorithms and applications for approximate nonnegative matrix factorization[J]. Computational Statistics & Data Analysis, 2007, 52(1): 155-173.
[14]	LI L, ZHANG Y J. A survey on algorithms of non-negative matrix factorization[J]. Acta Electronica Sinica, 2008(4): 737-743.
[15]	ZHANG F, QI S, LIU Q, et al. Alleviating the data sparsity problem of recommender systems by clustering nodes in bipartite networks[J]. Expert Systems with Applications, 2020, 149: 113346.
[16]	WANG Y, WANG P Y, LIU Z, et al. A new item similarity based on α-divergence for collaborative filtering in sparse data[J]. Expert Systems with Applications, 2020, 166: 114074.
[17]	POLATIDIS N, GEORGIADIS C K. A multi-level collaborative filtering method that improves recommendations[J]. Expert Systems with Applications, 2016, 48: 100-110.
[18]	RIYAHI M, SOHRABI M K. Providing effective recommendations in discussion groups using a new hybrid recommender system based on implicit ratings and semantic similarity[J]. Electronic Commerce Research and Applications, 2020, 40: 100938.
[19]	YU S, YANG M, QU Q. Contextual-boosted deep neural collaborative filtering model for interpretable recommendation[J]. Expert Systems with Applications, 2019, 136: 365-375.
[20]	RONG H G, HUO S X, HU C H, et al. User similarity-based collaborative filtering recommendation algorithm[J]. Journal on Communications, 2014, 35(2): 16-24.
[21]	CHEN J M, LI J G, TANG F Y, et al. Combining user-item-tag tripartite graph and users’ personal interests for friends recommendation[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(1): 92-100.
[22]	AIOLLI F. Efficient top-n recommendation for very large scale binary rated datasets[C]// Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, Oct 12-16, 2013. New York: ACM, 2013: 273-280.
[23]	JIANG S, FANG S C, AN Q, et al. A sub-one quasi-norm-based similarity measure for collaborative filtering in recommender systems[J]. Information Sciences, 2019, 487: 142-155.
[24]	LIU H, HU Z, MIAN A, et al. A new user similarity model to improve the accuracy of collaborative filtering[J]. Knowledge-Based Systems, 2014, 56: 156-166.
[25]	WANG Y, DENG J, GAO J, et al. A hybrid user similarity model for collaborative filtering[J]. Information Sciences, 2017, 418: 102-118.
[26]	PATRA B K, LAUNONEN R, OLLIKAINEN V, et al. A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data[J]. Knowledge-Based Systems, 2015, 82: 163-177.
[27]	BAG S, KUMAR S, AWASTHI A, et al. A noise correction-based approach to support a recommender system in a highly sparse rating environment[J]. Decision Support Systems, 2019, 118: 46-57.
[28]	MU Y, XIAO N, TANG R, et al. An efficient similarity measure for collaborative filtering[J]. Procedia Computer Science, 2019, 147: 416-421.
[29]	BAG S, KUMAR S K, TIWARI M K. An efficient recommendation generation using relevant Jaccard similarity[J]. Information Sciences, 2019, 483: 53-64.
[30]	HARPER F M, KONSTAN J A. The MovieLens datasets: history and context[J]. ACM Transactions on Interactive Intelligent Systems, 2016, 5(4): 19.
[31]	GUO G, ZHANG J, YORKE S N. A novel bayesian similarity measure for recommender systems[C]// Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, Aug 3-9, 2013. Menlo Park: AAAI, 2013: 2619-2625.
