Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (7): 1166-1174. DOI: 10.3778/j.issn.1673-9418.1605005

• Theory and Algorithm •

Sparse Mixed Graph Random Jump Transition Policy for Web Object Multi-Label Classification

WANG Zhongguo1+, WU Min2, TAN Fangfang3

  1. Anhui Institute of Information Technology, Wuhu, Anhui 241000, China
  2. School of Software Engineering, University of Science and Technology of China, Hefei 230051, China
  3. Foundation Teaching Department, Anhui Institute of Information Technology, Wuhu, Anhui 241000, China
  • Online: 2017-07-01 Published: 2017-07-07

Abstract: Automatic annotation in multi-label classification of Web objects suffers from low classification performance, because labeling data is time-consuming and labeled data are scarce. To address this problem, this paper proposes a multi-label classification algorithm for Web objects based on a sparse mixed graph random jump transition strategy. First, a Web object affinity subgraph and a label correlation subgraph are constructed, and a mixed graph for Web object label classification is built on them with adaptive weights; this realizes semi-supervised automatic annotation and avoids the time cost of manual annotation. Second, to solve the mixed graph, the random jump transition strategy assigns probabilities between the objects in the mixed graph and the predicted labels, which yields probability estimates of the class labels of unlabeled Web objects together with their top-k highest relevance scores. Finally, tests on the UCI Web dataset and on real big data show that the proposed algorithm achieves a better Rand index than the compared algorithms, which verifies its effectiveness.

Key words: big data, random jump, Web object, label classification, automatic annotation
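
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea it describes: objects and labels are placed in one mixed graph, and a random walk with restart spreads label evidence from labeled to unlabeled objects. Everything specific here is an assumption rather than the authors' method: the function names, the fixed mixing weight `alpha` (the paper adapts its weights, which is not reproduced here), the restart probability `restart`, and the toy data.

```python
import numpy as np


def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    sums = matrix.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return matrix / sums


def mixed_graph_rwr(obj_affinity, label_corr, obj_label,
                    alpha=0.5, restart=0.15, iters=200, tol=1e-12):
    """Random walk with restart on an object/label mixed graph.

    obj_affinity: (n, n) object-object similarities
    label_corr:   (m, m) label-label correlations
    obj_label:    (n, m) known object-label assignments (0 rows = unlabeled)
    Returns an (n, m) matrix of label relevance scores per object.
    """
    obj_affinity = np.asarray(obj_affinity, dtype=float)
    label_corr = np.asarray(label_corr, dtype=float)
    obj_label = np.asarray(obj_label, dtype=float)
    n, m = obj_label.shape

    # Block adjacency with objects first, labels second.
    # alpha is a fixed, hypothetical weight between intra- and cross-graph edges.
    adjacency = np.block([
        [alpha * obj_affinity, (1 - alpha) * obj_label],
        [(1 - alpha) * obj_label.T, alpha * label_corr],
    ])
    transition = row_normalize(adjacency)

    # One walker per object, each restarting at its own object node.
    restart_dist = np.zeros((n, n + m))
    restart_dist[np.arange(n), np.arange(n)] = 1.0

    prob = restart_dist.copy()
    for _ in range(iters):
        updated = (1 - restart) * prob @ transition + restart * restart_dist
        converged = np.abs(updated - prob).max() < tol
        prob = updated
        if converged:
            break
    return prob[:, n:]  # probability mass that ends up on the label nodes


def top_k_labels(scores, k):
    """Indices of the k highest-scoring labels for each object."""
    return np.argsort(-scores, axis=1)[:, :k]


# Toy data: objects 0 and 1 are similar, objects 2 and 3 are similar.
# Only object 0 (label 0) and object 2 (label 1) are labeled.
affinity = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
correlation = [[0.0, 0.1], [0.1, 0.0]]
known = [[1, 0], [0, 0], [0, 1], [0, 0]]

scores = mixed_graph_rwr(affinity, correlation, known)
predicted = top_k_labels(scores, k=1)
```

In this toy setup the unlabeled objects 1 and 3 inherit the label of their nearest labeled neighbour. A sparse variant could keep only the strongest entries of `obj_affinity` before building the block matrix; that step is omitted here.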