Journal of Frontiers of Computer Science and Technology ›› 2007, Vol. 1 ›› Issue (3): 331-339.

• Academic Research •

Maximum-entropy sphere K-means document clustering analysis

XIU Yu1,3, WANG Shitong1,2+, ZHU Lin1, ZONG Chengqing2

  1. School of Information Engineering, Jiangnan University, Wuxi, Jiangsu 214036, China
  2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
  3. Department of Computer Science and Engineering, Anhui University of Technology and Science, Wuhu, Anhui 241000, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-10-20 Published: 2007-10-20
  • Contact: XIU Yu

Abstract: This paper presents ME-SPKM, a spherical K-means document clustering algorithm based on maximum-entropy theory. The algorithm retains the cosine similarity measure used by the traditional document clustering algorithm SPKmeans and introduces maximum-entropy theory to construct a maximum-entropy objective function suited to document clustering. Experiments on text data show that ME-SPKM achieves better clustering results than traditional document clustering algorithms.

Key words: maximum entropy, document clustering, spherical K-means
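
The abstract does not state the ME-SPKM objective function, so the following is only an illustrative sketch of how cosine similarity and a maximum-entropy term can be combined, not the authors' formulation. It assumes one common entropy-regularised form: maximise the membership-weighted cosine similarity plus (1/beta) times the entropy of the memberships, which gives soft memberships proportional to exp(beta * cosine) and centroids equal to the normalised membership-weighted sums. The names `me_spkm_sketch` and `beta` are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def me_spkm_sketch(docs, k, beta=10.0, iters=50, seed=0):
    """Soft spherical K-means with entropy-regularised memberships.

    Assumed objective (not taken from the paper):
        maximise  sum_ij u_ij * cos(x_i, c_j) + (1/beta) * H(u)
        subject to sum_j u_ij = 1 and ||c_j|| = 1.
    """
    rng = random.Random(seed)
    X = [normalize(d) for d in docs]
    # Initialise centroids from k distinct documents.
    centroids = [list(x) for x in rng.sample(X, k)]
    U = []
    for _ in range(iters):
        # Membership update: u_ij proportional to exp(beta * cosine).
        U = []
        for x in X:
            sims = [dot(x, c) for c in centroids]
            top = max(sims)  # subtract the max for numerical stability
            weights = [math.exp(beta * (s - top)) for s in sims]
            total = sum(weights)
            U.append([w / total for w in weights])
        # Centroid update: normalised membership-weighted sum of documents.
        for j in range(k):
            acc = [0.0] * len(X[0])
            for x, u in zip(X, U):
                for d, xd in enumerate(x):
                    acc[d] += u[j] * xd
            centroids[j] = normalize(acc)
    return U, centroids


# Toy term-frequency vectors: two documents per topic.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1]]
memberships, centroids = me_spkm_sketch(docs, k=2)
```

Under this assumed form, a small `beta` yields nearly uniform memberships, while a large `beta` pushes each document toward a single cluster and the updates approach hard SPKmeans assignments.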