Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (8): 1295-1304. DOI: 10.3778/j.issn.1673-9418.1705035

• Artificial Intelligence and Pattern Recognition •

Research on Multi-Label Classification Method of Traditional Chinese Medicine Clinical Disease Data

PAN Zhuqiang1, ZHANG Lin1, ZHANG Lei2+, LI Guozheng3, YAN Shixing4

  1. School of Computer Science, Southwest Petroleum University, Chengdu 610500, China
  2. Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
  3. National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
  4. Shanghai Menorah Information Technology Co., Ltd., Shanghai 201800, China
  • Online: 2018-08-01 Published: 2018-08-09

Abstract: In the WML-kNN (weighted multi-label k-nearest neighbor) algorithm, the number of nearest neighbors is a fixed value that ignores the actual characteristics of the sample data. As a result, highly similar points may be excluded from the neighbor set, or dissimilar points may be included in it, and either case degrades classifier performance. Clinical disease data in traditional Chinese medicine (TCM) are likely to be multi-label, and because each case is distinctive, each case may have a different set of similar neighbors. This paper therefore improves the WML-kNN algorithm and proposes the WML-GkNN (WML-granular kNN) algorithm. WML-GkNN uses granular computing to control the granularity space and thereby determine the neighbor set, so that the sample points within a neighborhood are highly similar. Experimental results on meridian resistance data collected in TCM clinical practice show that WML-GkNN improves classification performance.

Key words: Chinese medicine clinical data, multi-label learning, granular computing, weight
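
The contrast the abstract draws between a fixed neighbor count and a similarity-controlled neighborhood can be illustrated with a small sketch. This is not the paper's WML-GkNN procedure, whose granulation rule is not given on this page. It assumes Euclidean distance and uses a hypothetical `radius` parameter as a stand-in for the granularity control; the function names and toy data are likewise illustrative.

```python
import math


def distances(query, samples):
    """Euclidean distance from the query to every sample."""
    return [math.dist(query, s) for s in samples]


def fixed_k_neighbors(query, samples, k):
    """Indices of the k closest samples, however near or far they are."""
    d = distances(query, samples)
    return sorted(range(len(samples)), key=lambda i: d[i])[:k]


def granular_neighbors(query, samples, radius):
    """Indices of all samples within `radius` of the query, closest first.

    `radius` is a hypothetical granularity parameter (an assumption),
    not the rule used in the paper.
    """
    d = distances(query, samples)
    inside = [i for i in range(len(samples)) if d[i] <= radius]
    return sorted(inside, key=lambda i: d[i])


# Toy data: three samples close to the query and one far away.
samples = [(0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (5.0, 5.0)]
query = (0.0, 0.0)

# k=2 drops the third close sample; k=4 would pull in the distant one.
print(fixed_k_neighbors(query, samples, k=2))
# A radius-based neighborhood keeps exactly the three close samples.
print(granular_neighbors(query, samples, radius=0.5))
```

With a fixed count, the neighbor set has the same size for every case, whereas the radius-based set grows or shrinks with the local density around each case, which is the behavior the abstract motivates.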