Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (6): 1005-1015. DOI: 10.3778/j.issn.1673-9418.1806019

• Artificial Intelligence •

Skin Diseases Diagnosis Method Based on Generative Adversarial Networks and Naive Bayes

SHANG Xianzhen1,2, HAN Meng1, SUN Yuzhong2+, SUN Yuning3, CHEN Xu2, HU Manman2, MEI Yudong2

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  3. National Pilot School of Software, Yunnan University, Kunming 650091, China
  • Online: 2019-06-01 Published: 2019-06-14

Abstract: Differences in incidence among skin diseases lead to class imbalance in skin disease data, which poses a great challenge to building effective and accurate diagnosis models with machine learning. This paper proposes a binary classification diagnosis method that combines generative adversarial networks (GAN) with the naive Bayes (NB) algorithm: a naive Bayes binary classifier is trained on a skin disease dataset as the diagnoser, and a GAN is used to generate supplementary training samples for it so that the positive and negative classes in its training set are balanced. For the multi-class diagnosis problem, a multi-class diagnosis method that also combines GAN and NB is proposed: a single-disease binary classifier is trained for each disease using GAN and NB, the term frequency-inverse document frequency (TF-IDF) algorithm is incorporated, and the binary classifiers are combined into one multi-class classifier that serves as the diagnoser. In comparison experiments against six diagnosis methods, both proposed methods improve precision and recall.

Key words: skin diseases diagnosis, naive Bayes (NB), term frequency-inverse document frequency (TF-IDF) algorithm, generative adversarial networks (GAN), imbalanced data
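
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the trained GAN generator is replaced by a simple stand-in (`resample_stand_in`) that samples each binary feature from its frequency among the real minority samples, the features are assumed to be binary symptom indicators, the binary classifiers are combined by taking the largest log-odds, and the disease labels and toy data are made up. The TF-IDF step is left out because the abstract does not say how it enters the classifier.

```python
import math
import random
from collections import Counter

def balance_with_generator(X, y, generate, rng):
    """Add synthetic minority-class samples until both classes have equal size."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    real_minority = [x for x, label in zip(X, y) if label == minority]
    synthetic = [generate(real_minority, rng) for _ in range(deficit)]
    return X + synthetic, y + [minority] * deficit

def resample_stand_in(real_samples, rng):
    """Hypothetical stand-in for a trained GAN generator (assumption): draws each
    binary feature from its empirical frequency among the real minority samples."""
    n = len(real_samples)
    freqs = [sum(col) / n for col in zip(*real_samples)]
    return [1 if rng.random() < p else 0 for p in freqs]

class BernoulliNB:
    """Naive Bayes over binary features with Laplace smoothing."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.theta = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed probability that each feature equals 1 within class c.
            self.theta[c] = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        return self

    def log_score(self, x, c):
        """Unnormalized log posterior of class c for sample x."""
        return self.log_prior[c] + sum(
            math.log(p if v else 1 - p) for v, p in zip(x, self.theta[c]))

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_score(x, c))

def train_one_vs_rest(X, diseases, rng):
    """Train one class-balanced binary classifier per disease."""
    models = {}
    for d in sorted(set(diseases)):
        y = [1 if label == d else 0 for label in diseases]
        Xb, yb = balance_with_generator(X, y, resample_stand_in, rng)
        models[d] = BernoulliNB().fit(Xb, yb)
    return models

def diagnose(models, x):
    """Combination rule (assumption): pick the disease with the largest log-odds."""
    return max(models,
               key=lambda d: models[d].log_score(x, 1) - models[d].log_score(x, 0))

# Made-up toy data: three binary symptom features, two imbalanced diseases.
X = [[1, 0, 0]] * 5 + [[1, 0, 1]] + [[0, 1, 0]] * 2
diseases = ["eczema"] * 6 + ["psoriasis"] * 2
models = train_one_vs_rest(X, diseases, random.Random(0))
print(diagnose(models, [0, 1, 0]))
```

In this sketch the balancing step is independent of the classifier, so the stand-in could be swapped for samples drawn from an actual GAN generator without changing the rest of the code.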