Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (2): 413-427. DOI: 10.3778/j.issn.1673-9418.2008028

• Artificial Intelligence •

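Breast Mass Recognition Model via Deep-Level Pathological Information Mining

LI Guangli (李广丽)1, YUAN Tian (袁天)1, LI Chuanxiu (李传秀)1, WU Renzhong (邬任重)2, ZHUO Jianwu (卓建武)1, ZHANG Hongbin (张红斌)2,+

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  2. School of Software, East China Jiaotong University, Nanchang 330013, China
  • Received: 2020-08-10 Revised: 2020-10-21 Publication date: 2022-02-01 Release date: 2020-11-06
  • Corresponding author: + E-mail: zhanghongbin@whu.edu.cn
  • About the authors: LI Guangli, born in 1977 in Bobai, Guangxi, female, M.S., associate professor, member of CCF. Her research interests include medical image analysis, cross-media retrieval, and recommender systems.
    YUAN Tian, born in 1994 in Xiaogan, Hubei, male, M.S. candidate, student member of CCF. His research interests include tumor image recognition and machine learning.
    LI Chuanxiu, born in 1995 in Heze, Shandong, male, M.S. candidate. His research interests include tumor image recognition and deep learning.
    WU Renzhong, born in 1995 in Fengcheng, Jiangxi, female, M.S. Her research interests include breast tumor recognition and machine learning.
    ZHUO Jianwu, born in 1994 in Shanwei, Guangdong, male, M.S. candidate. His research interests include natural language processing and deep learning.
    ZHANG Hongbin, born in 1979 in Rugao, Jiangsu, male, Ph.D., associate professor, senior member of CCF. His research interests include natural language processing, image recognition, and recommender systems.
  • Supported by:
    National Natural Science Foundation of China (62161011); National Natural Science Foundation of China (61762038); National Natural Science Foundation of China (61861016); Humanities and Social Science Research Planning Fund of the Ministry of Education (20YJAZH142); Key Research and Development Program of Jiangxi Provincial Department of Science and Technology (20192BBE50071); Key Research and Development Program of Jiangxi Provincial Department of Science and Technology (20202BBEL53003); Science and Technology Project of Jiangxi Provincial Department of Education (GJJ190323); Science and Technology Project of Jiangxi Provincial Department of Education (GJJ200644); Social Science Foundation of Jiangxi Higher Education (TQ19101); Social Science Foundation of Jiangxi Higher Education (TQ20108); General Program of the Natural Science Foundation of Jiangxi Province (20202BABL202044); General Program of the Natural Science Foundation of Jiangxi Province (20212BAB202006)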

Abstract:

Breast cancer is the most common cancer in women, and a breast mass recognition model can effectively assist physicians in clinical diagnosis. However, the scarcity of medical image samples makes recognition models prone to overfitting. To address this problem, this paper proposes a breast mass recognition model based on deep-level pathological information mining. First, a sample refinement strategy is constructed to select high-quality image samples across different mammographic datasets, which tackles sample scarcity from the data augmentation perspective. Second, the pathological information contained in the limited labeled samples is mined progressively from shallow to deep, which tackles sample scarcity from the feature selection perspective. Specifically, a feature selection algorithm called multi-view efficient range-based gene selection (MvERGS) is designed to refine the original image features, improving their discriminative ability and reducing their dimensionality so that the feature size better matches the sample size. Discriminant correlation analysis (DCA) is then applied to the refined features to mine the deep cross-modal correlations among heterogeneous features, i.e., the deep-level pathological information, which characterizes the lesion areas of breast masses accurately. Finally, an effective breast mass recognition model is trained on the deep-level pathological information with a traditional classifier to classify mammographic images. Experimental results show that the model outperforms mainstream baselines on key metrics, including Accuracy and AUC, and that the overfitting caused by sample scarcity is alleviated.

Key words: breast mass recognition, pathological information mining, sample refinement, feature selection, multi-view efficient range-based gene selection (MvERGS)
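
The abstract names MvERGS but does not spell out its computation. As a rough illustration only, the sketch below ranks features with a single-view effective-range heuristic: each class gets a range around its mean for every feature, and features whose class ranges overlap less receive a higher score. The function names, the constant GAMMA = 1.732, and the interval-overlap formula are assumptions made for this sketch rather than details taken from the paper, and the multi-view part of MvERGS is not modelled.

```python
from statistics import fmean, pstdev

GAMMA = 1.732  # range-width factor; an assumed constant for this sketch


def class_ranges(values, labels, gamma=GAMMA):
    """Return one (low, high) range per class for a single feature column."""
    n = len(labels)
    ranges = []
    for c in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == c]
        prior = len(group) / n
        # Assumed form: the range shrinks for classes with a larger prior.
        half = (1.0 - prior) * gamma * pstdev(group)
        mu = fmean(group)
        ranges.append((mu - half, mu + half))
    return ranges


def overlap_ratio(ranges):
    """Pairwise overlap length of the class ranges divided by their total span."""
    total = 0.0
    for j in range(len(ranges)):
        for k in range(j + 1, len(ranges)):
            lo = max(ranges[j][0], ranges[k][0])
            hi = min(ranges[j][1], ranges[k][1])
            total += max(0.0, hi - lo)
    span = max(r[1] for r in ranges) - min(r[0] for r in ranges)
    # A zero span means every class collapses to one point: treat as full overlap.
    return total / span if span > 0 else 1.0


def effective_range_scores(rows, labels):
    """Score each feature in [0, 1]; higher means the classes overlap less."""
    columns = list(zip(*rows))
    ratios = [overlap_ratio(class_ranges(col, labels)) for col in columns]
    peak = max(ratios)
    if peak == 0:
        return [1.0] * len(ratios)
    return [1.0 - r / peak for r in ratios]


def select_top_k(rows, labels, k):
    """Return indices of the k highest-scoring features, best first."""
    scores = effective_range_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (hypothetical): feature 0 barely separates the classes, feature 1 does.
rows = [(0.9, 1.0), (1.1, 1.2), (1.0, 5.0), (1.2, 5.3)]
labels = [0, 0, 1, 1]
print(select_top_k(rows, labels, 1))  # [1]
```

Keeping only the top-ranked columns shrinks the feature dimension, which is the role the abstract assigns to feature selection when labeled samples are scarce.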

CLC Number: