Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, Vol. 18, Issue 10: 2770-2786. DOI: 10.3778/j.issn.1673-9418.2401072

• Artificial Intelligence · Pattern Recognition •

Medical Knowledge Graph Question-Answering System Based on Hybrid Dynamic Masking and Multi-strategy Fusion

WANG Runzhou, ZHANG Xinsheng   

  1. School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
  • Published: 2024-10-01 Online: 2024-09-29

Abstract: Medical knowledge graph question-answering combines medical knowledge with natural language processing to provide accurate and fast question-answering services for medical practitioners and patients. However, as medical data grows rapidly, existing Chinese medical knowledge graphs are not comprehensive enough. Medical questions are also complex and ambiguous, so accurately identifying entity information and generating answers that the public can easily understand remain challenging. This paper proposes a medical knowledge graph question-answering framework based on hybrid dynamic masking and multi-strategy fusion. First, a medical knowledge graph containing 34,167 entities and 297,463 relations is constructed by integrating public datasets with disease knowledge from medical platforms, covering categories such as diseases, medications, and foods. Second, a BERT-MaskAttention-BiLSTM-CRF hybrid dynamic masking model is proposed to identify the medical entities in the input accurately, focusing attention on important content while filtering out interference from redundant information. Finally, an entity alignment strategy unifies and standardizes medical entities, an intent recognition strategy captures the user's query intent, and a large language model polishes the knowledge graph's output so that the answers are easier to understand. Experimental results show that the proposed model achieves a macro-average F1 score of 0.9602 in the entity recognition comparison experiments, that the question-answering tests reach an average accuracy of 0.9656, and that the generated answers are easier to understand and highly interpretable.
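The abstract reports a macro-average F1 for entity recognition without stating how it is aggregated. The sketch below shows one common way to compute it for NER, assuming BIO tags and exact span matching (same type and same boundaries). The entity type names follow the categories the abstract mentions, but the tagging scheme and the helpers `extract_spans` and `macro_f1` are hypothetical; the paper's own evaluation may differ, for example by scoring at the token level.

```python
from collections import defaultdict


def extract_spans(tags):
    """Collect (entity_type, start, end) spans from one BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue an open span of the
    same type starts a new span (a lenient convention; strict scorers differ).
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))  # close the currently open span
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def macro_f1(gold_seqs, pred_seqs):
    """Entity-level macro F1: compute F1 per entity type, then average them."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
        for etype, _, _ in pred & gold:
            tp[etype] += 1  # exact match on type and boundaries
        for etype, _, _ in pred - gold:
            fp[etype] += 1  # predicted span absent from the gold labels
        for etype, _, _ in gold - pred:
            fn[etype] += 1  # gold span the model missed

    per_type = {}
    for etype in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[etype] / (tp[etype] + fp[etype]) if tp[etype] + fp[etype] else 0.0
        r = tp[etype] / (tp[etype] + fn[etype]) if tp[etype] + fn[etype] else 0.0
        per_type[etype] = 2 * p * r / (p + r) if p + r else 0.0
    # Macro average: every entity type counts equally, regardless of frequency.
    macro = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return macro, per_type


if __name__ == "__main__":
    # Two toy sentences; the drug span in sentence 1 is predicted one token short.
    gold = [["B-Disease", "I-Disease", "O", "B-Drug", "I-Drug"],
            ["B-Food", "O", "B-Disease"]]
    pred = [["B-Disease", "I-Disease", "O", "B-Drug", "O"],
            ["B-Food", "O", "B-Disease"]]
    score, per_type = macro_f1(gold, pred)
    print(per_type)         # {'Disease': 1.0, 'Drug': 0.0, 'Food': 1.0}
    print(round(score, 4))  # 0.6667
```

With these toy labels, micro averaging over all spans (3 true positives, 1 false positive, 1 false negative) would give 0.75, whereas the macro average is 0.667 because the one entity type that failed counts as a full third of the score. Macro averaging therefore rewards consistent performance across entity types rather than only on the most frequent ones.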

Key words: hybrid dynamic masking, multi-strategy fusion, knowledge graph, medical question-answering, large language model