Medical Knowledge Graph Question-Answering System Based on Hybrid Dynamic Masking and Multi-strategy Fusion

doi:10.3778/j.issn.1673-9418.2401072

Abstract

Medical knowledge graph question answering combines medical knowledge with natural language processing to provide accurate, fast question-answering services for medical practitioners and patients. However, as data volumes surge, existing Chinese medical knowledge graphs are not comprehensive enough. In addition, medical questions are complex and ambiguous, which makes it difficult to identify entity information accurately and to generate answers that the general public can easily understand. This paper proposes a medical knowledge graph question-answering framework based on hybrid dynamic masking and multi-strategy fusion. First, a medical knowledge graph containing 34,167 entities and 297,463 relationships is constructed by integrating public datasets with disease knowledge from medical platforms; it covers categories such as diseases, medications, and foods. Second, a BERT-MaskAttention-BiLSTM-CRF hybrid dynamic masking model is proposed to identify medical entities in the input accurately, focusing on essential content and removing interference from redundant information. Finally, an entity alignment strategy unifies and standardizes medical entities, an intent recognition strategy captures the user's query intent, and a large language model polishes the knowledge graph output so that responses are easier to understand. Experimental results show that the model achieves a macro-average F1 score of 0.9602 in comparative entity recognition experiments and an average accuracy of 0.9656 in question-answering tests, and that the generated content is easier to understand and highly interpretable.

Key words: hybrid dynamic masking, multi-strategy fusion, knowledge graph, medical question-answering, large language model
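
For readers unfamiliar with the macro-average F1 metric reported in the abstract, the following is a minimal sketch of how it is commonly computed for entity recognition: F1 is calculated separately for each entity type from exact-match spans, and the per-type scores are then averaged with equal weight. The function name `macro_f1`, the tuple layout, and the toy data are assumptions made for illustration; they are not taken from the paper, and the paper's exact matching criterion may differ.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over entity types.

    gold, pred: sets of (sentence_id, start, end, entity_type) tuples.
    An entity counts as correct only if all four fields match exactly
    (an assumed matching rule, not necessarily the paper's).
    """
    types = {e[3] for e in gold} | {e[3] for e in pred}
    scores = []
    for t in sorted(types):
        g = {e for e in gold if e[3] == t}
        p = {e for e in pred if e[3] == t}
        tp = len(g & p)  # true positives for this entity type
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
    # Every entity type contributes equally, regardless of how frequent it is.
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for illustration only.
gold = {
    (0, 0, 2, "disease"),
    (0, 5, 7, "drug"),
    (1, 0, 3, "disease"),
    (1, 6, 8, "food"),
}
pred = {
    (0, 0, 2, "disease"),
    (0, 5, 7, "drug"),
    (1, 0, 2, "disease"),  # wrong end boundary, so not an exact match
    (1, 6, 8, "food"),
}
score = macro_f1(gold, pred)
print(round(score, 4))
```

In this toy case the "disease" type scores 0.5 while "drug" and "food" score 1.0, so the macro average is 2.5/3. Because each type is weighted equally, a rare category with poor recognition lowers the macro average as much as a frequent one would.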

WANG Runzhou, ZHANG Xinsheng. Medical Knowledge Graph Question-Answering System Based on Hybrid Dynamic Masking and Multi-strategy Fusion[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(10): 2770-2786.
