Application of Generative Large Language Models in Chinese Radiology Domain

doi:10.3778/j.issn.1673-9418.2406041

Abstract

Abstract: In the Chinese radiology domain, radiology reports are a crucial basis for clinical decision-making. Using natural language processing (NLP) to understand and learn from the text of radiology reports, and thereby support clinical work in radiology, has therefore become an important research direction. However, traditional methods for natural language classification and generation tasks on Chinese radiology reports still suffer from insufficient overall performance, caused by limitations such as scarce and privacy-sensitive training corpora and poor model generalization. To address these problems, a solution for natural language tasks in the Chinese radiology domain is proposed, based on local, efficient fine-tuning of large language models. A large-scale, high-quality dataset for natural language tasks on Chinese radiology reports is collected and constructed, and the open-source large language model Baichuan2 is trained by supervised fine-tuning with the LoRA efficient fine-tuning method, yielding “RadGPT”, a model that can handle four types of clinical tasks in the Chinese radiology domain simultaneously. An evaluation system for natural language classification and generation tasks in the Chinese radiology domain is also proposed. Multiple groups of experiments are conducted on report datasets covering three types of medical imaging from two centers, and the proposed method is compared with several typical existing methods. The results show that the proposed method performs better in classification performance, text summarization and expansion ability, and model generalization.

Key words: large language model, radiology report, text classification, text generation, efficient fine-tuning strategy

摘要： 在中文放射医学领域中，影像学报告是临床决策的重要依据。因此，利用自然语言处理（NLP）技术来理解和学习影像学报告的文本内容，并以此辅助完成放射科临床工作，已成为该领域的重要研究方向。然而，在使用传统方法处理基于中文影像学报告的自然语言分类与生成任务时，仍然面临训练语料匮乏且涉及隐私、模型泛化能力较差等限制导致的综合性能不足的情况。针对上述问题，提出了一种基于本地高效微调大语言模型的中文放射医学领域自然语言任务解决方案。通过收集并构建大规模高质量中文影像学报告自然语言任务数据集，采用LoRA高效微调方法对开源大语言模型Baichuan2进行有监督微调训练，提出了能够同时解决四种中文放射医学领域临床任务的“龙影大模型”。提出了一套中文放射医学领域自然语言分类与生成任务评价体系。在来自两家中心的三个医学影像种类的报告数据集上进行了多组实验，并与几种典型现有方法进行了对比，结果显示所提方法在分类性能、文本总结与扩充能力和模型泛化性上表现更好。

关键词: 大语言模型, 影像学报告, 文本分类, 文本生成, 高效微调策略

CHEN Longfei, GAO Xin, HOU Haotian, YE Chuyang, LIU Ya'ou, ZHANG Meihui. Application of Generative Large Language Models in Chinese Radiology Domain[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2337-2348.

陈龙飞, 高鑫, 侯皓天, 叶初阳, 刘亚欧, 张美慧. 生成式大语言模型在中文放射医学领域的应用研究[J]. 计算机科学与探索, 2024, 18(9): 2337-2348.
