Journal of Frontiers of Computer Science and Technology

• Science Researches •

CIL-LLM: A Class-Incremental Learning Framework Based on Large Language Models for Topic Classification

WANG Xiaoyu, LI Xin, HU Mianning, XUE Di   

1. School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
    2. Key Laboratory of Security Technology and Risk Assessment, Ministry of Public Security, Beijing 100026, China

Abstract: To improve the classification accuracy of class-incremental learning (CIL) models for text classification and to mitigate catastrophic forgetting, this paper proposes a class-incremental learning framework based on large language models (CIL-LLM). CIL-LLM selects representative samples through sampling and compression, and leverages the strong in-context learning ability of an LLM to distill key skills that serve as the basis for classification, thereby reducing storage costs. Keyword matching is then used to select the optimal skill, which is formulated into a prompt that guides a downstream weak LLM in classification, improving accuracy. Finally, skill fusion based on knowledge distillation expands and updates the skill repository while preserving the characteristics of both new and old categories. Comparative experiments on the THUCNews dataset show that, compared with the existing L-SCL method, CIL-LLM improves average accuracy across all tasks by 6.3% and reduces the performance degradation rate by 3.1%. In ablation studies, the SLEICL model enhanced with the CIL-LLM framework improves average accuracy across all tasks by 10.4% and reduces the performance degradation rate by 3.3% relative to the original model. These results further confirm that the sample compression, keyword matching, and skill fusion components each help to improve the model's accuracy and reduce its performance degradation.

Key words: class-incremental learning, large language model (LLM), topic classification, knowledge distillation
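
As a rough illustration of the workflow the abstract describes (a repository of skills distilled by a strong LLM, keyword matching to select the optimal skill, a prompt that carries that skill to a downstream weak LLM, and fusion that updates the repository for new categories), the Python sketch below walks through the same steps. Every name in it, including SkillRepository, match, fuse, and build_prompt, as well as the simple keyword-overlap heuristic, is an assumption made for exposition, not the authors' published implementation.

```python
# Minimal, illustrative sketch of the CIL-LLM pipeline from the abstract.
# All names and the overlap heuristic are assumptions for exposition.

from collections import Counter


class SkillRepository:
    """Holds 'skills' distilled by a strong LLM, one keyword set per category."""

    def __init__(self) -> None:
        self.skills: dict[str, set[str]] = {}

    def add_skill(self, category: str, keywords: set[str]) -> None:
        self.skills[category] = set(keywords)

    def fuse(self, category: str, new_keywords: set[str]) -> None:
        # Crude stand-in for knowledge-distillation-based skill fusion:
        # merge cues learned from a new task into an existing skill so the
        # repository keeps both old and new category characteristics.
        self.skills.setdefault(category, set()).update(new_keywords)

    def match(self, tokens: list[str]) -> str:
        # Keyword matching: choose the skill whose keywords overlap most
        # with the input text (the "optimal skill" selection step).
        counts = Counter(tokens)
        return max(
            self.skills,
            key=lambda c: sum(counts[w] for w in self.skills[c]),
        )


def build_prompt(category: str, keywords: set[str], text: str) -> str:
    # The selected skill is written into a prompt that guides the
    # downstream weak LLM's classification decision.
    return (
        f"Skill: {category}\n"
        f"Key cues: {', '.join(sorted(keywords))}\n"
        f"Using these cues, classify the following text:\n{text}"
    )


repo = SkillRepository()
repo.add_skill("sports", {"coach", "league", "match"})
repo.add_skill("finance", {"bond", "market", "stock"})
repo.fuse("finance", {"futures"})  # incremental update for a new task

doc = "the coach praised the team after the league match"
best = repo.match(doc.split())
print(build_prompt(best, repo.skills[best], doc))
```

In this sketch a set union marks where the repository update happens in the incremental loop; the paper's fusion step instead distills old and new skills jointly rather than merging keyword lists.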
