Task-Similarity Guided Progressive Deep Neural Network and Its Learning

doi:10.3778/j.issn.1673-9418.2112016

Abstract

Continual learning aims to learn multiple tasks in sequence, using knowledge from previous tasks to help learn the current task without catastrophic forgetting. The progressive neural network is a parameter-independent continual learning method that gradually assigns an additional network to each task to improve continual learning performance. However, this method cannot directly exploit the similarity between tasks. During continual learning, comparing the similarity between tasks and using it to prune and then transfer the parameters of previous tasks may significantly improve performance on the current task. Therefore, a task-similarity guided progressive deep neural network (TSGPNN) and its learning method are proposed, comprising two stages: task-similarity evaluation and progressive learning. In the task-similarity evaluation stage, a reference value is defined to measure the similarity between target task domains, and it serves as a reference for the amount of knowledge transferred between tasks. The progressive learning stage improves the ability to learn new tasks by absorbing knowledge from previous tasks and relearning. With the CIFAR-100, MNIST-Permutation and MNIST-Rotation datasets split into tasks, experimental results show that TSGPNN performs better and more stably than single-task learning, multi-task learning, and other baseline continual learning methods.

Key words: catastrophic forgetting, continual learning, deep network, progressive neural network
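
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the similarity reference value is computed or how parameters are pruned, so cosine similarity between feature vectors and magnitude pruning are stand-in assumptions here, and every function name (`cosine_similarity`, `prune_by_similarity`, `new_column_forward`) is hypothetical. The lateral connection follows the general progressive-network idea of feeding a frozen previous column's hidden activations into the new column.

```python
import math


def cosine_similarity(a, b):
    """Stand-in task-similarity score (assumption: the paper's measure may differ)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_by_similarity(weights, similarity):
    """Keep roughly the top `similarity` fraction of weights by magnitude.

    Assumption: a higher similarity lets more previous-task parameters
    through. Ties at the threshold are all kept.
    """
    fraction = max(0.0, min(1.0, similarity))
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = int(round(fraction * len(magnitudes)))
    if keep == 0:
        return [[0.0 for _ in row] for row in weights]
    threshold = magnitudes[keep - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def new_column_forward(x, w1_prev, w1_new, w2_new, lateral):
    """Two-layer forward pass of a new column with one lateral connection.

    The previous column (w1_prev) is frozen and only supplies its hidden
    activation; `lateral` maps that activation into the new column's
    second layer.
    """
    h1_prev = relu(matvec(w1_prev, x))   # frozen previous-task features
    h1_new = relu(matvec(w1_new, x))     # new-task features
    own = matvec(w2_new, h1_new)
    transferred = matvec(lateral, h1_prev)
    return relu([a + b for a, b in zip(own, transferred)])


# Hypothetical usage: score similarity, then prune the weights to transfer.
similarity = cosine_similarity([1.0, 0.5, 0.0], [0.9, 0.6, 0.1])
pruned = prune_by_similarity([[0.1, -0.9], [0.5, -0.2]], 0.5)
output = new_column_forward(
    x=[1.0, 2.0],
    w1_prev=[[1.0, 0.0], [0.0, 1.0]],
    w1_new=[[0.5, 0.0], [0.0, 0.5]],
    w2_new=[[1.0, 1.0]],
    lateral=[[1.0, -1.0]],
)
```

In this sketch the similarity score only controls how many previous-task weights survive pruning; how the score actually sets the amount of transferred knowledge in TSGPNN is defined in the full paper, not here.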

WU Chu, WANG Shitong. Task-Similarity Guided Progressive Deep Neural Network and Its Learning[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(5): 1126-1138.

