[1] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. [2024-05-24]. https://arxiv.org/abs/1503.02531.
[2] KAMALLOO E, DZIRI N, CLARKE C L A, et al. Evaluating open-domain question answering in the era of large language models[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2305.06984.
[3] LI S H, YANG C, YIN Y C, et al. AutoConv: automatically generating information-seeking conversations with large language models[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2308.06507.
[4] BENTIVOGLI L, CLARK P, DAGAN I, et al. The fifth PASCAL recognizing textual entailment challenge[C]//Proceedings of the 2nd Text Analysis Conference, 2009.
[5] XU M F, LI J, LIU Y Y. PPKD: privacy-preserving knowledge distillation for large model[C]//Proceedings of the 2023 International Conference on Networking and Network Applications. Piscataway: IEEE, 2023: 490-496.
[6] LIU C, TAO C Y, FENG J Z, et al. Multi-granularity structural knowledge distillation for language model compression[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2022: 1001-1011.
[7] PARK G, KIM G, YANG E. Distilling linguistic context for language model compression[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2109.08359.
[8] WANG C L, LU Y, MU Y Y, et al. Improved knowledge distillation for pre-trained language models via knowledge selection[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2302.00444.
[9] DASGUPTA S, COHN T, BALDWIN T. Cost-effective distillation of large language models[C]//Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg: ACL, 2023: 7346-7354.
[10] HSIEH C Y, LI C L, YEH C K, et al. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2305.02301.
[11] SUN S Q, CHENG Y, GAN Z, et al. Patient knowledge distillation for BERT model compression[EB/OL]. [2024-05-24]. https://arxiv.org/abs/1908.09355.
[12] SULTAN M A. Knowledge distillation ≈ label smoothing: fact or fallacy?[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 4469-4477.
[13] MÜLLER R, KORNBLITH S, HINTON G E. When does label smoothing help?[C]//Advances in Neural Information Processing Systems 32, 2019: 4696-4705.
[14] YUAN L, TAY F E, LI G L, et al. Revisiting knowledge distillation via label smoothing regularization[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3902-3910.
[15] PHUONG M, LAMPERT C H. Towards understanding knowledge distillation[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2105.13093.
[16] LEE H S, WALLRAVEN C. Visualizing the embedding space to explain the effect of knowledge distillation[C]//Proceedings of the 6th Asian Conference on Pattern Recognition. Cham: Springer, 2022: 462-475.
[17] WEI J X, SUN L Z, LENG Y C, et al. Sentence-level or token-level? A comprehensive study on knowledge distillation[C]//Proceedings of the 33rd International Joint Conference on Artificial Intelligence. Palo Alto: AAAI, 2024: 6531-6540.
[18] ZHANG S M, LIANG Y L, WANG S B, et al. Towards understanding and improving knowledge distillation for neural machine translation[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 8062-8079.
[19] LOPEZ-PAZ D, BOTTOU L, SCHÖLKOPF B, et al. Unifying distillation and privileged information[C]//Proceedings of the 4th International Conference on Learning Representations, 2016.
[20] TANG J, SHIVANNA R, ZHAO Z, et al. Understanding and improving knowledge distillation[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2002.03532.
[21] LIANG R, LI T, LI L, et al. Knowledge consistency between neural networks and beyond[C]//Proceedings of the 8th International Conference on Learning Representations, 2020.
[22] FURLANELLO T, LIPTON Z, TSCHANNEN M, et al. Born again neural networks[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 1607-1616.
[23] MENON A K, RAWAT A S, REDDI S J, et al. Why distillation helps: a statistical perspective[EB/OL]. [2024-05-24]. https://arxiv.org/abs/2005.10419.
[24] XUE M Q, SONG J, WANG X C, et al. KDExplainer: a task-oriented attention model for explaining knowledge distillation[C]//Proceedings of the 30th International Joint Conference on Artificial Intelligence. Palo Alto: AAAI, 2021: 3228-3234.
[25] ALHARBI R, VU M N, THAI M T. Learning interpretation with explainable knowledge distillation[C]//Proceedings of the 2021 IEEE International Conference on Big Data. Piscataway: IEEE, 2021: 705-714.
[26] ZHANG Q S, CHENG X, CHEN Y L, et al. Quantifying the knowledge in a DNN to explain knowledge distillation for classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 5099-5113.
[27] KIM J, YOU J, LEE D, et al. Do topological characteristics help in knowledge distillation?[C]//Proceedings of the 41st International Conference on Machine Learning, 2024.
[28] ZHANG C, LI Q C, HUA L Y, et al. Assessing the memory ability of recurrent neural networks[C]//Proceedings of the 24th European Conference on Artificial Intelligence, 2020: 1658-1665.
[29] LIU M, BAO Y, ZHAO C Q, et al. Selective knowledge distillation for non-autoregressive neural machine translation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(11): 13246-13254.
[30] ZHANG C, CAO J X, YAN D M, et al. Which words pillar the semantic expression of a sentence?[C]//Proceedings of the 2023 IEEE 35th International Conference on Tools with Artificial Intelligence. Piscataway: IEEE, 2023: 791-798.
[31] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1810.04805.
[32] TOUVRON H, MARTIN L, STONE K, et al. Llama 2: open foundation and fine-tuned chat models[EB/OL]. [2024-05-26]. https://arxiv.org/abs/2307.09288.
[33] WANG A, SINGH A, MICHAEL J, et al. GLUE: a multi-task benchmark and analysis platform for natural language understanding[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1804.07461.
[34] SOCHER R, PERELYGIN A, WU J, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2013: 1631-1642.
[35] WARSTADT A, SINGH A, BOWMAN S R. Neural network acceptability judgments[J]. Transactions of the Association for Computational Linguistics, 2019, 7: 625-641.
[36] DOLAN B, BROCKETT C. Automatically constructing a corpus of sentential paraphrases[C]//Proceedings of the 3rd International Workshop on Paraphrasing, 2005.
[37] CER D, DIAB M, AGIRRE E, et al. SemEval-2017 task 1: semantic textual similarity - multilingual and cross-lingual focused evaluation[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1708.00055.
[38] SHARMA L, GRAESSER L, NANGIA N, et al. Natural language understanding with the quora question pairs dataset[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1907.01041.
[39] WILLIAMS A, NANGIA N, BOWMAN S R. A broad-coverage challenge corpus for sentence understanding through inference[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1704.05426.
[40] RAJPURKAR P, ZHANG J, LOPYREV K, et al. SQuAD: 100,000+ questions for machine comprehension of text[EB/OL]. [2024-05-26]. https://arxiv.org/abs/1606.05250.
[41] HAIM R B, DAGAN I, DOLAN B, et al. The second PASCAL recognising textual entailment challenge[C]//Proceedings of the 2nd PASCAL Challenges Workshop on Recognising Textual Entailment, 2006: 785-794.
[42] LEVESQUE H J, DAVIS E, MORGENSTERN L. The winograd schema challenge[C]//Proceedings of the 2012 International Conference on Knowledge Representation and Reasoning, 2012: 552-561.