Survey of AIGC Large Model Evaluation: Enabling Technologies, Vulnerabilities and Mitigation

doi:10.3778/j.issn.1673-9418.2402023

Abstract

Artificial intelligence generated content (AIGC) models have attracted widespread attention and adoption worldwide because of their strong content generation capabilities. However, the rapid development of AIGC large models has also introduced a series of risks, including concerns about the interpretability, fairness, security, and privacy of model-generated results. To reduce unforeseen risks and the harm they cause, comprehensive evaluation of AIGC large models is becoming increasingly important. The academic community has begun research on AIGC large model evaluation, aiming to address these challenges and avoid potential risks. This paper reviews and analyzes that research. First, it gives an overview of the evaluation process, covering pre-evaluation preparation and the corresponding evaluation metrics, and systematically organizes existing evaluation benchmarks. Second, it discusses representative applications of AIGC large models in finance, politics, and healthcare, along with their problems. Third, it examines evaluation methods from the perspectives of interpretability, fairness, robustness, security, and privacy, breaks down the new issues that AIGC large model evaluation must address, and proposes strategies for coping with these new challenges. Finally, it discusses the challenges that AIGC large model evaluation will face and outlines directions for future development.

Key words: AIGC large model, large model evaluation, interpretability, fairness, robustness, security and privacy protection

许志伟, 李海龙, 李博, 李涛, 王嘉泰, 谢学说, 董泽辉. AIGC大模型测评综述：使能技术、安全隐患和应对[J]. 计算机科学与探索, 2024, 18(9): 2293-2325.

XU Zhiwei, LI Hailong, LI Bo, LI Tao, WANG Jiatai, XIE Xueshuo, DONG Zehui. Survey of AIGC Large Model Evaluation: Enabling Technologies, Vulnerabilities and Mitigation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2293-2325.

参考文献

[1] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020: 6840-6851.
[2] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, Jun 16-20, 2019. Pisca-taway: IEEE, 2019: 4401-4410.
[3] ZHANG C, ZHANG C, ZHENG S, et al. A complete survey on generative AI (AIGC): is ChatGPT from GPT-4 to GPT-5 all you need? [EB/OL]. [2024-01-10]. https://arxiv.org/abs/2303.11717.
[4] OPENAI. ChatGPT: optimizing language models for dialogue[EB/OL]. (2022-11-30) [2024-01-17]. https://openai.com/blog/chatgpt.
[5] HUANG F, KWAK H, AN J. Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech[C]//Companion Proceedings of the ACM Web Conference 2023, Austin, Apr 30-May 4, 2023. New York: ACM, 2023: 294-297.
[6] LI J, TANG T, ZHAO W X, et al. Pre-trained language models for text generation: a survey[J]. ACM Computing Surveys, 2024, 56(9): 230.
[7] FLORIDI L, CHIRIATTI M. GPT-3: its nature, scope, limits, and consequences[J]. Minds and Machines, 2020, 30: 681-694.
[8] OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback[C]//Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, New Orleans, Nov 28-Dec 9, 2022: 27730-27744.
[9] QIN C, ZHANG A, ZHANG Z, et al. Is ChatGPT a general-purpose natural language processing task solver?[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 1339-1384.
[10] RAO H, LEUNG C, MIAO C. Can ChatGPT assess human personalities? A general evaluation framework[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 1184-1194.
[11] YANG X, LI Y, ZHANG X, et al. Exploring the limits of ChatGPT for query or aspect-based text summarization[EB/OL]. [2024-01-12]. https://arxiv.org/abs/2302.08081.
[12] ZUCCON G, KOOPMAN B. Dr ChatGPT tell me what I want to hear: how prompt knowledge impacts health ans-wer correctness[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 15012-15022.
[13] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[14] CHOWDHERY A, NARANG S, DEVLIN J, et al. PALM: scaling language modeling with pathways[J]. Journal of Machine Learning Research, 2023, 24: 240.
[15] ZHANG S, ROLLER S, GOYAL N, et al. OPT: open pre-trained transformer language models[EB/OL]. [2024-01-10]. https://arxiv.org/abs/2205.01068.
[16] SCAO T L, FAN A, AKIKI C, et al. BLOOM: a 176B-parameter open-access multilingual language model[EB/OL]. [2024-01-10]. https://arxiv.org/abs/2211.05100.
[17] META AI. Introducing LLaMA: a foundational, 65-billion-parameter large language model[EB/OL]. (2023-01-15) [2024-02-17]. https://ai.facebook.com/blog/large-language-model-llama-meta-ai.
[18] META AI. LLaMA2[EB/OL]. (2023-07-19) [2024-02-17]. https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/.
[19] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, Jun 18-24, 2022. Piscataway: IEEE, 2022: 10684-10695.
[20] RUSKOV M. Grimm in wonderland: prompt engineering with midjourney to illustrate fairytales[EB/OL]. [2024-01-15]. https://arxiv.org/abs/2302.08961.
[21] OPENAI R. GPT-4 technical report[EB/OL]. [2024-01-15]. https://arxiv.org/abs/2303.08774.
[22] DU Z, QIAN Y, LIU X, et al. GLM: general language model pretraining with autoregressive blank infilling[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, May 22-27, 2022. Stroudsburg: ACL, 2022: 320-335.
[23] Tsinghua University. ChatGLM2[EB/OL]. [2024-02-17]. https://github.com/THUDM/ChatGLM2-6B/.
[24] QIAN C, HAN C, FUNG Y R, et al. CREATOR: disentang-ling abstract and concrete reasonings of large language models through tool creation[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2305.14318.
[25] BAIDU. 文心一言[EB/OL]. (2023-03-16) [2024-02-17]. https://yiyan.baidu.com/.
[26] ALIYUN. 通义大模型[EB/OL]. (2023-09-16) [2024-02-17]. https://tongyi.aliyun.com/.
[27] Fudan University. MOSS[EB/OL]. (2023-04-21) [2024-02-17]. https://moss.fudan.edu.cn/.
[28] MindSpore[EB/OL]. (2023-06-16) [2024-02-17]. https://www.mindspore.cn/largeModel.
[29] 华为云盘古大模型[EB/OL]. (2023-07-16) [2024-02-17]. https://www.huaweicloud.com/special/pangu-ai.html/.Huawei Yun. Pangu large model[EB/OL]. (2023-07-16) [2024-02-17]. https://www.huaweicloud.com/special/pangu-ai.html/.
[30] AZARIA A, MITCHELL T. The internal state of an LLM knows when it??s lying[C]//Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Dec 2023. Stroudsburg: ACL, 2023: 967-976.
[31] CHIANG C H, LEE H. Can large language models be an alternative to human evaluations?[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Jul 9-14, 2023. Stroudsburg: ACL, 2023.
[32] GAO M, RUAN J, SUN R, et al. Human-like summarization evaluation with ChatGPT[EB/OL]. [2024-01-10]. https://arxiv.org/abs/2304.02554.
[33] LIN Y T, CHEN Y N. LLM-Eval: unified multi-dimensional automatic evaluation for open-domain conversations with large language models[C]//Proceedings of the 5th Workshop on NLP for Conversational AI, Toronto, Jul 2023. Stroudsburg: ACL, 2023: 47-58.
[34] LIU J, XIA C S, WANG Y, et al. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10- 16, 2023.
[35] LIU Y, ITER D, XU Y, et al. G-Eval: NLG evaluation using GPT-4 with better human alignment[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 2511-2522.
[36] WANG J, LIANG Y, MENG F, et al. Is ChatGPT a good NLG evaluator? A preliminary study[C]//Proceedings of the 4th New Frontiers in Summarization Workshop, Singapore, Dec 2023. Stroudsburg: ACL, 2023: 1-11.
[37] HENDRYCKS D, BURNS C, BASART S, et al. Measuring massive multitask language understanding[C]//Proceedings of the 9th International Conference on Learning Representations, 2020.
[38] ZHANG X, LI C, ZONG Y, et al. Evaluating the performance of large language models on GAOKAO benchmark[EB/OL]. [2024-01-10]. https://arxiv.org/abs/2305.12474.
[39] HUANG Y, BAI Y, ZHU Z, et al. C-EVAL: a multi-level multi-discipline Chinese evaluation suite for foundation models[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10-16, 2023: 62991-63010.
[40] ZHONG W, CUI R, GUO Y, et al. AGIEval: a human-centric benchmark for evaluating foundation models[C]//Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Jun 2024. Stroudsburg: ACL, 2024: 2299-2314.
[41] LI H, ZHANG Y, KOTO F, et al. CMMLU: measuring massive multitask language understanding in Chinese[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2306.09212.
[42] ZHANG W, ALJUNIED S M, GAO C, et al. M3Exam: a multilingual, multimodal, multilevel benchmark for examining large language models[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10-16, 2023: 5484-5505.
[43] SRIVASTAVA A, RASTOG A, RAO A, et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2206.04615.
[44] LIANG P, BOMMASANI R, LEE T, et al. Holistic evaluation of language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2211.09110.
[45] CHANG Y, WANG X, WANG J, et al. A survey on evaluation of large language models[J]. ACM Transactions on Intelligent Systems and Technology, 2024, 15(3): 1-45.
[46] CARVALHO D V, PEREIRA E M, CARDOSO J S. Machine learning interpretability: a survey on methods and metrics[J]. Electronics, 2019, 8(8): 832.
[47] RüPING S. Learning interpretable models[D]. Dortmund: University Dortmund, 2006.
[48] ZHOU J, KHAWAJA M A, LI Z, et al. Making machine learning useable by revealing internal states update-a transparent approach[J]. International Journal of Computational Science and Engineering, 2016, 13(4): 378-389.
[49] CARLINI N, LIU C, ERLINGSSON ú, et al. The secret sharer: evaluating and testing unintended memorization in neural networks[C]//Proceedings of the 28th USENIX Security Symposium, Santa Clara, Aug 14-16, 2019. Berkeley: USENIX Association, 2019: 267-284.
[50] FREDRIKSON M, JHA S, RISTENPART T. Model inversion attacks that exploit confidence information and basic countermeasures[C]//Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, Oct 12-16, 2015. New York: ACM, 2015: 1322-1333.
[51] GANJU K, WANG Q, YANG W, et al. Property inference attacks on fully connected neural networks using permutation invariant representations[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, Oct 15-19, 2018. New York: ACM, 2018: 619-633.
[52] SALEM A, BHATTACHARYA A, BACKES M, et al. Updates-Leak: data set inference and reconstruction attacks in online learning[C]//Proceedings of the 29th USENIX Security Symposium. Berkeley: USENIX Association, 2020: 1291-1308.
[53] SHOKRI R, STRONATI M, SONG C, et al. Membership inference attacks against machine learning models[C]//Proceedings of the 2017 IEEE Symposium on Security and Privacy, San Jose, May 22-26, 2017. Piscataway: IEEE, 2017: 3-18.
[54] SONG C, RISTENPART T, SHMATIKOV V. Machine learning models that remember too much[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, Oct 30-Nov 3, 2017. New York: ACM, 2017: 587-601.
[55] WANG J, HU X, HOU W, et al. On the robustness of ChatGPT: an adversarial and out-of-distribution perspective[J]. IEEE Data Engineering Bulletin, 2024, 47(1): 48-62.
[56] ZHU K, WANG J, ZHOU J, et al. PromptBench: towards evaluating the robustness of large language models on adversarial prompts[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2306.04528.
[57] WILLIG M, ZECEVIC M, DHAMI D S, et al. Causal parrots: large language models may talk causality but are not causal[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2308.13067.
[58] ZHOU K, ZHU Y, CHEN Z, et al. Don??t make your LLM an evaluation benchmark cheater[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2311.01964.
[59] ZHU K, CHEN J, WANG J, et al. DyVal: graph-informed dynamic evaluation of large language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2309.17167.
[60] ZHU K, ZHAO Q, CHEN H, et al. PromptBench: a unified library for evaluation of large language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2312.07910.
[61] ZHENG L, SHENG Y, CHIANG W, et al. Chatbot Arena: benchmarking LLMs in the wild with ELO ratings[EB/OL]. (2023-05-03) [2024-02-17]. https://lmsys.org/blog/2023-05-03-arena/.
[62] ZHENG L, CHIANG W L, SHENG Y, et al. Judging LLM-as-a-judge with MT-bench and Chatbot Arena[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing System 2023, New Orleans, Dec 10-16, 2023: 46595-46623.
[63] FU C, CHEN P, SHEN Y, et al. MME: a comprehensive evaluation benchmark for multimodal large language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2306.13394.
[64] AN C, GONG S, ZHONG M, et al. L-Eval: instituting standardized evaluation for long context language models[EB/OL]. [2024-01-18]. https://arxiv.org/abs/2307.11088.
[65] YU J, WANG X, TU S, et al. KoLA: carefully benchmarking world knowledge of large language models[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2306.09296.
[66] KIELA D, BARTOLO M, NIE Y, et al. DynaBench: rethinking benchmarking in NLP[C]//Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 4110-4124.
[67] ZHOU Y, MURESANU A I, HAN Z, et al. Large language models are human-level prompt engineers[C]//Proceedings of the 11th International Conference on Learning Representations, Kigali, May 1-5, 2023.
[68] DUBOIS Y, GALAMBOSI B, LIANG P, et al. Length-controlled AlpacaEval: a simple way to debias automatic evaluators[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2404. 04475.
[69] WANG Y, YU Z, ZENG Z, et al. PandaLM: an automatic evaluation benchmark for LLM instruction tuning optimization[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2306.05087.
[70] CHOI M, PEI J, KUMAR S, et al. Do LLMs understand social knowledge? Evaluating the sociability of large language models with SocKET benchmark[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 11370-11403.
[71] HENDRYCKS D, BURNS C, KADAVATH S, et al. Measuring mathematical problem solving with the MATH data-set[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2103.03874.
[72] HENDRYCKS D, BASART S, KADAVATH S, et al. Measuring coding challenge competence with apps[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2105.09938.
[73] HUGGINGFACE. Open-source large language models leaderboard[EB/OL]. (2023-01-01) [2024-02-17]. https://huggingface.co/spaces/HuggingFaceH4/open-llm-leaderboard.
[74] 超对称(北京)科技有限公司. BBT CFLEB[EB/OL]. (2023-07-28) [2024-02-17]. https://bbt.ssymmetry.com/evaluation.html.
[75] ZHANG L, CAI W, LIU Z, et al. FineVal: a Chinese financial domain knowledge evaluation benchmark for large language models[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2308.09975.
[76] SINGHAL K, AZIZI S, TU T, et al. Large language models encode clinical knowledge[J]. Nature, 2023, 620: 172-180.
[77] KE P, WEN B, FENG Z, et al. CritiqueLLM: scaling LLM-as-critic for effective and explainable evaluation of large language model generation[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2311.18702.
[78] YANG K, ZHANG T, KUANG Z, et al. MentaLLaMA: interpretable mental health analysis on social media with large language models[C]//Proceedings of the ACM Web Conference 2024, Singapore, May 13-17, 2024. New York: ACM, 2024: 4489-4500.
[79] WANG B, XU C, WANG S, et al. Adversarial GLUE: a multi-task benchmark for robustness evaluation of language models[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2111.02840.
[80] MEI A, LEVY S, WANG W Y. ASSERT: automated safety scenario red teaming for evaluating the robustness of large language models[C]//Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Dec 2023.Stroudsburg: ACL, 2023: 5831-5847.
[81] XU G, LIU J, YAN M, et al. CValues: measuring the values of Chinese large language models from safety to responsibility[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2307.09705.
[82] LIN H, LUO Z, WANG B, et al. GOAT-Bench: safety insights to large multimodal models through meme-based social abuse[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2401.01523.
[83] XU L, ZHAO K, ZHU L, et al. SC-Safety: a multi-round open-ended question adversarial safety benchmark for large language models in Chinese[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2310.05818.
[84] ZHANG Z, LEI L, WU L, et al. SafetyBench: evaluating the safety of large language models with multiple choice questions[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2309.07045.
[85] LEVY S, ALLAWAY E, SUBBIAH M, et al. SafeText: a benchmark for exploring physical safety in language models[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2210.10045.
[86] CHEN F, HAN M, ZHAO H, et al. X-LLM: bootstrapping advanced large language models by treating multi-modalities as foreign languages[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2305.04160.
[87] GAO P, HAN J, ZHANG R, et al. LLAMA-Adapter v2: parameter-efficient visual instruction model[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2304.15010.
[88] XU Z, SHEN Y, HUANG L. MultiInstruct: improving multi-modal zero-shot learning via instruction tuning[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Jul 9-14, 2023. Stroudsburg: ACL, 2023: 11445-11465.
[89] ZHANG R, HAN J, ZHOU A, et al. LLAMA-adapter: efficient fine-tuning of language models with zero-init attention[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2303.16199.
[90] ZHAO Z, GUO L, YUE T, et al. ChatBridge: bridging modalities with large language model as a language catalyst[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2305.16103.
[91] GONG T, LYU C, ZHANG S, et al. Multimodal-GPT: a vision and language model for dialogue with humans[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.04790.
[92] LI K C, HE Y, WANG Y, et al. VideocHat: Chat-centric video understanding[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.06355.
[93] LI L, YIN Y, LI S, et al. M3IT: a large-scale dataset towards multi-modal multilingual instruction tuning[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2306.04387.
[94] QIN Y, CAI Z, JIN D, et al. WebCPM: interactive Web search for Chinese long-form question answering[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.06849.
[95] DING N, CHEN Y, XU B, et al. Enhancing Chat language models by scaling high-quality instructional conversations[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 3029-3051.
[96] SHI K, WANG X, YU J, et al. CStory: a Chinese large-scale news storyline dataset[C]//Proceedings of the 31st ACM International Conference on Information and Knowledge Management, Atlanta, Oct 17-21, 2022. New York: ACM, 2022: 4475-4479.
[97] DU L, DING X, XIONG K, et al. e-CARE: a new dataset for exploring explainable causal reasoning[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, May 22-27, 2022. Stroudsburg: ACL, 2022: 432-446.
[98] ALLEN Institute for AI. Dolma[EB/OL]. (2023-05-07) [2024-02-17]. https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64.
[99] LIU H, LI C, WU Q, et al. Visual instruction tuning[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10-16, 2023.
[100] ZHU D, CHEN J, SHEN X, et al. MinIGPT-4: enhancing vision-language understanding with advanced large language models[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2304.10592.
[101] YANG R, SONG L, LI Y, et al. GPT4Tools: teaching large language model to use tools via self-instruction[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10-16, 2023: 71995-72007.
[102] PI R, GAO J, DIAO S, et al. DetGPT: detect what you need via reasoning[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 14172-14189.
[103] LIU P, YUAN W, FU J, et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing[J]. ACM Computing Surveys, 2023, 55(9): 195.
[104] LI X, QIU X. MoT: pre-thinking and recalling enable ChatGPT to self-improve with memory-of-thoughts[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.05181.
[105] DOSHI-VELEZ F, KIM B. Towards a rigorous science of interpretable machine learning[EB/OL]. [2024-01-25]. https://arxiv.org/abs/1702.08608.
[106] BAYAT V, PHELPS S, RYONO R, et al. A severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) prediction model from standard laboratory tests[J]. Clinical Infectious Diseases, 2021, 73(9): e2901-e2907.
[107] GUIDOTTI R, MONREALE A, RUGGIERI S, et al. A survey of methods for explaining black box models[J]. ACM Computing Surveys, 2018, 51(5): 93.
[108] SAMEK W, MONTAVON G, VEDALDI A, et al. Explainable AI: interpreting, explaining and visualizing deep learning[M]. Cham: Springer, 2019.
[109] RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB/OL]. (2022-04-13) [2024-02-17]. https://arxiv.org/abs/ 2204.06125.
[110] REI R, STEWART C, FARINHA A C, et al. COMET: a neural framework for MT evaluation[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 2685-2702.
[111] PIRES T, SCHLINGER E, GARRETTE D. How multilingual is multilingual BERT?[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics (Volume 1: Long Papers), Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 4996-5001.
[112] SELLAM T, DAS D, PARIKH A P. BLEURT: learning robust metrics for text generation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 7881-7892.
[113] WU S, IRSOY O, LU S, et al. BloombergGPT: a large language model for finance[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2303.17564.
[114] ZHANG X, YANG Q, XU D. XuanYuan 2.0: a large Chinese financial chat model with hundreds of billions parameters[C]//Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, Oct 21-25, 2023. New York: ACM, 2023: 4435-4439.
[115] YU Y M, HONG W H. Cornucopia[EB/OL]. (2023-03-21) [2024-02-17]. https://github.com/jerry1993-tech/Cornucopia- LLaMA-Fin-Chinese.
[116] LU D, WU H, LIANG J, et al. BBT-Fin: comprehensive construction of Chinese financial domain pre-trained language model, corpus and benchmark[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2302.09432.
[117] LexiLaw[EB/OL]. (2023-07-18) [2024-02-17]. https://github.com/CSHaitao/LexiLaw.
[118] NGUYEN H T. A brief report on LawGPT 1.0: a virtual legal assistant based on GPT-3[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2302.05729.
[119] HUANG Q, TAO M, AN Z, et al. Lawyer LLaMA technical report[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.15062.
[120] YAO F, XIAO C, WANG X, et al. LEVEN: a large-scale Chinese legal event detection dataset[C]//Findings of the Association for Computational Linguistics: ACL 2022, Dublin, May 22-27, 2022. Stroudsburg: ACL, 2022: 183-201.
[121] CASCELLA M, MONTOMOLI J, BELLINI V, et al. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios[J]. Journal of Medical Systems, 2023, 47(1): 33.
[122] CHERVENAK J, LIEMAN H, BLANCO-BREINDEL M, et al. The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations[J]. Fertility and Sterility, 2023, 120(3): 575-583.
[123] DUONG D, SOLOMON B D. Analysis of large-language model versus human performance for genetics questions[J]. European Journal of Human Genetics, 2024, 32: 466-468.
[124] GILSON A, SAFRANEK C W, HUANG T, et al. How does ChatGPT perform on the United States medical licen-sing examination? The implications of large language models for medical education and knowledge assessment[J]. JMIR Medical Education, 2023, 9(1): e45312.
[125] XIONG H, WANG S, ZHU Y, et al. DoctorGLM: fine-tuning your Chinese doctor is not a herculean task[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2304.01097.
[126] LI S T. Ben-Tsao Gong-Mu (Chinese botanical encyclopedia)[M]. Taipei, China: Great Taipei Publishing, 1990.
[127] LIANG Y, HUANG Y. Bian Que, the founder of diagnostics of traditional Chinese medicine[J]. Journal of Traditional Chinese Medical Sciences, 2022, 9(2): 93-94.
[128] ZHANG H, CHEN J, JIANG F, et al. HuatuoGPT, towards taming language model to be a doctor[C]//Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Dec 2023. Stroudsburg: ACL, 2023: 10859-10885.
[129] SINGHAL K, TU T, GOTTWEIS J, et al. Towards expert-level medical question answering with large language mo-dels[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2305.09617.
[130] HALL P, GILL N, SCHMIDT N. Proposed guidelines for the responsible use of explainable machine learning[EB/OL]. [2024-01-25]. https://arxiv.org/abs/1906.03533.
[131] 陈珂锐, 孟小峰. 机器学习的可解释性[J]. 计算机研究与发展, 2020, 57(9): 1971-1986.
CHEN K Y, MENG X F. Interpretability of machine learn-ing[J]. Journal of Computer Research and Development, 2020, 57(9): 1971-1986.
[132] 梁峥, 王宏志, 戴加佳, 等. 预训练语言模型实体匹配的可解释性[J]. 软件学报, 2023, 34(3): 1087-1108.
LIANG Z, WANG H Z, DAI J J, et al. Interpretability of entity matching based on pre-trained language model[J]. Journal of Software, 2023, 34(3): 1087-1108.
[133] 王冬丽, 杨珊, 欧阳万里, 等. 人工智能可解释性: 发展与应用[J]. 计算机科学, 2023, 50(S1): 19-25.
WANG D L,YANG S, OUYANG W L, et al. Explainability of artificial intelligence: development and application[J]. Computer Science, 2023, 50(S1): 19-25.
[134] MARKUS A F, KORS J A, RIJNBEEK P R. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies[J]. Journal of Biomedical Informatics, 2021, 113: 103655.
[135] 纪守领, 李进锋, 杜天宇, 等. 机器学习模型可解释性方法、应用与安全研究综述[J]. 计算机研究与发展, 2019, 56(10): 2071-2096.
JI S L, LI J F, DU T Y, et al. Survey on techniques, applications and security of machine learning interpretability [J]. Journal of Computer Research and Development, 2019, 56(10): 2071-2096.
[136] 成科扬, 王宁, 师文喜, 等. 深度学习可解释性研究进展[J]. 计算机研究与发展, 2020, 57(6): 1208-1217.
CHENG K Y, WANG N, SHI W X, et al. Research advances in the interpretability of deep learning [J]. Journal of Computer Research and Development, 2020, 57(6): 1208-1217.
[137] DOSHI-VELEZ F, KIM B. Considerations for evaluation and generalization in interpretable machine learning[M]// Explainable and Interpretable Models in Computer Vision and Machine Learning. Cham: Springer, 2018: 3-17.
[138] LIPTON Z C. In machine learning, the concept of interpretability is both important and slippery[J]. Queue, 2018, 16: 28.
[139] SHEN Y, WANG L, CHEN Y, et al. An interpretability evaluation benchmark for pre-trained language models[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2207.13948.
[140] ROSS A, CHEN N, HANG E Z, et al. Evaluating the interpretability of generative models by interactive reconstruction[C]//Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, May 8-13, 2021. New York: ACM, 2021: 80.
[141] WANG Q, ANIKINA T, FELDHUS N, et al. LLMCheckup: conversational examination of large language models via interpretability tools[EB/OL]. [2024-01-20]. https://arxiv.org/abs/2401.12576.
[142] LEI Y, LIAN J, YAO J, et al. RecExplainer: aligning large language models for recommendation model interpretability[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2311.10947.
[143] YANG K, JI S, ZHANG T, et al. Towards interpretable mental health analysis with large language models[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Dec 6-10, 2023. Stroudsburg: ACL, 2023: 6056-6077.
[144] MA W, ZHAO M, XIE X, et al. Are code pre-trained models powerful to learn code syntax and semantics?[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2212.10017.
[145] LI Y, ZHANG T, LUO X, et al. Do pre-trained language models indeed understand software engineering tasks?[J]. IEEE Transactions on Software Engineering, 2023, 49(10): 4639-4655.
[146] HOODA A, CHRISTODORESCU M, ALLAMANIS M, et al. Do large code models understand programming concepts? A black-box approach[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2402.05980.
[147] RODRIGUEZ-CARDENAS D, PALACIO D N, KHATI D, et al. Benchmarking causal study to interpret large language models for source code[C]//Proceedings of the 2023 IEEE International Conference on Software Maintenance and Evolution, Bogotá, Oct 1-6, 2023. Piscataway: IEEE, 2023: 329-334.
[148] ROY S, LABERGE G, ROY B, et al. Why don’t XAI techniques agree? Characterizing the disagreements bet-ween post-hoc explanations of defect predictions[C]//Proceedings of the 2022 IEEE International Conference on Software Maintenance and Evolution, Limassol, Oct 3-7, 2022. Piscataway: IEEE, 2022: 444-448.
[149] JI Z, MA P, LI Z, et al. Benchmarking and explaining large language model-based code generation: a causality-centric approach[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2310. 06680.
[150] PALACIO D N, VELASCO A, RODRIGUEZ-CARDENAS D, et al. Evaluating and explaining large language models for code using syntactic structures[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2308.03873.
[151] ZHANG T, CHEN Z, ZHU Y, et al. Interpretable program synthesis[C]//Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, May 8-13, 2021. New York: ACM, 2021.
[152] 杨朋波, 桑基韬, 张彪, 等. 面向图像分类的深度模型可解释性研究综述[J]. 软件学报, 2023, 34(1): 230-254.
YANG P B, SANG J T, ZHANG B, et al. Survey on interpretability of deep models for image classification[J]. Journal of Software, 2023, 34(1): 230-254.
[153] CHEN H, JI Y. Adversarial training for improving model robustness? Look at both prediction and interpretation[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2022: 10463-10472.
[154] 王昱颖, 张敏, 杨晶然, 等. 深度学习模型中的公平性研究[J]. 软件学报, 2023, 34(9): 4037-4055.
WANG Y Y, ZHANG M, YANG J R, et al. Research on fairness in deep learning models[J]. Journal of Software, 2023, 34(9): 4037-4055.
[155] 刘文炎, 沈楚云, 王祥丰, 等. 可信机器学习的公平性综述[J]. 软件学报, 2021, 32(5): 1404-1426.
LIU W Y, SHEN C Y, WANG X F, et al. Survey on fairness in trustworthy machine learning[J]. Journal of Software, 2021, 32(5): 1404-1426.
[156] ZHOU Y, MURESANU A I, HAN Z, et al. Large language models are human-level prompt engineers[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2211.01910.
[157] SAH C K, XIAOLI D L, ISLAM M M. Unveiling bias in fairness evaluations of large language models: a critical literature review of music and movie recommendation systems[EB/OL]. [2024-01-25]. https://arxiv.org/abs/2401.04057.
[158] BI G, SHEN L, XIE Y, et al. A group fairness lens for large language models[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2312.15478.
[159] FREIBERGER V, BUCHMANN E. Fairness certification for natural language processing and large language models[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2401.01262.
[160] HUANG P S, ZHANG H, JIANG R, et al. Reducing sentiment bias in language models via counterfactual evaluation[C]//Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 65-83.
[161] ZHUO T Y, HUANG Y, CHEN C, et al. Exploring AI ethics of ChatGPT: a diagnostic analysis[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2301.12867.
[162] FERRARA E. Should ChatGPT be biased? Challenges and risks of bias in large language models[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2304.03738.
[163] HARTMANN J, SCHWENZOW J, WITTE M. The political ideology of conversational AI: converging evidence on ChatGPT's pro-environmental, left-libertarian orientation[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2301.01768.
[164] LI Y, ZHANG Y. Fairness of ChatGPT[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2305.18569.
[165] PARRISH A, CHEN A, NANGIA N, et al. BBQ: a hand-built bias benchmark for question answering[C]//Findings of the Association for Computational Linguistics: ACL 2022, Dublin, May 22-27, 2022. Stroudsburg: ACL, 2022: 2086-2105.
[166] KHASHABI D, MIN S, KHOT T, et al. UnifiedQA: crossing format boundaries with a single QA system[C]//Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1896-1907.
[167] RUTINOWSKI J, FRANKE S, ENDENDYK J, et al. The self-perception and political biases of ChatGPT[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2304.07333.
[168] FERREIRA S L C, CAIRES A O, BORGES T S, et al. Robustness evaluation in analytical methods optimized using experimental designs[J]. Microchemical Journal, 2017, 131: 163-169.
[169] BRENDEL W, RAUBER J, KÜMMERER M, et al. Accurate, reliable and fast robustness evaluation[C]//Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, Vancouver, Dec 8-14, 2019: 12841-12851.
[170] KASHYAP A R, MEHNAZ L, MALIK B, et al. Analyzing the domain robustness of pretrained language models, layer by layer[C]//Proceedings of the 2nd Workshop on Domain Adaptation for NLP. Stroudsburg: ACL, 2021: 222-244.
[171] LIU Q, JI S, LIU C, et al. A practical black-box attack on source code authorship identification classifiers[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 3620-3633.
[172] ZHANG C, WANG Z, MANGAL R, et al. Transfer attacks and defenses for large language models on coding tasks[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2311.13445.
[173] LI Z, PENG B, HE P, et al. Evaluating the instruction-following robustness of large language models to prompt injection[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2308.10819.
[174] LI Y, GUO Y, GUERIN F, et al. Evaluating large language models for generalization and robustness via data compression[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2402.00861.
[175] QIU H, ZHANG S, LI A, et al. Latent jailbreak: a benchmark for evaluating text safety and output robustness of large language models[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2307.08487.
[176] ZHAO Y, PANG T, DU C, et al. On evaluating adversarial robustness of large vision-language models[C]//Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, New Orleans, Dec 10-16, 2023: 54111-54138.
[177] LI Z, QIU W, MA P, et al. An empirical study on large language models in accuracy and robustness under Chinese industrial scenarios[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2402.01723.
[178] NEUMANN P G. Computer system-security evaluation[C]//Proceedings of the 1978 International Workshop on Managing Requirements Knowledge. Washington: IEEE Computer Society, 1978: 1087.
[179] TOUBIANA V, NARAYANAN A, BONEH D, et al. Adnostic: privacy preserving targeted advertising[C]//Proceedings of the 2010 Network and Distributed System Security Symposium, San Diego, 2010.
[180] PLANT R, GIUFFRIDA V, GKATZIA D. You are what you write: preserving privacy in the era of large language models[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2204.09391.
[181] HARDT M, TALWAR K. On the geometry of differential privacy[C]//Proceedings of the 42nd ACM Symposium on Theory of Computing, Cambridge, Jun 5-8, 2010. New York: ACM, 2010: 705-714.
[182] ZHANG C, WANG Z, MANGAL R, et al. Transfer attacks and defenses for large language models on coding tasks[EB/OL]. [2024-01-31]. https://arxiv.org/abs/2311.13445.
[183] FENG R, YAN Z, PENG S, et al. Automated detection of password leakage from public github repositories[C]//Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, May 25-27, 2022. New York: ACM, 2022: 175-186.
[184] JUNGWIRTH G, SAHA A, SCHRÖDER M, et al. Connecting the .dotfiles: checked-in secret exposure with extra (lateral movement) steps[C]//Proceedings of the 2023 IEEE/ACM 20th International Conference on Mining Software Repositories, Melbourne, May 15-16, 2023. Piscataway: IEEE, 2023: 322-333.
[185] VATS A, LIU Z, SU P, et al. Recovering from privacy-preserving masking with large language models[C]//Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Apr 14-19, 2024. Piscataway: IEEE, 2024: 10771-10775.
[186] ABBASIAN M, AZIMI I, RAHMANI A M, et al. Conversational health agents: a personalized LLM-powered agent framework[EB/OL]. [2024-02-02]. https://arxiv.org/abs/2310.02374.
[187] FRIED D, AGHAJANYAN A, LIN J, et al. InCoder: a generative model for code infilling and synthesis[EB/OL]. [2024-02-02]. https://arxiv.org/abs/2204.05999.
[188] ALLAL L B, LI R, KOCETKOV D, et al. SantaCoder: don't reach for the stars![EB/OL]. [2024-02-02]. https://arxiv.org/abs/2301.03988.
[189] LI R, ALLAL L B, ZI Y, et al. StarCoder: may the source be with you![EB/OL]. [2024-02-02]. https://arxiv.org/abs/2305.06161.
[190] LYU C, XU J, WANG L. New trends in machine translation using large language models: case examples with ChatGPT[EB/OL]. [2024-02-02]. https://arxiv.org/abs/2305.01181.
[191] ZHANG D, LI S, ZHANG X, et al. SpeechGPT: empowering large language models with intrinsic cross-modal conversational abilities[C]//Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Dec 2023. Stroudsburg: ACL, 2023: 15757-15773.
[192] WU W, JIANG C, JIANG Y, et al. Do PLMs know and understand ontological knowledge?[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Jul 9-14, 2023. Stroudsburg: ACL, 2023: 3080-3101.
[193] YIN Z, SUN Q, GUO Q, et al. Do large language models know what they don't know?[C]//Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Jul 9-14, 2023. Stroudsburg: ACL, 2023: 8653-8665.
[194] CHARAN P V, CHUNDURI H, ANAND P M, et al. From text to MITRE techniques: exploring the malicious use of large language models for generating cyber attack payloads[EB/OL]. [2024-02-02]. https://arxiv.org/abs/2305.15336.
[195] DERNER E, BATISTIČ K. Beyond the safeguards: exploring the security risks of ChatGPT[EB/OL]. [2024-02-02]. https://arxiv.org/abs/2305.08005.
[196] DASH B, SHARMA P. Are ChatGPT and deepfake algorithms endangering the cybersecurity industry? A review[J]. International Journal of Engineering and Applied Sciences, 2023. DOI:10.31873/IJEAS.10.1.01.
[197] TSIGARIS P, TEIXEIRA DA SILVA J A. Can ChatGPT be trusted to provide reliable estimates?[J]. Accountability in Research, 2023. DOI: 10.1080/08989621.2023.2179919.