Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (9): 2293-2325. DOI: 10.3778/j.issn.1673-9418.2402023

• Special Issue on Constructions and Applications of Large Language Models in Specific Domains •

Survey of AIGC Large Model Evaluation: Enabling Technologies, Vulnerabilities and Mitigation

XU Zhiwei, LI Hailong, LI Bo, LI Tao, WANG Jiatai, XIE Xueshuo, DONG Zehui   

  1. Haihe Laboratory of Information Technology Application Innovation, Tianjin 300350, China
    2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
    3. College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
    4. College of Computer Science, Nankai University, Tianjin 300350, China
    5. OPPO Research Institute, Beijing 100026, China
  • Online: 2024-09-01    Published: 2024-09-01

Abstract: Artificial intelligence generated content (AIGC) models have attracted widespread attention and adoption worldwide owing to their excellent content generation capabilities. However, the rapid development of AIGC large models also brings a series of hidden dangers, such as concerns about the interpretability, fairness, security, and privacy of model-generated content. To reduce unknowable risks and their harms, comprehensive measurement and evaluation of AIGC large models is becoming increasingly important. The academic community has launched research on AIGC large model evaluation, aiming to effectively address the related challenges and avoid potential risks. This paper reviews and analyzes studies on AIGC large model evaluation. Firstly, an overview of the evaluation process is provided, covering the preparation required before evaluation and the corresponding measurement indicators, and existing evaluation benchmarks are systematically organized. Secondly, representative applications of AIGC large models in finance, politics, and healthcare, together with the problems they involve, are discussed. Then, evaluation methods are studied in depth from different perspectives, such as interpretability, fairness, robustness, security, and privacy; the new issues that deserve attention in AIGC large model evaluation are deconstructed; and strategies for coping with the new challenges of large model evaluation are proposed. Finally, the future challenges of AIGC large model evaluation are discussed, and its future development directions are envisioned.

Key words: AIGC large model, large model evaluation, interpretability, fairness, robustness, security and privacy protection
