Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (11): 2950-2966. DOI: 10.3778/j.issn.1673-9418.2503023

• Theory · Algorithm •


Variational Information Bottleneck-Guided Complementary Concept Bottleneck Model

JI Zhong, LIN Zijie   

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Online: 2025-11-01    Published: 2025-10-30


Abstract: Concept bottleneck models (CBMs) project visual features extracted from black-box models onto a set of human-interpretable concepts to facilitate decision-making. Existing approaches typically rely on large language models (LLMs) to generate textual concepts and on multimodal pretrained models to align visual features with text embeddings. However, these methods often introduce textual noise into the bottleneck, resulting in explanations that may not accurately reflect the image content or its visual attributes. To address this limitation, a variational information bottleneck-guided complementary concept bottleneck model is proposed. This method employs a chain-of-thought (CoT)-based concept generation strategy that prompts both vision-language models (VLMs) and LLMs to produce more precise and complementary textual descriptions. A concept selection module based on a variational information bottleneck feature attribution method is then developed to extract the textual concepts most relevant to the image content. Furthermore, an image classification strategy is designed to integrate dual-branch concept activation scores from complementary concept bottlenecks for robust decision-making. Finally, an interpretability efficiency metric is introduced to evaluate the succinctness and effectiveness of the generated explanations. Experimental results on six public datasets demonstrate that the proposed method not only outperforms five state-of-the-art models in interpretability efficiency, but also achieves comparable or even superior classification accuracy.
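
A minimal sketch of the variational information bottleneck attribution idea described above, assuming concept activation scores are similarity scores between an image embedding and concept text embeddings (e.g., from a CLIP-style model): the scores pass through a stochastic Gaussian bottleneck, and the per-concept KL cost serves as the relevance measure for concept selection. The module names, the standard-normal prior, and the KL-based ranking are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBConceptSelector(nn.Module):
    """Stochastic Gaussian bottleneck over a vector of concept activation
    scores (illustrative sketch). The mean per-concept KL divergence
    indicates how much label-relevant information each concept carries
    and can be used to rank and select concepts."""

    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        self.mu = nn.Linear(num_concepts, num_concepts)
        self.log_var = nn.Linear(num_concepts, num_concepts)
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, scores: torch.Tensor):
        mu = self.mu(scores)
        log_var = self.log_var(scores)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        logits = self.classifier(z)
        # Per-concept KL( N(mu, sigma^2) || N(0, 1) ), shape (batch, num_concepts)
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0)
        return logits, kl

def vib_objective(logits, labels, kl, beta: float = 1e-3):
    """Cross-entropy plus a beta-weighted information cost: the standard
    information bottleneck trade-off between accuracy and compression."""
    return F.cross_entropy(logits, labels) + beta * kl.sum(dim=1).mean()

Under this sketch, the concepts with the largest average KL term over a training set would be the ones retained as most relevant to the image content; a hard top-k cut is one simple selection rule.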

Key words: concept bottleneck models, variational information bottleneck, feature attribution, interpretability
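
The dual-branch decision step can likewise be sketched as two linear concept-to-class heads, one per complementary bottleneck, whose logits are fused. The simple additive fusion and all names below are assumptions for illustration rather than the authors' exact design.

import torch
import torch.nn as nn

class ComplementaryCBMHead(nn.Module):
    """Final classifier over two complementary concept bottlenecks:
    one built from VLM-generated descriptions and one from LLM-generated
    descriptions. Each branch maps its concept activation scores to class
    logits; summing the logits lets both bottlenecks contribute to the
    decision."""

    def __init__(self, n_vlm_concepts: int, n_llm_concepts: int, num_classes: int):
        super().__init__()
        self.vlm_head = nn.Linear(n_vlm_concepts, num_classes)
        self.llm_head = nn.Linear(n_llm_concepts, num_classes)

    def forward(self, vlm_scores: torch.Tensor, llm_scores: torch.Tensor):
        # Additive logit fusion of the two concept bottleneck branches
        return self.vlm_head(vlm_scores) + self.llm_head(llm_scores)

For example, with 64 concepts per branch and 200 classes, ComplementaryCBMHead(64, 64, 200) maps a pair of (batch, 64) concept-score tensors to (batch, 200) class logits, and the linear weights remain directly readable as per-concept contributions to each class.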