
Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue 11: 2950-2966. DOI: 10.3778/j.issn.1673-9418.2503023
• Theory·Algorithm •
Variational Information Bottleneck-Guided Complementary Concept Bottleneck Model

JI Zhong, LIN Zijie
Online: 2025-11-01
Published: 2025-10-30
JI Zhong, LIN Zijie. Variational Information Bottleneck-Guided Complementary Concept Bottleneck Model[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(11): 2950-2966.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2503023
Related articles:

[1] ZHANG Cheng, CAO Jingxu, LYU Jinxin, YAN Dongmei. Study on Interpretability of Knowledge Distillation Based on Convex Hulls from Perspective of Natural Language[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(8): 2174-2187.
[2] YU Linbiao, BIAN Zekang, QU Jia, ZHANG Jin, WANG Shitong. Adversarial Hybrid TSK Fuzzy Classifier for Epileptic EEG Signals Detection[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(12): 3395-3411.
[3] SONG Baoyan, LIU Hangsheng, SHAN Xiaohuan, LI Su, CHEN Ze. Incorporating Relation Path and Entity Neighborhood Information for Knowledge Graph Multi-hop Reasoning Method[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(11): 3046-3058.
[4] XU Zhiwei, LI Hailong, LI Bo, LI Tao, WANG Jiatai, XIE Xueshuo, DONG Zehui. Survey of AIGC Large Model Evaluation: Enabling Technologies, Vulnerabilities and Mitigation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(9): 2293-2325.
[5] KAO Wentao, LI Ming, MA Jingang. Review of Application of Convolutional Neural Network in Auxiliary Diagnosis of Colorectal Polyps[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(3): 627-645.
[6] MA Xiang, DENG Zhaohong, WANG Shitong. Multi-grained Fusion Image Feature Learning with Fuzzy Rule System[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(1): 173-184.
[7] SONG Yixuan, DENG Zhaohong, QIN Bin. Fuzzy Inference and Manifold Regularization Combined Feature Transfer Learning[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(3): 449-459.
[8] YANG Menglin, ZHANG Wensheng. Image Classification Algorithm Based on Classification Activation Map Enhancement[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(1): 149-158.
[9] CAO Ya, DENG Zhaohong, WANG Shitong. TSK Fuzzy System Model with Monotonic Constraints[J]. Journal of Frontiers of Computer Science and Technology, 2018, 12(9): 1487-1495.
[10] CHEN Junyong, DENG Zhaohong, WANG Shitong. Interval Type-2 Fuzzy Subspace Zero-Order TSK System[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(10): 1652-1661.