Journal of Frontiers of Computer Science and Technology

• Academic Research •


Fake news detection method based on attention-guided multimodal feature fusion

DENG Xingyu, WANG Longye, ZENG Xiaoli, YE Hao, CHE Xihao

  1. School of Electronics and Information Engineering, Southwest Petroleum University, Chengdu 610500, China
  2. School of Information Science and Technology, Tibet University, Lhasa 850000, China



Abstract: Existing multimodal fake news detection methods are limited in their use of images' multi-level frequency-domain information and in enabling deep interaction between modalities, making it difficult to fully exploit the latent features of images and the correlations among multimodal features, which in turn degrades detection performance. Fake news images typically undergo multiple compression or tampering operations during dissemination, producing abnormal frequency-domain responses. Traditional methods often rely on the Fourier transform to extract frequency-domain features, but its global frequency analysis loses local tampering traces and cannot achieve multi-scale feature decoupling. To better uncover and exploit these critical features and their inherent relationships, and thereby improve detection performance, an attention-guided multimodal feature fusion method for fake news detection (AGMFN) is proposed. The method models the multi-level frequency-domain information of images with a wavelet-transform-based dual-path feature extraction module, capturing low-frequency global structure and high-frequency local anomalies through two-level wavelet decomposition and strengthening detail features with feature-enhancement convolution. Meanwhile, pre-trained models and the frequency-domain feature extraction module separately extract textual, visual, and frequency-domain features, constructing a joint discrimination framework of physical forensics and semantic clues. To fuse the multimodal features and capture deep correlations across modalities, an attention-based long-sequence feature fusion module is designed; it introduces exponentially decaying weighting coefficients to model long-term dependencies between modalities, resolving the temporal mismatch of traditional concatenation-based fusion. Cross-modal attention then achieves a hierarchical fusion of text, frequency-domain, and visual features, enhancing fake news discrimination while maintaining computational efficiency. Experimental results show that AGMFN achieves classification accuracies of 0.917 and 0.847 on the Weibo and Twitter datasets, respectively, outperforming existing baseline models. Visualization experiments further confirm that the fused multimodal features generalize better, improving fake news identification.
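The two-level wavelet decomposition mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is only an illustration using an orthonormal Haar wavelet: the paper's actual module presumably applies learned feature-enhancement convolutions on the sub-bands, and the function names here (`haar_dwt2`, `two_level_decompose`) are illustrative, not from the paper.

```python
import numpy as np

def haar_dwt2(x):
    """One level of 2-D orthonormal Haar wavelet decomposition.

    Returns (LL, LH, HL, HH): the low-frequency approximation and the
    high-frequency detail sub-bands, each with half the spatial size.
    """
    # Pairwise averages/differences along rows, then along columns.
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # row low-pass
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # row high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def two_level_decompose(img):
    """Two-level decomposition: the LL band is decomposed again, mirroring
    the dual-path idea of a global low-frequency path plus local
    high-frequency paths at two scales."""
    ll1, lh1, hl1, hh1 = haar_dwt2(img)
    ll2, lh2, hl2, hh2 = haar_dwt2(ll1)
    return {"low": ll2,                       # global structure
            "high_level1": (lh1, hl1, hh1),   # fine local detail
            "high_level2": (lh2, hl2, hh2)}   # coarser local detail

img = np.arange(64, dtype=float).reshape(8, 8)
bands = two_level_decompose(img)
print(bands["low"].shape)             # (2, 2)
print(bands["high_level1"][0].shape)  # (4, 4)
```

Because the Haar transform above is orthonormal, the total energy of the four level-1 sub-bands equals that of the input image, so compression or tampering artifacts that concentrate energy in the high-frequency bands remain detectable after decomposition.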

Key words: fake news detection, wavelet transform, attention mechanism, multimodal feature fusion
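The abstract does not give the exact form of the exponentially decaying weighting coefficients, so the following is only a plausible sketch of the idea: standard scaled dot-product cross-attention whose weights are multiplied by a geometric factor `gamma**j` over key positions and renormalized, so nearby tokens are emphasized while distant ones still contribute to long-range dependencies. All names and the parameter `gamma` are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decayed_cross_attention(query, keys, values, gamma=0.9):
    """Cross-modal attention with an exponentially decreasing positional weight.

    query: (d,) vector from one modality (e.g. text);
    keys/values: (n, d) token sequence from another modality (e.g. visual);
    gamma in (0, 1]: hypothetical decay rate (gamma=1 recovers plain attention).
    """
    d = keys.shape[1]
    scores = keys @ query / np.sqrt(d)       # scaled dot-product logits
    decay = gamma ** np.arange(len(keys))    # exponential decay per key position
    weights = softmax(scores) * decay
    weights = weights / weights.sum()        # renormalize to a distribution
    return weights @ values

rng = np.random.default_rng(0)
text_q = rng.standard_normal(16)        # hypothetical text query vector
img_kv = rng.standard_normal((8, 16))   # hypothetical visual token sequence
fused = decayed_cross_attention(text_q, img_kv, img_kv)
print(fused.shape)  # (16,)
```

In a hierarchical text-frequency-visual fusion, such a block would be applied pairwise (e.g. text queries attending over frequency-domain keys, then over visual keys), with the decayed weights damping spurious long-range matches that plain concatenation-based fusion cannot suppress.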