Study of Coarse-to-Fine Class Activation Mapping Algorithms Based on Contrastive Layer-wise Relevance Propagation

SUN Hui; SHI Yulong; WANG Rui

doi:10.11999/JEIT220113

cstr: 32379.14.JEIT220113

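College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China

Funding: Natural Science Foundation of Tianjin (18JCYBJC42300)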

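About the authors:
SUN Hui: male, lecturer. Main research interests: wireless sensor networks, smart airports, airport bird repelling, cognitive radio, and multi-agent systems.

SHI Yulong: male, master's student. Research interests: image processing, airport bird repelling, system identification, and wireless sensor networks.

WANG Rui: female, professor. Main research interests: airport bird repelling, distributed systems, wireless sensor networks, chaotic systems, multi-agent systems, and system identification.

Corresponding author:
WANG Rui, ruiwang@cauc.edu.cn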

Chinese Library Classification (CLC) number: TP183
Publication history
- Received: 2022-01-27
- Revised: 2022-06-10
- Accepted: 2022-06-17
- Published online: 2022-06-20
- Publication date: 2023-04-10

Abstract

Abstract: Deep learning algorithms, represented by Convolutional Neural Networks (CNNs), depend heavily on model nonlinearity and tuning techniques and generally behave as black boxes in practical applications, which severely limits their further development in security-sensitive fields. To address this, a Coarse-to-Fine Class Activation Mapping (CF-CAM) algorithm is proposed for diagnosing the decision-making behavior of deep neural networks. The algorithm re-establishes the relationship between feature maps and the model decision. It uses contrastive layer-wise relevance propagation to obtain the contribution of each position in a feature map to the network decision, generates a spatial-level relevance mask, and thereby locates the regions that are important to the decision. Each mask is then linearly combined with a blurred copy of the input image, and the result is fed back into the network to obtain the target score of the corresponding feature map, so that the network is explained from coarse to fine in both the spatial and channel domains. Experimental results show that CF-CAM improves on the compared methods in both faithfulness and localization performance. In addition, CF-CAM is applied as a data augmentation strategy to fine-grained bird classification, where learning from hard samples effectively improves recognition accuracy, further supporting the effectiveness and superiority of the algorithm.
- Convolutional Neural Network (CNN)
- Class Activation Mapping (CAM)
- Contrastive layer-wise relevance propagation
- Fine-grained bird classification
- Data augmentation

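Figure 1 Computation flow of the CF-CAM algorithm

Figure 2 Comparison of CF-CAM results

Figure 3 Class discriminability results of CF-CAM

Figure 4 Multi-object visualization results of CF-CAM

Figure 5 Model parameter sensitivity check results of CF-CAM

Figure 6 Model diagnosis results of CF-CAM

Figure 7 Saliency-map-based data augmentation process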

Algorithm 1 CF-CAM algorithm
Input: image $I$, baseline image $I_b$, class $c$, model $f(x)$, target layer $l$, Gaussian blur parameters ksize and sigma.
(1) Initialization: $L_{\text{CF-CAM}}^c \leftarrow 0$, $\alpha^c \leftarrow [\,]$, baseline input ${I_b} = {\text{Gaussian\_blur2d}}(I, {\text{ksize}}, {\text{sigma}})$;
(2) Get the feature maps $A_k$ of the target layer and the relevance weights $R^c$, where $C$ is the number of channels;
(3) for $k$ in $[0, 1, \cdots, C-1]$ do
$ M_k^c = {\text{upsample}}(R_k^c{A_k}) $;
$ I' = I \odot M_k^c + {I_b} \odot (1 - M_k^c) $;
$ \alpha _k^c = {f^c}(I') - {f^c}({I_b}) $;
$ L_{{\text{CF-CAM}}}^c = L_{{\text{CF-CAM}}}^c + \alpha _k^cM_k^c $;
end
(4) Return $L_{\text{CF-CAM}}^c$
Output: saliency map $L_{\text{CF-CAM}}^c$
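
The loop in Algorithm 1 can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made only to keep the example self-contained: the feature maps `feats` and the spatial relevance maps `relevance` are passed in precomputed (contrastive layer-wise relevance propagation itself is not reproduced here), the image is single-channel, upsampling is nearest-neighbour, each mask is min-max normalized to [0, 1], and `score_fn` is a hypothetical stand-in for the class score $f^c$.

```python
import numpy as np


def gaussian_blur2d(img, ksize=5, sigma=2.0):
    """Separable Gaussian blur of an (H, W) array; ksize must be odd."""
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    kernel = np.exp(-0.5 * (ax / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), ksize // 2, mode="edge")
    # Blur along rows, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def upsample_nearest(x, out_shape):
    """Nearest-neighbour upsampling; out_shape must be a multiple of x.shape."""
    fh, fw = out_shape[0] // x.shape[0], out_shape[1] // x.shape[1]
    return np.kron(x, np.ones((fh, fw)))


def normalize(x):
    """Min-max scale to [0, 1] (an assumption); constant maps become zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def cf_cam(image, feats, relevance, score_fn, ksize=5, sigma=2.0):
    """Accumulate alpha_k * M_k over channels, following steps (1) to (4)."""
    baseline = gaussian_blur2d(image, ksize, sigma)        # I_b
    base_score = score_fn(baseline)                        # f^c(I_b)
    saliency = np.zeros(image.shape, dtype=float)          # L^c_CF-CAM
    alphas = []
    for a_k, r_k in zip(feats, relevance):
        mask = normalize(upsample_nearest(r_k * a_k, image.shape))  # M_k^c
        blended = image * mask + baseline * (1.0 - mask)            # I'
        alpha = score_fn(blended) - base_score                      # alpha_k^c
        alphas.append(alpha)
        saliency += alpha * mask
    return saliency, np.array(alphas)


# Toy setup: the "object" is a bright 4x4 patch in the top-left corner, and
# the hypothetical class score is the total intensity inside that patch.
image = np.zeros((8, 8))
image[:4, :4] = 1.0
feats = np.zeros((2, 2, 2))
feats[0, 0, 0] = 1.0   # channel 0 responds to the top-left region
feats[1, 1, 1] = 1.0   # channel 1 responds to the bottom-right region
relevance = np.ones_like(feats)


def score_fn(x):
    return float(x[:4, :4].sum())


saliency, alphas = cf_cam(image, feats, relevance, score_fn)
```

In this toy case only the channel that covers the object raises the score above the blurred baseline, so the accumulated map is positive on the object and zero elsewhere. With a real network, `score_fn` would wrap a forward pass and `feats` and `relevance` would come from the target layer.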

Table 1 Faithfulness evaluation results of CF-CAM (%)

Metric	RISE	Grad-CAM	Grad-CAM++	Score-CAM	Relevance-CAM	CF-CAM
A.D.	57.4	46.3	43.9	41.4	45.2	39.8
A.I.	8.7	15.2	18.6	20.5	17.5	21.3
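
The excerpt does not expand A.D. and A.I. Assuming they denote the Average Drop and Average Increase measures commonly used to evaluate class activation mapping methods, they can be computed as in the sketch below, where lower A.D. and higher A.I. indicate a more faithful explanation. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_drop(full_scores, masked_scores):
    """Mean relative drop in class score when only the explained region is kept (%)."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return 100.0 * np.mean(np.maximum(0.0, full - masked) / full)


def average_increase(full_scores, masked_scores):
    """Share of images whose class score rises on the masked input (%)."""
    full = np.asarray(full_scores, dtype=float)
    masked = np.asarray(masked_scores, dtype=float)
    return 100.0 * np.mean(masked > full)


# Hypothetical class scores for three images: on the full image, and on the
# image masked by its saliency map.
full = [0.8, 0.5, 0.4]
masked = [0.4, 0.6, 0.4]
ad = average_drop(full, masked)      # only the first image drops, by 50 %
ai = average_increase(full, masked)  # only the second image increases
```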

Table 2 Localization performance evaluation results of CF-CAM

Method	RISE	Grad-CAM	Grad-CAM++	Score-CAM	Relevance-CAM	CF-CAM
Proportion	40.5	52.3	54.6	61.8	53.9	62.7
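
The excerpt labels the Table 2 metric only as a proportion. One common reading, assumed here and not confirmed by the source, is the energy-based pointing game: the share of total saliency that falls inside the ground-truth bounding box. The sketch below illustrates that reading; the function name, the box convention, and the sample map are hypothetical.

```python
import numpy as np


def energy_in_box(saliency, box):
    """Percentage of saliency mass inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    total = float(saliency.sum())
    if total <= 0.0:
        return 0.0  # no saliency mass at all, so nothing is localized
    return 100.0 * float(saliency[top:bottom, left:right].sum()) / total


# Hypothetical non-negative saliency map with a stronger response in the centre.
saliency_map = np.ones((4, 4))
saliency_map[1:3, 1:3] = 3.0
proportion = energy_in_box(saliency_map, (1, 1, 3, 3))  # 12 of 24 units lie inside the box
```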
