Study of Coarse-to-Fine Class Activation Mapping Algorithms Based on Contrastive Layer-wise Relevance Propagation

SUN Hui; SHI Yulong; WANG Rui

doi:10.11999/JEIT220113

Volume 45 Issue 4

Apr. 2023

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2023 > 45(4): 1454-1463

SUN Hui, SHI Yulong, WANG Rui. Study of Coarse-to-Fine Class Activation Mapping Algorithms Based on Contrastive Layer-wise Relevance Propagation[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1454-1463. doi: 10.11999/JEIT220113

Citation:

SUN Hui, SHI Yulong, WANG Rui. Study of Coarse-to-Fine Class Activation Mapping Algorithms Based on Contrastive Layer-wise Relevance Propagation[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1454-1463. doi: 10.11999/JEIT220113

Citation:

PDF( 7596 KB)

Study of Coarse-to-Fine Class Activation Mapping Algorithms Based on Contrastive Layer-wise Relevance Propagation

doi: 10.11999/JEIT220113 cstr: 32379.14.JEIT220113

College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China

Funds: The Natural Science Foundation of Tianjin (18JCYBJC42300)

Received Date: 2022-01-27
Accepted Date: 2022-06-17
Rev Recd Date: 2022-06-10

Available Online: 2022-06-20

Publish Date: 2023-04-10

Abstract

Abstract

Deep learning algorithms represented by Convolutional Neural Networks (CNN) are highly dependent on the nonlinearity of the model and debugging techniques, which have generally black-box properties during practical applications, limiting severely their further development in security-sensitive fields. To this end, a Coarse-to-Fine Class Activation Mapping (CF-CAM) algorithm is proposed for diagnosing the decision-making behaviors of deep neural networks. The algorithm re-establishes the relationship between the feature map and the model decision, uses the contrastive layer-wise relevance propagation theory to obtain the contribution of each position in the feature map to the network decision, generates a spatial-level correlation mask and finds the important area that affects the model decision. After that, the mask is linearly weighted with the fuzzed input image and re-input into the network to obtain the target score of the feature map, and the deep neural network is explained from the coarse stage to the fine stage in the spatial domain and the channel domain. The experimental results show that the CF-CAM proposed in this paper has obvious advantages in terms of faithfulness and localization performance compared to other methods. In addition, this paper applies CF-CAM as a data enhancement strategy for the task of fine-grained classification of birds, which can effectively improve the accuracy of network recognition by learning difficult samples, further verify the effectiveness and superiority of this method.

FullText(HTML)

References(31)

References

[1]	时增林, 叶阳东, 吴云鹏, 等. 基于序的空间金字塔池化网络的人群计数方法[J]. 自动化学报, 2016, 42(6): 866–874. doi: 10.16383/j.aas.2016.c150663 SHI Zenglin, YE Yangdong, WU Yunpeng, et al. Crowd counting using rank-based spatial pyramid pooling network[J]. Acta Automatica Sinica, 2016, 42(6): 866–874. doi: 10.16383/j.aas.2016.c150663
[2]	付晓薇, 杨雪飞, 陈芳, 等. 一种基于深度学习的自适应医学超声图像去斑方法[J]. 电子与信息学报, 2020, 42(7): 1782–1789. doi: 10.11999/JEIT190580 FU Xiaowei, YANG Xuefei, CHEN Fang, et al. An adaptive medical ultrasound images despeckling method based on deep learning[J]. Journal of Electronics &Information Technology, 2020, 42(7): 1782–1789. doi: 10.11999/JEIT190580
[3]	PU Fangling, DING Chujiang, CHAO Zeyi, et al. Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks[J]. Remote Sensing, 2019, 11(14): 1674. doi: 10.3390/rs11141674
[4]	SAMBASIVAM G and OPIYO G D. A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks[J]. Egyptian Informatics Journal, 2021, 22(1): 27–34. doi: 10.1016/j.eij.2020.02.007
[5]	ZEILER M D and FERGUS R. Visualizing and understanding convolutional networks[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 818–833.
[6]	ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Object detectors emerge in deep scene CNNs[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015.
[7]	PETSIUK V, DAS A, and SAENKO K. RISE: Randomized input sampling for explanation of black-box models[C]. British Machine Vision Conference 2018, Newcastle, UK, 2018.
[8]	FONG R C and VEDALDI A. Interpretable explanations of black boxes by meaningful perturbation[C]. The IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3449–3457.
[9]	AGARWAL C, SCHONFELD D, and NGUYEN A. Removing input features via a generative model to explain their attributions to an image classifier's decisions[EB/OL]. https://arxiv.org/abs/1910.042562019, 2019.
[10]	CHANG Chunhao, CREAGER E, GOLDENBERG A, et al. Explaining image classifiers by counterfactual generation[C]. The 7th International Conference on Learning Representations, New Orleans, USA, 2019.
[11]	SIMONYAN K, VEDALDI A, and ZISSERMAN A. Deep inside convolutional networks: Visualising image classification models and saliency maps[C]. The 2nd International Conference on Learning Representations, Banff, Canada, 2014.
[12]	SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: The all convolutional net[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015.
[13]	BACH S, BINDER A, MONTAVON G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation[J]. PloS One, 2015, 10(7): e0130140. doi: 10.1371/journal.pone.0130140
[14]	ZHOU Bolei, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2921–2929.
[15]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]. The IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618–626.
[16]	CHATTOPADHAY A, SARKAR A, HOWLADER P, et al. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks[C]. 2018 IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, USA, 2018: 839–847.
[17]	OMEIZA D, SPEAKMAN S, CINTAS C, et al. Smooth grad-cam++: An enhanced inference level visualization technique for deep convolutional neural network models[EB/OL]. https://arxiv.org/abs/1908.01224, 2019.
[18]	WANG Haofan, WANG Zifan, DU Mengnan, et al. Score-CAM: Score-weighted visual explanations for convolutional neural networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 2020: 111–119.
[19]	GU Jindong, YANG Yinchong, and TRESP V. Understanding individual decisions of CNNs via contrastive backpropagation[C]. The 14th Asian Conference on Computer Vision, Perth, Australia, 2018: 119–134.
[20]	KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84–90. doi: 10.1145/3065386
[21]	SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015.
[22]	SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
[23]	LEE J R, KIM S, PARK I, et al. Relevance-CAM: Your model already knows where to look[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 14939–14948.
[24]	SATTARZADEH S, SUDHAKAR M, PLATANIOTIS K N, et al. Integrated Grad-CAM: Sensitivity-aware visual explanation of deep convolutional networks via integrated gradient-based scoring[C]. ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 2021: 1775–1779.
[25]	ZHANG Qinglong, RAO Lu, and YANG Yubin. Group-CAM: Group score-weighted visual explanations for deep convolutional networks[EB/OL]. https://arxiv.org/abs/2103.13859, 2021.
[26]	WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD birds-200-2011 dataset[R]. CNS-TR-2011-001, 2011.
[27]	RUSSAKOVSKY O, DENG Jia, SU Hao, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211–252. doi: 10.1007/s11263-015-0816-y
[28]	SMILKOV D, THORAT N, KIM B, et al. SmoothGrad: Removing noise by adding noise[EB/OL]. https://arxiv.org/abs/1706.03825, 2017.
[29]	SUNDARARAJAN M, TALY A, and YAN Qiqi. Axiomatic attribution for deep networks[C]. The 34th International Conference on Machine Learning, Sydney, Australia, 2017: 3319–3328.
[30]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
[31]	WU Pingyu, ZHAI Wei, and CAO Yang. Background activation suppression for weakly supervised object localization[EB/OL]. https://arxiv.org/abs/2112.00580, 2022.