Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes

CHEN Dan; LIU Le; WANG Chenhao; BAI Xiru; WANG Zichen

doi:10.11999/JEIT231338

Volume 46 Issue 8

Aug. 2024

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2024 > 46(8): 3334-3342

CHEN Dan, LIU Le, WANG Chenhao, BAI Xiru, WANG Zichen. Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3334-3342. doi: 10.11999/JEIT231338

Citation:

CHEN Dan, LIU Le, WANG Chenhao, BAI Xiru, WANG Zichen. Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3334-3342. doi: 10.11999/JEIT231338

Citation:

PDF( 2516 KB)

Adaptive Attention Mechanism Fusion for Real-Time Semantic Segmentation in Complex Scenes

doi: 10.11999/JEIT231338 cstr: 32379.14.JEIT231338

1.
School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
2.
School of Electronic and Information Engineering, Shaanxi Vocational and Technical College, Xi’an 710038, China

Funds: Yulin Science and Technology Bureau Program (2019-146), Xi’an Qinchuangyuan Key Industrial Chain Technology Program(23ZDCYJSGG0021-2023)

Received Date: 2023-12-04
Rev Recd Date: 2024-02-23

Available Online: 2024-03-04

Publish Date: 2024-08-30

Abstract

Abstract

Realizing high accuracy and low computational burden is a serious challenge faced by Convolutional Neural Network (CNN) for real-time semantic segmentation. In this paper, an efficient real-time semantic segmentation Adaptive Attention mechanism Fusion Network(AAFNet) is designed for complex urban street scenes with numerous types of targets and large changes in lighting. Image spatial details and semantic information are respectively extracted by the network, and then, through Feature Fusion Network(FFN), accurate semantic images are obtained. Dilated Deep-Wise separable convolution (DDW) is adopted by AAFNet to increase the receptive field of semantic feature extraction, an Adaptive Attention mechanism Fusion Module (AAFM) is proposed, which combines Adaptive average pooling(Avp) and Adaptive max pooling(Amp) to refine the edge segmentation effect of the target and reduce the leakage rate of small targets. Finally, semantic segmentation experiments are performed on the Cityscapes and CamVid datasets for complex urban street scenes. The designed AAFNet achieves 73.0% and 69.8% mean Intersection over Union (mIoU) at inference speeds of 32 fps (Cityscapes) and 52 fps (CamVid). Compared with Dilated Spatial Attention Network (DSANet), Multi-Scale Context Fusion Network (MSCFNet), and Lightweight Bilateral Asymmetric Residual Network (LBARNet), AAFNet has the highest segmentation accuracy.

FullText(HTML)

References(26)

References

[1]	HAO Shijie, ZHOU Yuan, and GUO Yanrong. A brief survey on semantic segmentation with deep learning[J]. Neurocomputing, 2020, 406: 302–321. doi: 10.1016/j.neucom.2019.11.118.
[2]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848. doi: 10.1109/TPAMI.2016.2644615.
[3]	BADRINARAYANAN V, KENDALL A, and CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481–2495. doi: 10.17863/CAM.17966.
[4]	CHEN Wenlin, WILSON J T, TYREE S, et al. Compressing neural networks with the hashing trick[C]. The 32nd International Conference on Machine Learning, Lille, France, 2015: 2285–2294. doi: 10.5555/3045118.3045361.
[5]	HAN Song, MAO Huizi, and DALLY W J. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding[C]. 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2016: 3–7.
[6]	WU Jiaxiang, LENG Cong, WANG Yuhang, et al. Quantized convolutional neural networks for mobile devices[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4820–4828. doi: 10.1109/CVPR.2016.521.
[7]	ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263–272. doi: 10.1109/TITS.2017.2750080.
[8]	WANG Yu, ZHOU Quan, XIONG Jian, et al. ESNet: An efficient symmetric network for real-time semantic segmentation[C]. Second Chinese Conference on Pattern Recognition and Computer Vision, Xi’an, China, 2019: 41–52. doi: 10.1007/978-3-030-31723-2_4.
[9]	GAO Guangwei, XU Guoan, YU Yi, et al. MSCFNet: A lightweight network with multi-scale context fusion for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 25489–25499. doi: 10.1109/TITS.2021.3098355.
[10]	PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[EB/OL].https://arxiv.org/pdf/:1606.02147.pdf, 2016.
[11]	ZHAO Hengshuang, QI Xiaojuan, SHEN Xiaoyong, et al. ICNet for real-time semantic segmentation on high-resolution images[C]. 15th European Conference on Computer Vision, Munich, Germany, 2018: 418–434. doi: 10.1007/978-3-030-01219-9_25.
[12]	WU Tianyi, TANG Sheng, ZHANG Rui, et al. CGNet: A light-weight context guided network for semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 1169–1179. doi: 10.1109/TIP.2020.3042065.
[13]	LV Qingxuan, SUN Xin, CHEN Changrui, et al. Parallel complement network for real-time semantic segmentation of road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4432–4444. doi: 10.1109/TITS.2020.3044672.
[14]	黄庭鸿, 聂卓赟, 王庆国, 等. 基于区块自适应特征融合的图像实时语义分割[J]. 自动化学报, 2021, 47(5): 1137–1148. doi: 10.16383/j.aas.c180645. HUANG Tinghong, NIE Zhuoyun, WANG Qingguo, et al. Real-time image semantic segmentation based on block adaptive feature fusion[J]. Acta Automatica Sinica, 2021, 47(5): 1137–1148. doi: 10.16383/j.aas.c180645.
[15]	HU Xuegang and ZHOU Baoman. LBARNet: Lightweight bilateral asymmetric residual network for real-time semantic segmentation[J]. Computers & Graphics, 2023, 116: 1–12. doi: 10.1016/j.cag.2023.07.039.
[16]	ZHAO Hengshuang, ZHANG Yi, LIU Shu, et al. PSANet: Point-wise spatial attention network for scene parsing[C]. 15th European Conference on Computer Vision, Munich, Germany, 2018: 270–286. doi: 10.1007/978-3-030-01240-3_17.
[17]	FU Jun, LIU Jing, TIAN Haijie, et al. Dual attention network for scene segmentation[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Alamitos, USA, 2019: 3141–3149. doi: 10.1109/CVPR.2019.00326.
[18]	ELHASSAN M A M, HUANG Chenxi, YANG Chenhui, et al. DSANet: Dilated spatial attention for real-time semantic segmentation in urban street scenes[J]. Expert Systems with Applications, 2021, 183: 115090. doi: 10.1016/j.eswa.2021.115090.
[19]	王囡, 侯志强, 蒲磊, 等. 空洞可分离卷积和注意力机制的实时语义分割[J]. 中国图象图形学报, 2022, 27(4): 1216–1225. doi: 10.11834/jig.200729. WANG Nan, HOU Zhiqiang, PU Lei, et al. Real-time semantic segmentation analysis based on cavity separable convolution and attention mechanism[J]. Journal of Image and Graphics, 2022, 27(4): 1216–1225. doi: 10.11834/jig.200729.
[20]	高翔, 李春庚, 安居白. 基于注意力和多标签分类的图像实时语义分割[J]. 计算机辅助设计与图形学学报, 2021, 33(1): 59–67. doi: 10.3724/SP.J.1089.2021.18233. GAO Xiang, LI Chungeng, and AN Jubai. Real-time image semantic segmentation based on attention mechanism and multi-label classification[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 59–67. doi: 10.3724/SP.J.1089.2021.18233.
[21]	ARANI E, MARZBAN S, PATA A, et al. RGPNet: A real-time general purpose semantic segmentation[C]. IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2021: 3008–3017. doi: 10.1109/WACV48630.2021.00305.
[22]	ZHOU Quan, WANG Yu, FAN Yawen, et al. AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network[J]. Applied Soft Computing, 2020, 96: 106682. doi: 10.1016/j.asoc.2020.106682.
[23]	WANG Xiaotian and CAO Weiqun. MRFDCNet: Multireceptive field dense connection network for real-time semantic segmentation[J]. Mobile Information Systems, 2022, 2022: 6100292. doi: 10.1155/2022/6100292.
[24]	ZHAO Hengshuang, SHI Jianping, QI Xiaojuan, et al. Pyramid scene parsing network[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 6230–6239. doi: 10.1109/CVPR.2017.660.
[25]	LI Gen, JIANG Shenlu, YUN I, et al. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes[J]. IEEE Access, 2020, 8: 27495–27506. doi: 10.1109/ACCESS.2020.2971760.
[26]	WANG Xizhong, LIU Rui, DONG Jing, et al. Lightweight real-time image semantic segmentation network based on multi-resolution hybrid attention mechanism[J]. Wireless Communications and Mobile Computing, 2022, 2022: 3215083. doi: 10.1155/2022/3215083.