Volume 47 Issue 7
Jul. 2025
WANG Kaizheng, ZENG Yao, ZHANG Zhanxi, TAN Yizhang, WEN Gang. FCSNet: A Frequency-Domain Aware Cross-Feature Fusion Network for Smoke Segmentation[J]. Journal of Electronics & Information Technology, 2025, 47(7): 2320-2333. doi: 10.11999/JEIT241021

FCSNet: A Frequency-Domain Aware Cross-Feature Fusion Network for Smoke Segmentation

doi: 10.11999/JEIT241021 cstr: 32379.14.JEIT241021
Funds: The National Natural Science Foundation of China (52107017); the Basic Research Special Youth Project of the Yunnan Provincial Department of Science and Technology (202201AU070172)
  • Received Date: 2024-11-14
  • Revised Date: 2025-05-28
  • Available Online: 2025-06-10
  • Publish Date: 2025-07-22
Objective

Vision-based smoke segmentation enables pixel-level classification of smoke regions, providing more spatially detailed information than traditional bounding-box detection. Existing segmentation models based on Deep Convolutional Neural Networks (DCNNs) perform reasonably well but remain constrained by a limited receptive field, a consequence of their local inductive bias and two-dimensional neighborhood structure. This constraint reduces their capacity to model multi-scale features, particularly in complex visual scenes with diverse contextual elements. Transformer-based architectures capture long-range dependencies but are less effective at modeling local structure. Moreover, the limited availability of real-world smoke segmentation datasets and the underutilization of edge information reduce the generalization ability and accuracy of current models. To address these limitations, this study proposes a Frequency-domain aware Cross-feature fusion Network for Smoke segmentation (FCSNet), which integrates frequency-domain and spatial-domain representations to enhance multi-scale feature extraction and edge information retention. A dataset featuring various smoke types and complex backgrounds is also constructed to support model training and evaluation under realistic conditions.

Methods

To address the challenges of smoke semantic segmentation in real-world scenarios, this study proposes FCSNet, a frequency-domain aware cross-feature fusion network. Given the high computational cost of Transformer-based models, a Frequency Transformer is designed to reduce complexity while retaining global representation capability. To overcome the limited contextual modeling of DCNNs and the insufficient local feature extraction of Transformers, a Domain Interaction Module (DIM) is introduced to fuse global and local information. Within the network, the Frequency Transformer branch extracts low-frequency components to capture large-scale semantic structures, improving global scene comprehension. In parallel, a Multi-level High-Frequency perception Module (MHFM) is combined with Multi-Head Cross Attention (MHCA): MHFM processes multi-layer encoder features through a shallow structure to capture high-frequency edge details at full resolution, and MHCA computes directional global similarity maps that guide the decoder in aggregating contextual information.
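FCSNet's code is not reproduced here, but the frequency split that underpins the two branches can be illustrated with a simple FFT mask. The sketch below is a minimal PyTorch illustration under that assumption: `frequency_split` and its cutoff `ratio` are hypothetical names, and the actual Frequency Transformer and MHFM are considerably more elaborate than this decomposition.

```python
import torch

def frequency_split(x: torch.Tensor, ratio: float = 0.25):
    """Split a feature map (B, C, H, W) into low- and high-frequency parts.

    Illustrative only, not FCSNet's published implementation. `ratio`
    (an assumed parameter) controls how much of the centered spectrum
    is treated as "low frequency".
    """
    H, W = x.shape[-2:]
    # Centered 2-D spectrum of each channel.
    freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
    # Rectangular low-pass mask around the spectrum center.
    mask = torch.zeros(H, W, device=x.device)
    h, w = max(1, int(H * ratio / 2)), max(1, int(W * ratio / 2))
    mask[H // 2 - h : H // 2 + h, W // 2 - w : W // 2 + w] = 1.0
    low = torch.fft.ifft2(
        torch.fft.ifftshift(freq * mask, dim=(-2, -1)), norm="ortho"
    ).real  # large-scale semantic structure
    high = x - low  # edges and fine detail
    return low, high
```

In this picture, the low-frequency output corresponds to the large-scale semantic structure consumed by the Frequency Transformer branch, while the high-frequency residual corresponds to the edge detail targeted by MHFM.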
Results and Discussion

The effectiveness of FCSNet is evaluated against state-of-the-art methods on the RealSmoke and SMOKE5K datasets. On RealSmoke, FCSNet achieves the highest segmentation accuracy, with mean Intersection over Union (mIoU) values of 58.59% on RealSmoke-1 and 63.92% on RealSmoke-2, outperforming all baseline models (Table 4). Although its FLOPs are slightly higher than those of TransFuse, FCSNet offers a favorable trade-off between accuracy and computational complexity. Qualitative results further highlight its advantages under challenging conditions: in scenes affected by clouds, fog, or building occlusion, FCSNet distinguishes smoke boundaries more clearly and reduces both false positives and missed detections (Fig. 8). Notably, on RealSmoke-2, which contains fine and sparse smoke patterns, FCSNet localizes smoke and segments edge detail better than competing methods (Fig. 9). On SMOKE5K, FCSNet achieves an mIoU of 78.94%, a clear advantage over competing algorithms (Table 5), and visual comparisons show that it generates more accurate and refined smoke boundaries (Fig. 10). These results confirm that FCSNet maintains strong segmentation accuracy and robustness across diverse real-world scenes, supporting its generalizability and practical utility in smoke detection tasks.

Conclusions

To address the challenges of smoke semantic segmentation in real-world environments, this study proposes FCSNet, a network that integrates frequency- and spatial-domain information. A Frequency Transformer reduces computational cost while enhancing global semantic modeling through low-frequency feature extraction. To compensate for the limited receptive field of DCNNs and the local feature insensitivity of Transformers, a DIM fuses global and local representations. An MHFM extracts edge features, improving segmentation in ambiguous regions, and an MHCA mechanism aligns high-frequency edge features with decoder representations to guide segmentation in visually confusing areas. By jointly leveraging low-frequency semantics and high-frequency detail, FCSNet achieves effective fusion of contextual and structural information. Extensive quantitative and qualitative evaluations confirm that FCSNet performs robustly under complex interference, including clouds, fog, and occlusion, enabling accurate smoke localization and fine-grained segmentation.
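To make the MHCA step described above concrete, the following is a generic multi-head cross-attention sketch in which decoder tokens query high-frequency edge tokens. The module name, dimensions, and residual/normalization scheme are assumptions for illustration, not the published MHCA design.

```python
import torch
import torch.nn as nn

class EdgeCrossAttention(nn.Module):
    """Generic cross attention: decoder tokens query edge-feature tokens.

    A sketch of the MHCA idea, not FCSNet's published module; `dim` and
    `heads` are assumed values.
    """
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        # dec, edge: (B, N, C) token sequences flattened from feature maps.
        # Similarity between decoder queries and edge keys determines which
        # high-frequency details are aggregated into the decoder stream.
        fused, _ = self.attn(query=dec, key=edge, value=edge)
        return self.norm(dec + fused)  # residual keeps decoder semantics
```

A (B, C, H, W) feature map can be brought into the (B, H*W, C) token layout expected here with `feat.flatten(2).transpose(1, 2)`.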
  • [1]
    王开正, 周顺珍, 王健, 等. 基于多尺度时空特征深度融合神经网络的输电线路火点判识方法[J]. 高电压技术, 2025, 51(3): 1145–1157. doi: 10.13336/j.1003-6520.hve.20240086.

    WANG Kaizheng, ZHOU Shunzhen, WANG Jian, et al. Wildfire identification method for transmission lines based on deep fusion neural network with multi-scale spatio-temporal features[J]. High Voltage Engineering, 2025, 51(3): 1145–1157. doi: 10.13336/j.1003-6520.hve.20240086.
    [2]
    杜辰, 王兴, 董增寿, 等. 改进YOLOv5s的地下车库火焰烟雾检测方法[J]. 计算机工程与应用, 2024, 60(11): 298–308. doi: 10.3778/j.issn.1002-8331.2307-0003.

    DU Chen, WANG Xing, DONG Zengshou, et al. Improved YOLOv5s flame and smoke detection method for underground garage[J]. Computer Engineering and Applications, 2024, 60(11): 298–308. doi: 10.3778/j.issn.1002-8331.2307-0003.
    [3]
    张欣雨, 梁煜, 张为. 融合全局和局部信息的实时烟雾分割算法[J]. 西安电子科技大学学报, 2024, 51(1): 147–156. doi: 10.19665/j.issn1001-2400.20230405.

    ZHANG Xinyu, LIANG Yu, and ZHANG Wei. Real-time smoke segmentation algorithm combining global and local information[J]. Journal of Xidian University, 2024, 51(1): 147–156. doi: 10.19665/j.issn1001-2400.20230405.
    [4]
    CAO Hu, WANG Yueyue, CHEN J, et al. Swin-unet: Unet-like pure transformer for medical image segmentation[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 205–218. doi: 10.1007/978-3-031-25066-8_9.
    [5]
    JIANG Huiyan, DIAO Zhaoshuo, SHI Tianyu, et al. A review of deep learning-based multiple-lesion recognition from medical images: Classification, detection and segmentation[J]. Computers in Biology and Medicine, 2023, 157: 106726. doi: 10.1016/j.compbiomed.2023.106726.
    [6]
    MIAH M S U, KABIR M M, SARWAR T B, et al. A multimodal approach to cross-lingual sentiment analysis with ensemble of transformer and LLM[J]. Scientific Reports, 2024, 14(1): 9603. doi: 10.1038/s41598-024-60210-7.
    [7]
    TANG M C S, TING K C, and RASHIDI N H. DenseNet201-based waste material classification using transfer learning approach[J]. Applied Mathematics and Computational Intelligence, 2024, 13(2): 113–120. doi: 10.58915/amci.v13i2.555.
    [8]
    MIN Hai, ZHANG Yemao, ZHAO Yang, et al. Hybrid feature enhancement network for few-shot semantic segmentation[J]. Pattern Recognition, 2023, 137: 109291. doi: 10.1016/j.patcog.2022.109291.
    [9]
    CHENG Huixian, HAN Xianfeng, and XIAO Guoqiang. TransRVNet: LiDAR semantic segmentation with transformer[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(6): 5895–5907. doi: 10.1109/TITS.2023.3248117.
    [10]
    GROSSBERG S. Recurrent neural networks[J]. Scholarpedia, 2013, 8(2): 1888. doi: 10.4249/scholarpedia.1888.
    [11]
    ZAGORUYKO S and KOMODAKIS N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer[C]. 5th International Conference on Learning Representations, Toulon, France, 2017.
    [12]
    KHAN S, MUHAMMAD K, HUSSAIN T, et al. DeepSmoke: Deep learning model for smoke detection and segmentation in outdoor environments[J]. Expert Systems with Applications, 2021, 182: 115125. doi: 10.1016/j.eswa.2021.115125.
    [13]
    HUANG Yonghao, CHEN Leiting, ZHOU Chuan, et al. Model long-range dependencies for multi-modality and multi-view retinopathy diagnosis through transformers[J]. Knowledge-Based Systems, 2023, 271: 110544. doi: 10.1016/j.knosys.2023.110544.
    [14]
    HUTCHINS D, SCHLAG I, WU Yuhuai, et al. Block-recurrent transformers[C]. The 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 2409.
    [15]
    GAO Mingyu, QI Dawei, MU Hongbo, et al. A transfer residual neural network based on ResNet-34 for detection of wood knot defects[J]. Forests, 2021, 12(2): 212. doi: 10.3390/f12020212.
    [16]
    LI Xiuqing, CHEN Zhenxue, WU Q M J, et al. 3D parallel fully convolutional networks for real-time video wildfire smoke detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(1): 89–103. doi: 10.1109/TCSVT.2018.2889193.
    [17]
    张俊鹏, 刘辉, 李清荣. 基于FCN-LSTM的工业烟尘图像分割[J]. 计算机工程与科学, 2021, 43(5): 907–916. doi: 10.3969/j.issn.1007-130X.2021.05.018.

    ZHANG Junpeng, LIU Hui, and LI Qingrong. An industrial smoke image segmentation method based on FCN-LSTM[J]. Computer Engineering & Science, 2021, 43(5): 907–916. doi: 10.3969/j.issn.1007-130X.2021.05.018.
    [18]
    YUAN Feiniu, ZHANG Lin, XIA Xue, et al. Deep smoke segmentation[J]. Neurocomputing, 2019, 357: 248–260. doi: 10.1016/j.neucom.2019.05.011.
    [19]
    YUAN Feiniu, ZHANG Lin, XIA Xue, et al. A wave-shaped deep neural network for smoke density estimation[J]. IEEE Transactions on Image Processing, 2020, 29: 2301–2313. doi: 10.1109/TIP.2019.2946126.
    [20]
    TAO Huanjie and DUAN Qianyue. Learning discriminative feature representation for estimating smoke density of smoky vehicle rear[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 23136–23147. doi: 10.1109/TITS.2022.3198047.
    [21]
    YAN Siyuan, ZHANG Jing, and BARNES N. Transmission-guided Bayesian generative model for smoke segmentation[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2022: 3009–3017. doi: 10.1609/aaai.v36i3.20207.
    [22]
    HE Qiqi, YANG Qiuju, and XIE Minghao. HCTNet: A hybrid CNN-transformer network for breast ultrasound image segmentation[J]. Computers in Biology and Medicine, 2023, 155: 106629. doi: 10.1016/j.compbiomed.2023.106629.
    [23]
    CHEN Jieneng, LU Yongyi, YU Qihang, et al. TransUNet: Transformers make strong encoders for medical image segmentation[J]. arXiv preprint arXiv, 2021: 2102.04306. doi: 10.48550/arXiv.2102.04306.
    [24]
    GHOSH R and BOVOLO F. An FFT-based CNN-transformer encoder for semantic segmentation of radar sounder signal[C]. Proceedings of SPIE 12267, Image and Signal Processing for Remote Sensing XXVIII, Berlin, Germany, 2022: 122670R. doi: 10.1117/12.2636693.
    [25]
    LABBIHI I, EL MESLOUHI O, BENADDY M, et al. Combining frequency transformer and CNNs for medical image segmentation[J]. Multimedia Tools and Applications, 2024, 83(7): 21197–21212. doi: 10.1007/s11042-023-16279-9.
    [26]
    DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
    [27]
    ZHENG Sixiao, LU Jiachen, ZHAO Hengshuang, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 6877–6886. doi: 10.1109/CVPR46437.2021.00681.
    [28]
    SUN Yu, ZHI Xiyang, JIANG Shikai, et al. Image fusion for the novelty rotating synthetic aperture system based on vision transformer[J]. Information Fusion, 2024, 104: 102163. doi: 10.1016/j.inffus.2023.102163.
    [29]
    HUANG Zilong, WANG Xinggang, HUANG Lichao, et al. CCNet: Criss-cross attention for semantic segmentation[C]. The IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 603–612. doi: 10.1109/ICCV.2019.00069.
    [30]
    WANG Xiaolong, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7794–7803. doi: 10.1109/CVPR.2018.00813.
    [31]
    LONG J, SHELHAMER E, and DARRELL T. Fully convolutional networks for semantic segmentation[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 3431–3440. doi: 10.1109/CVPR.2015.7298965.
    [32]
    WEN Gang, ZHOU Fangrong, MA Yutang, et al. A dense multi-scale context and asymmetric pooling embedding network for smoke segmentation[J]. IET Computer Vision, 2024, 18(2): 236–246. doi: 10.1049/cvi2.12246.
    [33]
    ZHANG Yundong, LIU Huiye, and HU Qiang. TransFuse: Fusing transformers and CNNs for medical image segmentation[C]. The 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Strasbourg, France, 2021: 14–24. doi: 10.1007/978-3-030-87193-2_2.
