Weakly Supervised Image Semantic Segmentation Based on Multi-Seeded Information Aggregation and Positive-Negative Hybrid Learning

SANG Yu; LIU Tong; MA Tianjiao; LI Le; LI Siman; LIU Yunan

doi:10.11999/JEIT250112

SANG Yu, LIU Tong, MA Tianjiao, LI Le, LI Siman, LIU Yunan. Weakly Supervised Image Semantic Segmentation Based on Multi-Seeded Information Aggregation and Positive-Negative Hybrid Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250112

doi: 10.11999/JEIT250112 cstr: 32379.14.JEIT250112

SANG Yu¹,²,
LIU Tong¹,
MA Tianjiao¹,
LI Le¹,
LI Siman¹,
LIU Yunan³

1. College of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China
2. School of Mathematical Sciences, Bohai University, Jinzhou 121013, China
3. College of Artificial Intelligence, Dalian Maritime University, Dalian 116026, China

Funds: National Natural Science Foundation of China (62372077, 62302249), China Postdoctoral Science Foundation (2022M720624), Research Fund of Liaoning Provincial Education Department (LJKQZ2021152), National Key Research and Development Program of China (2019YFB2102400), University Talent Introduction Foundation (18-1021)

Received Date: 2025-02-26
Revision Received Date: 2025-08-20

Available Online: 2025-08-27

Abstract

Objective: The rapid development of deep learning techniques, particularly Convolutional Neural Networks (CNNs), has led to notable advances in semantic segmentation, enabling applications in medical imaging, autonomous driving, and remote sensing. However, conventional semantic segmentation typically relies on large numbers of pixel-level annotated images, which are time-consuming and expensive to produce. To address this limitation, Weakly Supervised Semantic Segmentation (WSSS) using image-level labels has emerged as a promising alternative. It aims to reduce annotation costs while maintaining or enhancing segmentation performance, thus supporting broader adoption of semantic segmentation techniques. Most existing methods focus on optimizing Class Activation Mapping (CAM) to generate high-quality seed regions, with further refinement through post-processing. However, the resulting seed labels often contain varying degrees of noise. To mitigate the effect of noisy labels on the segmentation network and to extract accurate information efficiently from multiple complementary seed sources, this study proposes a WSSS method based on multi-seed information aggregation and positive-negative hybrid learning. The approach improves segmentation performance by integrating complementary information from different seeds while reducing noise interference.

Methods: The method builds on the idea that combining multiple seeds can extract accurate information effectively. A generalized classification network generates diverse seed regions by varying the input image scale and by modifying the Dropout layer to randomly deactivate neurons with different probabilities. This process enables the extraction of complementary information from multiple sources. A semantic segmentation network is then trained with a hybrid positive-negative learning strategy based on the category labels assigned to each pixel across these seeds. Clean labels, identified with high confidence, guide the segmentation network through a positive learning process, in which the model learns that "the input image belongs to its assigned labels." Noisy labels are handled by two complementary strategies. Labels determined to be incorrect are trained under the principle that "the input image does not belong to its assigned labels," representing a form of positive learning for error suppression. In addition, an indirect negative learning strategy is applied, in which the network learns that "the input image does not belong to its complementary labels," further mitigating the influence of noisy labels. Because conventional cross-entropy loss tends to assign higher prediction confidence to noisy labels, a prediction constraint loss is introduced. This loss improves the model's predictive accuracy for reliable labels while reducing overfitting to incorrect labels. The overall framework suppresses noise interference and improves the segmentation network's performance.

Results and Discussions: The method generates diverse seeds by randomly varying the Dropout probability and the input image scale, and applies Conditional Random Field (CRF) optimization to further refine seed quality. To limit the noise introduced while maintaining the effectiveness of positive-negative hybrid learning, six complementary seeds are selected (Table 5). Integrating multi-source information from these seeds enhances segmentation performance, as demonstrated in Table 7. Pixel labels within these seeds are classified as clean or noisy based on a defined confidence threshold. The segmentation network is then trained using the positive-negative hybrid learning strategy, which suppresses the influence of noisy labels and improves segmentation accuracy. Experimental results confirm that positive-negative hybrid learning reduces label noise and enhances segmentation performance (Table 8). The method was validated on the PASCAL VOC 2012 and MS COCO 2014 datasets. With a CNN-based segmentation network, the mean Intersection over Union (mIoU) reached 72.5% and 40.8%, respectively. With a Transformer-based segmentation network, the mIoU improved to 76.8% and 46.7% (Table 1, Table 3). These results demonstrate that the method enhances segmentation accuracy while controlling the influence of noisy labels.

Conclusions: This study addresses the challenge of inaccurate seed labels in WSSS based on image-level annotations by proposing a multi-seed label differentiation strategy that leverages complementary information to improve seed quality. In addition, a positive-negative hybrid learning approach is introduced to enhance segmentation performance and to mitigate the influence of erroneous pixel labels on the segmentation model. The method achieves competitive results on the PASCAL VOC 2012 and MS COCO 2014 datasets: the mIoU reaches 72.5% and 40.8%, respectively, with a CNN-based segmentation network, and improves to 76.8% and 46.7% with a Transformer-based segmentation network. Although the method does not yet achieve ideal label precision, label differentiation combined with positive-negative hybrid learning suppresses misinformation propagation and outperforms approaches based on single-seed generation and positive learning alone.
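
As a rough illustration of the label differentiation and positive-negative hybrid learning described in the abstract, the following plain-Python sketch treats a single pixel. It is not the authors' implementation: the vote-fraction rule, the threshold `tau`, the choice of complementary labels (classes that no seed proposed), and all function names are assumptions made here for illustration. The prediction constraint loss and the separate handling of labels determined to be incorrect are omitted.

```python
import math
from collections import Counter

EPS = 1e-7  # numerical floor that keeps log() finite


def split_by_agreement(seed_labels, tau=0.8):
    """Classify one pixel's label from several seeds as clean or noisy.

    seed_labels: list of class ids, one per seed, for a single pixel.
    Returns (majority_label, is_clean). The vote-fraction rule and the
    value of tau are assumptions of this sketch.
    """
    label, votes = Counter(seed_labels).most_common(1)[0]
    return label, votes / len(seed_labels) >= tau


def positive_loss(probs, label):
    """Positive learning: 'the pixel belongs to `label`' (cross-entropy)."""
    return -math.log(max(probs[label], EPS))


def negative_loss(probs, label):
    """Negative learning: 'the pixel does not belong to `label`'."""
    return -math.log(max(1.0 - probs[label], EPS))


def hybrid_pixel_loss(probs, seed_labels, tau=0.8):
    """Loss for one pixel given softmax outputs and its labels across seeds."""
    label, is_clean = split_by_agreement(seed_labels, tau)
    if is_clean:
        return positive_loss(probs, label)
    # Noisy pixel: treat classes that no seed proposed as complementary
    # labels (hypothetical choice) and push probability away from them.
    proposed = set(seed_labels)
    complementary = [c for c in range(len(probs)) if c not in proposed]
    if not complementary:
        return 0.0
    return sum(negative_loss(probs, c) for c in complementary) / len(complementary)
```

With six seeds, a pixel labelled identically by all six is treated as clean and contributes an ordinary cross-entropy term. A pixel whose seeds split three to three falls below `tau` and receives only the negative term, so neither disputed class is reinforced.
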
Keywords:
- Weakly supervised semantic segmentation
- Class Activation Mapping (CAM)
- Multi-seed strategy
- Cross-Entropy
- Positive-Negative learning
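
The mIoU values quoted in the abstract average the per-class Intersection over Union. The sketch below shows that arithmetic on flat label lists in plain Python. The function name is illustrative, and skipping classes absent from both prediction and ground truth is one common convention assumed here. Benchmark evaluations normally accumulate the counts over the whole dataset before dividing.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length lists of class ids."""
    inter = [0] * num_classes  # pixels where prediction and truth agree on class c
    union = [0] * num_classes  # pixels where either prediction or truth is class c
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Skip classes that appear in neither prediction nor ground truth.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)
```

For `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.
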
