Occluded Object Segmentation Based on Bilayer Decoupling Strategy and Attention Mechanism

LÜ Yue; ZHOU Zhequan; LÜ Shujing

doi:10.11999/JEIT211288

Volume 45 Issue 1

Jan. 2023

Citation:

LÜ Yue, ZHOU Zhequan, LÜ Shujing. Occluded Object Segmentation Based on Bilayer Decoupling Strategy and Attention Mechanism[J]. Journal of Electronics & Information Technology, 2023, 45(1): 335-343. doi: 10.11999/JEIT211288

LÜ Yue^{1, 3},
ZHOU Zhequan^{2},
LÜ Shujing^{1, 3}

1. School of Communication and Electronic Engineering, East China Normal University, Shanghai 200062, China
2. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
3. Shanghai Key Laboratory of Multidimensional Information Processing, Shanghai 200062, China

Received Date: 2021-11-18
Revised Date: 2022-02-24
Accepted Date: 2022-03-01

Available Online: 2022-03-08

Publish Date: 2023-01-17

Abstract

Occluded object segmentation is a difficult problem in instance segmentation, yet it has great practical value in industrial applications such as segmenting stacked parcels in automated logistics sorting. This paper proposes an occluded object segmentation method based on a bilayer decoupling strategy and an attention mechanism to improve the segmentation of occluded parcels. First, image features are extracted by a backbone network with a Feature Pyramid Network (FPN). Second, a bilayer decoupling head predicts whether the mass center of each instance is occluded, and instances of different occlusion types are predicted by different branches. Third, an attention refinement module obtains the predicted masks of non-occluded instances and combines them into an attention map. Finally, this attention map is used to assist the prediction of occluded instances. A dataset for occluded parcel segmentation is also provided, and the method is evaluated on it. Experimental results show that the proposed network achieves 95.66% Average Precision (AP), 97.17% Recall, and 11.78% Miss Rate (MR^{-2}), indicating better segmentation performance than the other methods compared.
- Image segmentation,
- Attention mechanism,
- Occluded object,
- Bilayer decoupling strategy
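
The MR^{-2} figure reported in the abstract is presumably the log-average miss rate commonly used in crowded-scene detection benchmarks: the miss rate is sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space over [10^-2, 10^0], and the samples are averaged geometrically. The sketch below illustrates that computation under this assumption. The function name `log_average_miss_rate`, the step-wise sampling rule (one common convention), and the example curve are illustrative and are not taken from the paper.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at FPPI values spaced
    evenly in log space over [1e-2, 1e0].

    fppi      : false positives per image at each score threshold
    miss_rate : miss rate (1 - recall) at the same thresholds
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")

    # Sort the curve by FPPI so it can be scanned from left to right.
    curve = sorted(zip(fppi, miss_rate))

    # Reference FPPI values: 10^-2, 10^-1.75, ..., 10^0 for num_points=9.
    step = 2.0 / (num_points - 1)
    refs = [10.0 ** (-2.0 + step * i) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Assumed convention: take the miss rate of the last curve point
        # whose FPPI does not exceed the reference value; if the curve
        # starts to the right of the reference, count it as a full miss.
        mr_at_ref = 1.0
        for x, mr in curve:
            if x <= ref:
                mr_at_ref = mr
            else:
                break
        # Clamp to avoid log(0) when a sampled miss rate is exactly zero.
        sampled.append(max(mr_at_ref, 1e-10))

    return math.exp(sum(math.log(v) for v in sampled) / len(sampled))


if __name__ == "__main__":
    # Made-up curve, for illustration only (not data from the paper).
    example_fppi = [0.005, 0.05, 0.5]
    example_miss = [0.40, 0.20, 0.10]
    value = log_average_miss_rate(example_fppi, example_miss)
    print(f"MR^-2 = {value:.2%}")
```

Because the average is geometric, the metric is lower-is-better and is dominated by the low-FPPI end of the curve, which is why it is usually reported alongside AP and Recall rather than in place of them.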
