Volume 44 Issue 8
Aug. 2022
DENG Huiping, SHENG Zhichao, XIANG Sen, WU Jing. Depth Estimation Based on Semantic Guidance for Light Field Image[J]. Journal of Electronics & Information Technology, 2022, 44(8): 2940-2948. doi: 10.11999/JEIT210545

Depth Estimation Based on Semantic Guidance for Light Field Image

doi: 10.11999/JEIT210545
  • Received Date: 2021-06-08
  • Accepted Date: 2022-03-15
  • Revision Received Date: 2022-03-10
  • Available Online: 2022-03-21
  • Publish Date: 2022-08-17
  • Light Field Depth Estimation (LFDE) is critical to applications such as 3D reconstruction, autonomous driving, and object tracking. However, existing deep learning-based methods lose detail at edges, in weakly textured regions, and in other complex areas, because the learning network ignores the geometric characteristics of light field images. This paper proposes a semantic-guided LFDE network that exploits the contextual information of light field images to resolve the ill-posed problem in complex regions. A semantic perception module with an encoder-decoder structure is designed to reconstruct spatial information and better recover object boundaries. A spatial pyramid pooling structure uses atrous convolution to enlarge the receptive field and capture multi-scale contextual information. An adaptive local cross-channel interaction feature attention module, which avoids dimensionality reduction, then eliminates redundant information and fuses the channels effectively. Finally, a stacked hourglass network connects multiple hourglass modules in series, using their encoder-decoder structure to obtain richer contextual information. Experimental results on the new HCI 4D light field dataset demonstrate that the proposed method achieves higher accuracy and stronger generalization than the compared depth estimation methods while preserving better edge details.
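To make the pipeline described above concrete, the sketch below shows PyTorch-style implementations of two of the named components: an atrous spatial pyramid pooling block for multi-scale context, and an ECA-style channel attention block that performs adaptive local cross-channel interaction without dimensionality reduction. This is a minimal reconstruction from the abstract alone, not the authors' implementation; the class names, dilation rates, and the kernel-size rule (gamma, b) are illustrative assumptions.

    import math
    import torch
    import torch.nn as nn

    class AtrousSpatialPyramid(nn.Module):
        # Sketch: parallel 3x3 atrous convolutions at assumed dilation rates
        # enlarge the receptive field without reducing resolution; branch
        # outputs are concatenated and fused by a 1x1 convolution.
        def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
                for r in rates
            )
            self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

    class LocalChannelAttention(nn.Module):
        # Sketch of ECA-style attention: a 1D convolution over the globally
        # pooled channel descriptor models local cross-channel interaction.
        # There is no squeeze-and-excitation bottleneck, i.e., no
        # dimensionality reduction of the channel descriptor.
        def __init__(self, channels: int, gamma: int = 2, b: int = 1):
            super().__init__()
            # Kernel size grows with log2(C) and is forced to be odd.
            t = int(abs((math.log2(channels) + b) / gamma))
            k = t if t % 2 == 1 else t + 1
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            y = self.pool(x)                                # (N, C, 1, 1)
            y = self.conv(y.squeeze(-1).transpose(-1, -2))  # conv over channels
            y = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
            return x * y.expand_as(x)                       # channel re-weighting

    # Shape check on a dummy feature map.
    feat = torch.randn(2, 64, 32, 32)
    out = LocalChannelAttention(64)(AtrousSpatialPyramid(64, 64)(feat))
    assert out.shape == feat.shape

The semantic encoder-decoder and the stacked hourglass refinement stages are omitted here for brevity.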
  • [1]
    MENG N, LI K, LIU J Z, et al. Light field view synthesis via aperture disparity and warping confidence map[J]. IEEE Transactions on Image Processing, 2021, 30: 3908–3921. doi: 10.1109/TIP.2021.3066293
    [2]
    ZHANG M, JI W, PIAO Y R, et al. LFNet: Light field fusion network for salient object detection[J]. IEEE Transactions on Image Processing, 2020, 29: 6276–6287. doi: 10.1109/TIP.2020.2990341
    [3]
    LI X, YANG Y B, ZHAO Q J, et al. Spatial pyramid based graph reasoning for semantic segmentation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8947–8956.
    [4]
    武迎春, 王玉梅, 王安红, 等. 基于边缘增强引导滤波的光场全聚焦图像融合[J]. 电子与信息学报, 2020, 42(9): 2293–2301. doi: 10.11999/JEIT190723

    WU Yingchun ,WANG Yumei, WANG Anhong. Light field all-in-focus image fusion based on edge enhanced guided filtering[J]. Journal of Electronics &Information Technology, 2020, 42(9): 2293–2301. doi: 10.11999/JEIT190723
    [5]
    JEON H G, PARK J, CHOE G, et al. Accurate depth map estimation from a lenslet light field camera[C]. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA, 2015: 1547–1555.
    [6]
    CHEN C, LIN H T, YU Z, et al. Light field stereo matching using bilateral statistics of surface cameras[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1518–1525.
    [7]
    WANNER S and GOLDLUECKE B. Globally consistent depth labeling of 4D light field[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 41–48.
    [8]
    ZHANG S, SHENG H, LI C, et al. Robust depth estimation for light field via spinning parallelogram operator[J]. Computer Vision and Image Understanding, 2016, 145: 148–159. doi: 10.1016/j.cviu.2015.12.007
    [9]
    TAO M W, SRINIVASAN P P, MALIK J, et al. Depth from shading, defocus, and correspondence using light-field angular coherence[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1940–1948.
    [10]
    WANG T C, EFROS A A, and RAMAMOORTHI R. Occlusion-aware depth estimation using light-field cameras[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 3487–3495.
    [11]
    WILLIEM W and PARK I K. Robust light field depth estimation for noisy scene with occlusion[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 4396–4404.
    [12]
    HEBER S, YU W, and POCK T. Neural EPI-volume networks for shape from light field[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2271–2279.
    [13]
    LUO Y X, ZHOU W H, FANG J P, et al. EPI-patch based convolutional neural network for depth estimation on 4D light field[C]. 24th International Conference on Neural Information Processing, Guangzhou, China, 2017: 642–652.
    [14]
    SHIN C, JEON H G, YOON Y, et al. EPINET: A fully-convolutional neural network using epipolar geometry for depth from light field images[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Lake City, USA, 2018: 4748–4757.
    [15]
    TSAI Y J, LIU Y L, OUHYOUNG M, et al. Attention-Based view selection networks for light-field disparity estimation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12095–12103. doi: 10.1609/AAAI.v34i07.6888
    [16]
    ZHOU W H, ZHOU E C, YAN Y X, et al. Learning depth cues from focal stack for light field depth estimation[C]. 2019 IEEE International Conference on Image Processing, Taipei, China, 2019: 1074–1078.
    [17]
    SHI J L, JIANG X R, and GUILLEMOT C. A framework for learning depth from a flexible subset of dense and sparse light field views[J]. IEEE Transactions on Image Processing, 2019, 28(12): 5867–5880. doi: 10.1109/TIP.2019.2923323
    [18]
    GUO C L, JIN J, HOU J H, et al. Accurate light field depth estimation via an occlusion-aware network[C]. 2020 IEEE International Conference on Multimedia and Expo, London, UK, 2020: 1–6.
    [19]
    HU J, SHEN L, and SUN G. Squeeze-and-excitation networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7132–7141.
    [20]
    YE J W, WANG X C, JI Y X, et al. Amalgamating filtered knowledge: Learning task-customized student from multi-task teachers[C]. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, 2019: 4128–4134.
    [21]
    HONAUER K, JOHANNSEN O, KONDERMANN D, et al. A dataset and evaluation methodology for depth estimation on 4D light fields[C]. 13th Asian Conference on Computer Vision, Taipei, China, 2016: 19–34.