Volume 46, Issue 7, Jul. 2024
Citation: LI Xing, FAN Yangyu, GUO Zhe, DUAN Yu, LIU Shiya. Edge Domain Adaptation for Stereo Matching[J]. Journal of Electronics & Information Technology, 2024, 46(7): 2970-2980. doi: 10.11999/JEIT231113

Edge Domain Adaptation for Stereo Matching

doi: 10.11999/JEIT231113
Funds: The National Natural Science Foundation of China (62071384), the Key Research and Development Project of Shaanxi Province (2023-YBGY-239), and the Jiangxi Natural Science Foundation (20224BAB212009)
  • Received Date: 2023-10-12
  • Rev Recd Date: 2023-12-28
  • Available Online: 2024-01-02
  • Publish Date: 2024-07-29
  • Abstract: The style transfer method, owing to its excellent domain adaptation capability, is widely used to alleviate the domain gap in computer vision. Currently, stereo matching based on style transfer faces the following challenges: (1) the transformed left and right images must remain matched; (2) the content and spatial information of the transformed images should remain consistent with the original images. To address these challenges, an Edge Domain Adaptation Stereo matching (EDA-Stereo) method is proposed. First, an Edge-guided Generative Adversarial Network (Edge-GAN) is constructed. By fusing edge cues with synthetic features through a Spatial Feature Transform (SFT) layer, the Edge-GAN guides the generator to produce pseudo-images that retain the structural features of synthetic-domain images. Second, a warping loss is introduced to constrain the left image reconstructed from the transformed right image to approximate the original left image, preventing mismatches between the transformed left and right images. Finally, a stereo matching network based on a normal loss is proposed to capture more geometric detail by characterizing local depth variations, thereby improving matching accuracy. The method is trained on synthetic datasets and compared with various methods on real datasets; the results demonstrate its effectiveness in mitigating domain gaps. On the KITTI 2012 and KITTI 2015 datasets, the D1 error is 3.9% and 4.8%, respectively, a relative reduction of 37% and 26% compared with the state-of-the-art Domain-invariant Stereo Matching Networks (DSM-Net) method.
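
    To make the abstract's three mechanisms concrete, the sketch below illustrates, in PyTorch, an SFT-style modulation layer, the warping loss, and a normal loss built from local disparity gradients. This is a minimal sketch under stated assumptions, not the authors' implementation: the names (SFTLayer, warp_right_to_left, disparity_to_normals, ...), the tensor shapes, and the forward-difference normal construction are hypothetical choices consistent with the abstract; loss weights and network details are not given on this page.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SFTLayer(nn.Module):
            # Spatial Feature Transform (cf. ref. [14]): per-pixel affine
            # modulation of synthetic-image features, conditioned on edge features.
            def __init__(self, channels):
                super().__init__()
                self.gamma = nn.Conv2d(channels, channels, 3, padding=1)
                self.beta = nn.Conv2d(channels, channels, 3, padding=1)

            def forward(self, feat, edge_feat):
                # out = gamma(edge) * feat + beta(edge), applied element-wise
                return self.gamma(edge_feat) * feat + self.beta(edge_feat)

        def warp_right_to_left(right, disparity):
            # Reconstruct the left view by sampling the right image at (x - d, y),
            # using the left-view disparity map d of shape (B, 1, H, W).
            _, _, h, w = right.shape
            ys, xs = torch.meshgrid(
                torch.arange(h, device=right.device, dtype=right.dtype),
                torch.arange(w, device=right.device, dtype=right.dtype),
                indexing="ij")
            xs = xs.unsqueeze(0) - disparity.squeeze(1)          # (B, H, W)
            grid_x = 2.0 * xs / (w - 1) - 1.0                    # normalize to [-1, 1]
            grid_y = (2.0 * ys / (h - 1) - 1.0).unsqueeze(0).expand_as(grid_x)
            grid = torch.stack((grid_x, grid_y), dim=-1)         # (B, H, W, 2)
            return F.grid_sample(right, grid, align_corners=True)

        def warping_loss(left, right_transformed, disparity):
            # L1 gap between the original left image and the left view
            # reconstructed from the style-transformed right image.
            return F.l1_loss(warp_right_to_left(right_transformed, disparity), left)

        def disparity_to_normals(disp):
            # Pseudo surface normals n = normalize((-dd/dx, -dd/dy, 1)) from
            # forward differences of the disparity map (B, 1, H, W).
            dx = F.pad(disp[..., :, 1:] - disp[..., :, :-1], (0, 1))
            dy = F.pad(disp[..., 1:, :] - disp[..., :-1, :], (0, 0, 0, 1))
            n = torch.cat((-dx, -dy, torch.ones_like(disp)), dim=1)
            return F.normalize(n, dim=1)

        def normal_loss(disp_pred, disp_gt):
            # Penalize the angular gap between normals derived from the
            # predicted and ground-truth disparity maps (cosine distance).
            n_pred = disparity_to_normals(disp_pred)
            n_gt = disparity_to_normals(disp_gt)
            return (1.0 - (n_pred * n_gt).sum(dim=1)).mean()

    A full training objective would then combine such terms with the adversarial and disparity-regression losses, with the relative weights left here as unstated hyperparameters.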
References

[1] HIRSCHMÜLLER H. Stereo processing by semiglobal matching and mutual information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 328–341. doi: 10.1109/TPAMI.2007.1166.
[2] BIAN Jilong, MEN Chaoguang, and LI Xiang. A fast stereo matching method based on small baseline[J]. Journal of Electronics & Information Technology, 2012, 34(3): 517–522. doi: 10.3724/SP.J.1146.2011.00826.
[3] MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 4040–4048. doi: 10.1109/CVPR.2016.438.
[4] KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 66–75. doi: 10.1109/ICCV.2017.17.
[5] LI Zhaoshuo, LIU Xingtong, DRENKOW N, et al. Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 6177–6186. doi: 10.1109/ICCV48922.2021.00614.
[6] LIPSON L, TEED Z, and DENG Jia. RAFT-Stereo: Multilevel recurrent field transforms for stereo matching[C]. 2021 International Conference on 3D Vision, London, UK, 2021: 218–227. doi: 10.1109/3DV53792.2021.00032.
[7] LI Jiankun, WANG Peisen, XIONG Pengfei, et al. Practical stereo matching via cascaded recurrent network with adaptive correlation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 16242–16251. doi: 10.1109/CVPR52688.2022.01578.
[8] RAO Zhibo, XIONG Bangshu, HE Mingyi, et al. Masked representation learning for domain generalized stereo matching[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 5435–5444. doi: 10.1109/CVPR52729.2023.00526.
[9] ROS G, SELLART L, MATERZYNSKA J, et al. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 3234–3243. doi: 10.1109/CVPR.2016.352.
[10] CHANG Jiaren and CHEN Yongsheng. Pyramid stereo matching network[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5410–5418. doi: 10.1109/CVPR.2018.00567.
[11] LIU Shaolei, YIN Siqi, QU Linhao, et al. Reducing domain gap in frequency and spatial domain for cross-modality domain adaptation on medical image segmentation[C]. The 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 1719–1727. doi: 10.1609/aaai.v37i2.25260.
[12] LIU Yancheng, DONG Zhangwei, ZHU Pengli, et al. Unsupervised underwater image enhancement based on feature disentanglement[J]. Journal of Electronics & Information Technology, 2022, 44(10): 3389–3398. doi: 10.11999/JEIT211517.
[13] LI Xing, FAN Yangyu, LV Guoyun, et al. Area-based correlation and non-local attention network for stereo matching[J]. The Visual Computer, 2022, 38(11): 3881–3895. doi: 10.1007/s00371-021-02228-w.
[14] WANG Xintao, YU Ke, DONG Chao, et al. Recovering realistic texture in image super-resolution by deep spatial feature transform[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 606–615. doi: 10.1109/CVPR.2018.00070.
[15] ZHU Junyan, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2242–2251. doi: 10.1109/ICCV.2017.244.
[16] GEIGER A, LENZ P, and URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 3354–3361. doi: 10.1109/CVPR.2012.6248074.
[17] MENZE M and GEIGER A. Object scene flow for autonomous vehicles[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 3061–3070. doi: 10.1109/CVPR.2015.7298925.
[18] SCHARSTEIN D, HIRSCHMÜLLER H, KITAJIMA Y, et al. High-resolution stereo datasets with subpixel-accurate ground truth[C]. The 36th DAGM German Conference on Pattern Recognition, Münster, Germany, 2014: 31–42. doi: 10.1007/978-3-319-11752-2_3.
[19] SCHÖPS T, SCHÖNBERGER J L, GALLIANI S, et al. A multi-view stereo benchmark with high-resolution images and multi-camera videos[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2538–2547. doi: 10.1109/CVPR.2017.272.
[20] XIE Saining and TU Zhuowen. Holistically-nested edge detection[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1395–1403. doi: 10.1109/ICCV.2015.164.
[21] GUO Xiaoyang, YANG Kai, YANG Wukui, et al. Group-wise correlation stereo network[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3268–3277. doi: 10.1109/CVPR.2019.00339.
[22] RAO Zhibo, HE Mingyi, DAI Yuchao, et al. NLCA-Net: A non-local context attention network for stereo matching[J]. APSIPA Transactions on Signal and Information Processing, 2020, 9(1): e18. doi: 10.1017/ATSIP.2020.16.
[23] PASS G, ZABIH R, and MILLER J. Comparing images using color coherence vectors[C]. The Fourth ACM International Conference on Multimedia, New York, USA, 1997: 65–73. doi: 10.1145/244130.244148.
[24] CHENG Xuelian, ZHONG Yiran, HARANDI M, et al. Hierarchical neural architecture search for deep stereo matching[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 1858. doi: 10.5555/3495724.3497582.
[25] LIANG Zhengfa, FENG Yiliu, GUO Yulan, et al. Learning for disparity estimation through feature constancy[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 2811–2820. doi: 10.1109/CVPR.2018.00297.
[26] ZHANG Feihu, PRISACARIU V, YANG Ruigang, et al. GA-Net: Guided aggregation net for end-to-end stereo matching[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 185–194. doi: 10.1109/CVPR.2019.00027.
[27] XU Haofei and ZHANG Juyong. AANet: Adaptive aggregation network for efficient stereo matching[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 1956–1965. doi: 10.1109/CVPR42600.2020.00203.
[28] HOSNI A, RHEMANN C, BLEYER M, et al. Fast cost-volume filtering for visual correspondence and beyond[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(2): 504–511. doi: 10.1109/TPAMI.2012.156.
[29] BLEYER M, RHEMANN C, and ROTHER C. PatchMatch stereo – stereo matching with slanted support windows[C]. British Machine Vision Conference 2011, Dundee, UK, 2011: 1–11. doi: 10.5244/C.25.14.
[30] YIN Zhichao, DARRELL T, and YU F. Hierarchical discrete distribution decomposition for match density estimation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 6037–6046. doi: 10.1109/CVPR.2019.00620.
[31] SONG Xiao, ZHAO Xu, FANG Liangji, et al. EdgeStereo: An effective multi-task learning network for stereo matching and edge detection[J]. International Journal of Computer Vision, 2020, 128(4): 910–930. doi: 10.1007/s11263-019-01287-w.
[32] ZHANG Feihu, QI Xiaojuan, YANG Ruigang, et al. Domain-invariant stereo matching networks[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 420–439. doi: 10.1007/978-3-030-58536-5_25.
[33] CAI Changjiang, POGGI M, MATTOCCIA S, et al. Matching-space stereo networks for cross-domain generalization[C]. 2020 International Conference on 3D Vision, Fukuoka, Japan, 2020: 364–373. doi: 10.1109/3DV50981.2020.00046.
[34] LING Zhi, YANG Kai, LI Jinlong, et al. Domain-adaptive modules for stereo matching network[J]. Neurocomputing, 2021, 461: 217–227. doi: 10.1016/j.neucom.2021.06.004.
[35] LIU Rui, YANG Chengxi, SUN Wenxiu, et al. StereoGAN: Bridging synthetic-to-real domain gap by joint optimization of domain translation and stereo matching[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12754–12763. doi: 10.1109/CVPR42600.2020.01277.
[36] CHUAH Weiqin, TENNAKOON R, HOSEINNEZHAD R, et al. ITSA: An information-theoretic approach to automatic shortcut avoidance and domain generalization in stereo matching networks[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 13012–13022. doi: 10.1109/CVPR52688.2022.01268.
[37] ZHANG Jiawei, WANG Xiang, BAI Xiao, et al. Revisiting domain generalized stereo matching networks from a feature consistency perspective[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 12991–13001. doi: 10.1109/CVPR52688.2022.01266.