Visual Optical Flow Computing: Algorithms and Applications

CUI Yibo, TANG Rendong, XING Dajun, WANG Juan, LI Shangsheng

Citation: CUI Yibo, TANG Rendong, XING Dajun, WANG Juan, LI Shangsheng. Visual Optical Flow Computing: Algorithms and Applications[J]. Journal of Electronics & Information Technology, 2023, 45(8): 2710-2721. doi: 10.11999/JEIT221418


doi: 10.11999/JEIT221418
Article information
    Author biographies:

    CUI Yibo: Male, Ph.D. candidate. His research interests include object detection and tracking, and motion perception

    TANG Rendong: Male, Ph.D., Associate Researcher. His research interest is the neural mechanisms of vision in non-human primates

    XING Dajun: Male, Ph.D., Researcher. His research interests include visual cognition and computation

    WANG Juan: Female, Ph.D., Associate Researcher. Her research interests include image quality enhancement, visual object detection, and reinforcement learning

    LI Shangsheng: Male, M.S., Professor. His research interests include radar imaging

    Corresponding author:

    LI Shangsheng, 1164322995@qq.com

  • CLC number: TN919.8; TP391.41


  • Abstract: Optical flow computation is a key technique in computer vision's progression from processing 2D images to processing 3D video, and it is the principal way of describing visual motion information. Optical flow estimation has developed over a long period, and its performance has improved dramatically in recent years with the rapid advance of related techniques, deep learning in particular; nevertheless, many limitations remain unresolved, and accurate, fast, and robust optical flow computation is still a challenging and actively studied problem. As a low-level visual information processing technique, progress in optical flow will in turn support mid- and high-level vision tasks. This paper surveys vision-based optical flow computation and its technical evolution along the two mainstream lines of classical algorithms and deep learning algorithms, summarizing the important theories, methods, and models produced along the way, with emphasis on the core ideas of each class of methods and models. The main datasets and their associated performance metrics are described, the principal application scenarios of optical flow are briefly introduced, and future technical directions are discussed.
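As a concrete illustration of the classical end of this spectrum, the sketch below estimates dense optical flow with Farnebäck's polynomial-expansion method. It is a minimal, hedged example: it assumes OpenCV (cv2) and NumPy are installed, and the two frames are synthetic stand-ins for consecutive video frames rather than data from any benchmark discussed here.

```python
# Minimal sketch: dense optical flow with OpenCV's Farneback method.
import cv2
import numpy as np

# Two synthetic frames: a textured 30x30 patch that shifts by (dx=5, dy=2).
rng = np.random.default_rng(0)
patch = rng.integers(0, 255, (30, 30), dtype=np.uint8)
prev_frame = np.zeros((120, 160), dtype=np.uint8)
next_frame = np.zeros((120, 160), dtype=np.uint8)
prev_frame[40:70, 40:70] = patch
next_frame[42:72, 45:75] = patch

# flow[y, x] = (u, v): per-pixel displacement from prev_frame to next_frame.
flow = cv2.calcOpticalFlowFarneback(
    prev_frame, next_frame, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

print(flow.shape)     # (120, 160, 2)
print(flow[55, 55])   # roughly (5, 2) inside the moving patch
```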
  • Figure 1  Comparison of the traditional framework and the PWC-Net framework (adapted from Ref. [10])

    Figure 2  Framework of the RAFT model (adapted from Ref. [13])

    Figure 3  Framework of GMFlow (adapted from Ref. [18])

    Figure 4  Optical-flow SLAM results (from Ref. [43])

    Table 1  Summary of supervised optical flow estimation models

    | Year | Model | Problem addressed | Distinguishing features |
    |---|---|---|---|
    | 2015 | FlowNet[6] | Showed that an end-to-end CNN can estimate optical flow | Encoder-decoder structure |
    | 2016 | PatchBatch[15] | Uses a CNN to extract high-dimensional feature descriptors for matching, then densifies with EpicFlow (a classical method) | STD structure; CNN-extracted high-dimensional descriptors |
    | 2016 | FlowNet2.0[9] | Improved flow accuracy over FlowNet | Explores stacked combinations of CNN structures |
    | 2017 | DCFlow[7] | Improved flow accuracy | Proposed the 4D cost volume and the coarse-to-fine (CTF) paradigm |
    | 2018 | PWC-Net[10] | Fewer parameters, higher accuracy, and better real-time performance than FlowNet2.0 | Integrates a feature pyramid, a warping layer, and the standard CTF framework |
    | 2018 | LiteFlowNet[11] | Fewer network parameters than FlowNet2.0 | Applies warping at the feature level; residual connections reduce the parameter count; cascaded flow inference structure |
    | 2019 | IRR-PWC[12] | Fewer parameters and higher accuracy than PWC-Net | Lightweight iterative residual refinement module; can incorporate an occlusion prediction module to improve accuracy |
    | 2020 | RAFT[13] | Replaces CTF post-processing with GRU-based iterative updates that exploit context; markedly better accuracy and speed at a comparable parameter count | GRU operating on the 4D cost volume |
    | 2021 | GMA[19] | Builds a Global Motion Aggregation (GMA) module from cross-correlation; better occlusion handling | Applies cross-correlation; improves performance under occlusion |
    | 2021 | SCV[20] | Cuts the computation of the 4D cost volume; faster and more accurate | Sparse 4D cost volume built with k-nearest neighbors; dense flow recovered by a multi-scale displacement encoder |
    | 2022 | AGFlow[21] | Holistic understanding of scene motion; occlusion handling | Couples motion with context information; adaptive graph reasoning module on a graph neural network |
    | 2022 | GMFlowNet[22] | Better handling of large displacements and motion boundaries | Attention-based matching module fused onto RAFT |
    | 2022 | GMFlow[18] | Uses attention to overcome the inherently local correlations of cost volumes and convolutions; better on large displacements | Transformer architecture; global matching |
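Several entries in Table 1 revolve around the 4D cost volume introduced by DCFlow [7] and placed at the core of RAFT [13]. The following minimal PyTorch sketch is an illustration rather than any paper's exact implementation: the all-pairs formulation and the division by the square root of the feature dimension follow RAFT's convention, and the feature maps are random stand-ins for real encoder outputs.

```python
# Minimal sketch: all-pairs 4D cost volume (DCFlow/RAFT style).
import torch

def all_pairs_cost_volume(f1, f2):
    """f1, f2: feature maps (B, C, H, W) from two consecutive frames.
    Returns a cost volume (B, H, W, H, W): entry [b, i, j, k, l] is the
    similarity of pixel (i, j) in frame 1 to pixel (k, l) in frame 2."""
    B, C, H, W = f1.shape
    corr = torch.einsum('bchw,bckl->bhwkl', f1, f2)
    return corr / C**0.5  # scale by sqrt(feature dim), as in RAFT

f1 = torch.randn(1, 64, 32, 48)  # random stand-ins for encoder features
f2 = torch.randn(1, 64, 32, 48)
print(all_pairs_cost_volume(f1, f2).shape)  # torch.Size([1, 32, 48, 32, 48])
```

The (H x W) squared size of this volume is exactly what SCV [20] attacks with sparsity and what GMFlow [18] reinterprets as a global matching problem.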

    Table 2  Summary of self-supervised optical flow estimation models

    | Year | Model | Problem addressed | Distinguishing features |
    |---|---|---|---|
    | 2016 | MIND[23] | Showed that self-supervised learning can capture the correspondence between consecutive frames and use it to initialize flow computation | Self-supervised; end-to-end |
    | 2016 | Unsupervised FlowNet[24] | Accuracy close to FlowNet | Borrows the photometric and smoothness losses of classical energy minimization to train a self-supervised model |
    | 2017 | DFDFlow[26] | Improved accuracy | Fully convolutional DenseNet trained with self-supervision; end-to-end |
    | 2017 | UnFlow[25] | Improved accuracy | Swaps the order of the two frames and learns from the bidirectional error |
    | 2018 | Unsupervised OAFlow[31] | Accuracy surpasses some supervised models; better occlusion handling | Occlusion mask layer designed to screen out erroneous matches |
    | 2019 | DDFlow[26] | Better occlusion handling | Knowledge distillation; the teacher network resembles UnFlow, and the student adds an occlusion-specific loss |
    | 2019 | SelFlow[27] | Improved robustness to occlusion | Built on the PWC-Net framework; data augmentation with random occlusions from superpixel segmentation |
    | 2020 | UFlow[28] | Analyzed the components of earlier self-supervised flow methods and their best combination; improved accuracy | Proposed cost-volume normalization, gradient stopping for occlusion estimation, smoothness applied at the native flow resolution, and image resizing for self-supervision |
    | 2021 | UPFlow[29] | Improved accuracy and edge detail | CNN-based improvement of bilinear-interpolation coarse-to-fine upsampling; pyramid distillation loss |
    | 2021 | SMURF[30] | Accuracy surpasses PWC-Net and FlowNet2; a self-supervised realization of RAFT | Based on RAFT; full-image warping to predict out-of-frame motion; multi-frame self-supervision to improve occluded regions |
    | 2022 | SEBFlow[32] | Self-supervised method for event cameras | Adopts a new sensor |
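Most models in Table 2 descend from the objective that Unsupervised FlowNet [24] borrowed from classical energy minimization: a photometric term comparing frame 1 against frame 2 warped by the estimated flow, plus a smoothness term on the flow field. The sketch below is a minimal PyTorch illustration of that idea, not the implementation of any particular paper; the backward-warping helper is the same operation that appears as the warping layer in supervised models such as PWC-Net.

```python
# Minimal sketch: photometric + smoothness loss for unsupervised flow.
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp frame 2 toward frame 1.
    img2: (B, C, H, W); flow: (B, 2, H, W) in pixels, channels (u, v)."""
    B, _, H, W = img2.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().to(img2)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                     # (B, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(img2, grid, align_corners=True)

def unsupervised_loss(img1, img2, flow, lam=0.1):
    photometric = (img1 - warp(img2, flow)).abs().mean()  # brightness constancy
    smoothness = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean() + \
                 (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    return photometric + lam * smoothness
```

Later entries refine exactly these two terms, for example by masking the photometric loss where occlusion makes brightness constancy invalid (OAFlow [31], DDFlow [26]).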

    Table 3  Public datasets for evaluating optical flow models

    | Name | Year | Characteristics | Evaluation metrics |
    |---|---|---|---|
    | KITTI2012[34] | 2012 | See the KITTI2015 entry | EPE, AEPE, fps, AE, AAE |
    | KITTI2015[35] | 2015 | Real driving-scene measurement datasets; the 2012 test images contain only static backgrounds, while the 2015 test images use dynamic backgrounds, markedly increasing the difficulty of flow estimation | Percentage of flow outliers (Fl); outlier percentage over background regions (bg); outlier percentage over foreground regions (fg); outlier percentage over all labeled pixels (all) |
    | MPI-Sintel[36] | 2012 | Synthetic dataset rendered with the open-source animation software Blender from the open-source film Sintel; scenes are fully controllable and the flow ground truth is exact. Only representative scenes are included, split into Sintel Clean and Sintel Final: Clean contains difficult cases such as large displacements, weak texture, non-rigid motion, and large deformations to test accuracy, while Final adds motion blur, fog, and image noise to better approximate real scenes and test reliability | Endpoint error over the whole image (EPE, AEPE); endpoint error over regions visible in adjacent frames (EPE-matched); endpoint error over regions invisible in adjacent frames (EPE-unmatched); endpoint error within 0-10 px of the nearest occlusion boundary (d0-10); endpoint error over regions with speed 0-10 px (s0-10); endpoint error over regions with speed 10-40 px (s10-40) |
    | FlyingChairs[6] | 2015 | Synthetic scenes compositing chairs onto random backgrounds, with affine transformations simulating motion; FlowNet was trained on this dataset | EPE, AEPE, fps, AE, AAE |
    | Scene Flow Datasets[37] | 2016 | Comprises the Driving, Monkaa, and FlyingThings3D datasets; FlyingThings3D composites assorted objects and backgrounds to simulate motion in real 3D space, and together with the animated Driving and Monkaa sets forms the collection. Besides optical flow, this synthetic dataset also provides scene flow | EPE, AEPE, fps, AE, AAE |
    | HD1K[39] | 2016 | Real urban autonomous-driving dataset covering previously unrepresented challenging conditions such as low light and rain, with pixel-level uncertainty annotations | EPE, AEPE, fps, AE, AAE |
    | AutoFlow[40] | 2021 | Learning-based automated dataset-generation tool whose output is controlled by learnable hyperparameters; provides accurate flow, data augmentation, and cross-dataset generalization, greatly improving training effectiveness | Used for training; testing is performed on other datasets with their respective metrics |
    | SHIFT[41] | 2022 | Currently the largest synthetic dataset; covers fog, rain, time of day, and dense versus sparse vehicles and crowds, with a comprehensive sensor suite and annotations, providing continuous-domain data | EPE, AEPE, fps, AE, AAE |
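The metric columns in Table 3 reduce to two families: endpoint error (EPE, averaged as AEPE) and the KITTI-style outlier percentages (Fl, bg, fg, all). As a hedged reference, the NumPy sketch below computes both; the 3 px / 5% thresholds are KITTI 2015's published outlier criterion.

```python
# Minimal sketch: endpoint error and KITTI-style outlier rate.
import numpy as np

def epe(flow_pred, flow_gt):
    """Per-pixel endpoint error and its average (AEPE).
    flow_pred, flow_gt: arrays of shape (H, W, 2)."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    return err, float(err.mean())

def fl_outlier_rate(flow_pred, flow_gt):
    """KITTI 2015 Fl: percentage of pixels whose endpoint error exceeds
    both 3 px and 5% of the ground-truth flow magnitude."""
    err, _ = epe(flow_pred, flow_gt)
    mag = np.linalg.norm(flow_gt, axis=-1)
    outliers = (err > 3.0) & (err > 0.05 * mag)
    return 100.0 * float(outliers.mean())
```

Restricting `fl_outlier_rate` to background, foreground, or all labeled pixels yields the bg, fg, and all columns reported for KITTI 2015.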

    Table 4  Performance of selected models on the Sintel and KITTI datasets (as of March 2023)

    | Model | MPI Sintel Clean (EPE-all) | Clean rank | MPI Sintel Final (EPE-all) | Final rank | KITTI 2012 (Avg-All) | KITTI rank |
    |---|---|---|---|---|---|---|
    | GMFlow+ | 1.028 | 2 | 2.367 | 18 | 1.0 | 1 |
    | RAFT | 1.609 | 57 | 2.855 | 61 | – | – |
    | SMURF | 3.152 | 143 | 4.183 | 121 | 1.4 | 20 |
    | PWC-Net | 4.386 | 256 | 5.042 | 184 | 1.7 | 39 |
  • [1] HORN B K P and SCHUNCK B G. Determining optical flow[C]. SPIE 0281, Techniques and Applications of Image Understanding, Washington, USA, 1981.
    [2] LUCAS B D and KANADE T. An iterative image registration technique with an application to stereo vision[C]. The 7th International Joint Conference on Artificial Intelligence, Vancouver, Canada, 1981: 674–679.
    [3] BOUGUET J Y. Pyramidal implementation of the Lucas-Kanade feature tracker[R]. Intel Corporation, Microprocessor Research Labs, 1999.
    [4] BERTSEKAS D P. Constrained Optimization and Lagrange Multiplier Methods[M]. Boston: Academic Press, 1982: 724.
    [5] SONG Xiaojing. A Kalman filter-integrated optical flow method for velocity sensing of mobile robots[J]. Journal of Intelligent & Robotic Systems, 2014, 77(1): 13–26.
    [6] DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: Learning optical flow with convolutional networks[C]. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015: 2758–2766.
    [7] XU Jia, RANFTL R, and KOLTUN V. Accurate optical flow via direct cost volume processing[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 5807–5815.
    [8] CHEN Qifeng and KOLTUN V. Full flow: Optical flow estimation by global optimization over regular grids[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4706–4714.
    [9] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: Evolution of optical flow estimation with deep networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 1647–1655.
    [10] SUN Deqing, YANG Xiaodong, LIU Mingyu, et al. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8934–8943.
    [11] HUI T W, TANG Xiaoou, and LOY C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8981–8989.
    [12] HUR J and ROTH S. Iterative residual refinement for joint optical flow and occlusion estimation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 5747–5756.
    [13] TEED Z and DENG Jia. RAFT: Recurrent all-pairs field transforms for optical flow[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 402–419.
    [14] REVAUD J, WEINZAEPFEL P, HARCHAOUI Z, et al. EpicFlow: Edge-preserving interpolation of correspondences for optical flow[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1164–1172.
    [15] GADOT D and WOLF L. PatchBatch: A batch augmented loss for optical flow[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4236–4245.
    [16] THEWLIS J, ZHENG Shuai, TORR P H S, et al. Fully-trainable deep matching[C]. Proceedings of the British Machine Vision Conference 2016, York, UK, 2016.
    [17] WANNENWETSCH A S and ROTH S. Probabilistic pixel-adaptive refinement networks[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 11639–11648.
    [18] XU Haofei, ZHANG Jing, CAI Jianfei, et al. GMFlow: Learning optical flow via global matching[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 8111–8120.
    [19] JIANG Shihao, CAMPBELL D, LU Yao, et al. Learning to estimate hidden motions with global motion aggregation[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9752–9761.
    [20] JIANG Shihao, LU Yao, LI Hongdong, et al. Learning optical flow from a few matches[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 16587–16595.
    [21] LUO Ao, YANG Fan, LUO Kunming, et al. Learning optical flow with adaptive graph reasoning[C]. Thirty-Sixth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2022.
    [22] ZHAO Shiyu, ZHAO Long, ZHANG Zhixing, et al. Global matching with overlapping attention for optical flow estimation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 17571–17580.
    [23] LONG Gucan, KNEIP L, ALVAREZ J M, et al. Learning image matching by simply watching video[C]. 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 434–450.
    [24] YU J J, HARLEY A W, and DERPANIS K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness[C]. European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 3–10.
    [25] MEISTER S, HUR J, and ROTH S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss[C]. Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018.
    [26] LIU Pengpeng, KING I, LYU M R, et al. DDFlow: Learning optical flow with unlabeled data distillation[C]. Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 8770–8777.
    [27] LIU Pengpeng, LYU M, KING I, et al. SelFlow: Self-supervised learning of optical flow[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 4566–4575.
    [28] JONSCHKOWSKI R, STONE A, BARRON J T, et al. What matters in unsupervised optical flow[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 557–572.
    [29] LUO Kunming, WANG Chuan, LIU Shuaicheng, et al. UPFlow: Upsampling pyramid for unsupervised optical flow learning[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 1045–1054.
    [30] STONE A, MAURER D, AYVACI A, et al. SMURF: Self-teaching multi-frame unsupervised RAFT with full-image warping[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 3886–3895.
    [31] WANG Yang, YANG Yi, YANG Zhenheng, et al. Occlusion aware unsupervised learning of optical flow[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4884–4893.
    [32] SHIBA S, AOKI Y, and GALLEGO G. Secrets of event-based optical flow[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022.
    [33] LAI Weisheng, HUANG Jiabin, and YANG M H. Semi-supervised learning for optical flow with generative adversarial networks[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 892–896.
    [34] GEIGER A, LENZ P, and URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 3354–3361.
    [35] KENNEDY R and TAYLOR C J. Optical flow with geometric occlusion estimation and fusion of multiple frames[C]. The 10th International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Hong Kong, China, 2015: 364–377.
    [36] BUTLER D J, WULFF J, STANLEY G B, et al. A naturalistic open source movie for optical flow evaluation[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 611–625.
    [37] MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. The 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4040–4048.
    [38] SCHRÖDER G, SENST T, BOCHINSKI E, et al. Optical flow dataset and benchmark for visual crowd analysis[C]. 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 2018: 1–6.
    [39] KONDERMANN D, NAIR R, HONAUER K, et al. The HCI benchmark suite: Stereo and flow ground truth with uncertainties for urban autonomous driving[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, USA, 2016: 19–28.
    [40] SUN Deqing, VLASIC D, HERRMANN C, et al. AutoFlow: Learning a better training set for optical flow[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 10088–10097.
    [41] SUN Tao, SEGU M, POSTELS J, et al. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 21339–21350.
    [42] KALAL Z, MIKOLAJCZYK K, and MATAS J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409–1422. doi: 10.1109/TPAMI.2011.239
    [43] YE Weicai, YU Xingyuan, LAN Xinyue, et al. DeFlowSLAM: Self-supervised scene motion decomposition for dynamic dense SLAM[EB/OL]. https://arxiv.org/abs/2207.08794, 2022.
    [44] SANES J R and MASLAND R H. The types of retinal ganglion cells: Current status and implications for neuronal classification[J]. Annual Review of Neuroscience, 2015, 38(1): 221–246. doi: 10.1146/annurev-neuro-071714-034120
    [45] GALLETTI C and FATTORI P. The dorsal visual stream revisited: Stable circuits or dynamic pathways?[J]. Cortex, 2018, 98: 203–217. doi: 10.1016/j.cortex.2017.01.009
    [46] WEI Wei. Neural mechanisms of motion processing in the mammalian retina[J]. Annual Review of Vision Science, 2018, 4: 165–192. doi: 10.1146/annurev-vision-091517-034048
    [47] ROSSI L F, HARRIS K D, and CARANDINI M. Spatial connectivity matches direction selectivity in visual cortex[J]. Nature, 2020, 588(7839): 648–652. doi: 10.1038/s41586-020-2894-4
Publication history
  • Received: 2022-11-10
  • Revised: 2023-04-11
  • Available online: 2023-04-24
  • Issue date: 2023-08-21
