视觉光流计算技术及其应用

崔毅博; 汤仁东; 邢大军; 王隽; 李尚生

doi:10.11999/JEIT221418

视觉光流计算技术及其应用

doi: 10.11999/JEIT221418 cstr: 32379.14.JEIT221418

1.
海军航空大学烟台 264001
2.
军事医学研究院北京 100850
3.
北京师范大学认知神经科学与学习国家重点实验室北京 100875
4.
中国科学院自动化研究所北京 100190

详细信息

作者简介:
崔毅博：男，博士生，研究方向目标检测与跟踪、运动感知等

汤仁东：男，博士，副研究员，研究方向为非人灵长类视觉神经机制解析

邢大军：男，博士，研究员，研究方向为视觉认知与计算

王隽：女，博士，副研究员，研究方向为图像质量增强、视觉目标检测和强化学习

李尚生：男，硕士，教授，研究方向为雷达成像等

通讯作者:
李尚生　1164322995@qq.com

中图分类号: TN919.8; TP391.41
计量
- 文章访问数: 1318
- HTML全文浏览量: 1638
- PDF下载量: 243
- 被引次数: 0
出版历程
- 收稿日期: 2022-11-10
- 修回日期: 2023-04-11
- 网络出版日期: 2023-04-24
- 刊出日期: 2023-08-21

Visual Optical Flow Computing: Algorithms and Applications

1.
Naval Aviation University, Yantai 264001, China
2.
Academy of Military Medical Sciences, Beijing 100850, China
3.
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing 100875, China
4.
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

摘要

摘要: 视觉光流计算是计算机视觉从处理2维图像走向加工3维视频的重要技术手段，是描述视觉运动信息的主要方式。光流计算技术已经发展了较长的时间，随着相关技术尤其是深度学习技术在近些年的迅速发展，光流计算的性能得到了极大提升，但仍然有大量的局限性问题没有解决，准确、快速且稳健的光流计算目前仍然是一个有挑战性的研究领域和业内研究热点。光流计算作为一种低层视觉信息处理技术，其技术进展也将有助于相关中高层级视觉任务的实现。该文主要内容是介绍基于计算机视觉的光流计算及其技术发展路线，从经典算法和深度学习算法这两个主流技术路线出发，总结了技术发展过程中产生的重要理论、方法与模型，着重介绍了各类方法与模型的核心思想，说明了各类数据集及相关性能指标，同时简要介绍了光流计算技术的主要应用场景，并对今后的技术方向进行了展望。
- 计算机视觉 /
- 光流 /
- 深度学习 /
- 综述
Abstract: Visual optical flow calculation is an important technique for computer vision to move from processing 2D images to processing 3D videos, and is the main way of describing visual motion information. The optical flow calculation technique has been developed for a long time. With the rapid development of related technologies, especially deep learning technology in recent years, the performance of optical flow calculation has been greatly improved. However, there are still many limitations that have not been solved. Accurate, fast, and robust optical flow calculation is still a challenging research field and a hot topic in the industry. As a low-level visual information processing technology, the implementation of related high-level visual tasks will also be contributed by the technological advances of optical flow calculation. In this paper, the development path of optical flow calculation based on computer vision is mainly introduced. The important theories, methods, and models generated during the technological development process from the two mainstream technology paths of classical algorithms and deep learning algorithms are summarized, the core ideas of various methods and models are being introduced and the various datasets and performance indicators are explained, the main application scenarios of optical flow calculation technology are briefly introduced, and the future technical directions are also prospected.
- Computer vision /
- Optical flow /
- Deep learning /
- Overview

HTML全文

图 1 传统框架与PWC-Net框架图对比图(图片改绘自文献[10])

下载: 全尺寸图片幻灯片

图 2 RATF模型框架图(图片改绘自文献[13])

下载: 全尺寸图片幻灯片

图 3 GMFlow的框架图(图片改绘自文献[18])

下载: 全尺寸图片幻灯片

图 4 光流SLAM效果图(图片出自文献[43])

下载: 全尺寸图片幻灯片

表 1 光流估计监督模型汇总

年份	模型名称	主要问题方向	突出特点
2015	FlowNet^[6]	表明端到端的CNN可以做光流估计	Encoder-decoder结构
2016	PatchBatch^[15]	利用CNN提取高维特征做描述符计算匹配然后用Epic Flow(经典算法)稠密插值	使用STD结构用CNN提取高维描述符
2016	FlowNet2.0^[9]	相较FlowNet提高了光流精度	对CNN结构组合进行探索
2017	DCFlow^[7]	提高了光流精度	提出4D 代价体以及CTF结构范式
2018	PWC-Net^[10]	相较FlowNet2.0减少了模型参数提高了光流精度提高了光流实时性	融合了特征金字塔使用了扭曲(warp)层以及标准CTF框架
2018	LiteFlowNet^[11]	相较FlowNet2.0减少了网络参数	提出了对特征层使用扭曲(warp) 结合残差网络减少参数数量提出了级联流推断结构
2019	IRR-PWC^[12]	相较PWC-Net减少参数数量，提高了精度	提出了轻量化的迭代残差细化模块可集合遮挡预测模块提高预测精度
2020	RAFT^[13]	在CTF后处理阶段使用GRU结构进行处理，可以有效使用上下文信息进行更新保持模型参数情况下，有效提高了精度和处理速度	使用GRU处理4D 代价体
2021	GMA^[19]	通过互相关技术构建了Global Motion Aggregation(GMA)模块优化了遮挡处理	应用互相关技术提高了遮挡下模型的性能
2021	SCV^[20]	减少了4D 代价体的计算量提高了计算速度与精度	通过KNN建立稀疏的4D 代价体通过多尺度位移编码器得到稠密的光流
2022	AGFlow^[21]	对场景整体运动的理解遮挡问题	结合运动信息与上下文信息结合图神经网络自适应图推理模块进行光流估计
	GMFLowNet^[22]	优化了大位移问题优化了边缘	在RAFT基础上融合了基于attention机制的匹配模块
	GMFlow^[18]	基于attention技术优化了代价体与卷积固有的局部相关性限制优化了大位移	基于transformer架构采用全局匹配

下载: 导出CSV

表 2 光流估计自监督模型汇总

年份	模型名称	主要问题方向	突出特点
2016	MIND^[23]	自监督学习可以有效学习前后两帧之间的关联关系，并用于初始化光流计算	自监督端到端
2016	Unsupervised FlowNet^[24]	得到与FlowNet近似的精度自监督模型	借鉴传统能量优化中的光度损失和平滑损失训练自监督模型
2017	DFDFlow^[26]	提高了精度	利用全卷积的DenseNet自监督巡训练端到端
2017	UnFlow^[25]	提高了精度	交换前后两帧顺序对误差进行学习
2018	Unsupervised OAFlow^[31]	精度超越部分监督模型优化遮挡问题	设计的对遮挡的mask层屏蔽错误匹配
2019	DDFlow^[26]	优化遮挡问题	采用知识蒸馏，老师网络类似UnFlow 学生网络增加针对遮挡的loss
	SelFlow^[27]	提高抗遮挡能力	基于PWC-Net框架利用超像素分割随机遮挡增强数据
	UFlow^[28]	分析了之前自监督光流各个组件的性能以及优化组合提高了光流估计精度	提出的改进方法是代价体归一化、用于遮挡估计的梯度停止、在本地流分辨率下应用平滑性，以及用于自我监督的图像大小调整
2021	UPflow^[29]	提高了精度优化了边缘细节	用CNN改进了基于双线性插值的coarse to fine方法金字塔蒸馏损失
2021	SMURF^[30]	提高精度超过PWC-Net和FlowNet2 RAFT自监督实现	基于RAFT模型 full-image warping预测真外运动多帧自我监督，以改善遮挡区域
2022	SEBFlow^[32]	基于事件摄像机的自监督方法	采用新传感器

下载: 导出CSV

表 3 光流估计模型评估公开数据集

名称	年份	特点	评价指标
KITTI2012^[34]	2012		EPE,AEPE,fps,AE,AAE
KITTI2015^[35]	2015	真实驾驶场景测量数据集，其中 2012 测试图像集仅包含静止背景，2015 测试图像集则采用动态背景，显著增强了测试图像集的光流估计难度	异常光流占的百分比(Fl ) 背景区域异常光流占的百分比(bg) 前景区域异常光流占的百分比(fg) 异常光流占所有标签的百分比(all)
MPI-Sintel^[36]	2012	使用开源动画电影《Sintel》由开源软件动画制作软件blender渲染出的合成数据集，场景效果可控，光流值准确。数据集仅截取部分典型场景，分为Sintel Clean和Sintel Final两组，其中Clean 数据集包含大位移、弱纹理、非刚性，大形变等困难场景，以测试光流计算模型的准确性； Final数据集通过添加运动模糊、雾化效果以及图像噪声使其更加贴近于现实场景，用于测试光流估计的可靠性	整幅图像上的端点误差(EPE,AEPE) 相邻帧中可见区域的端点误差(EPE -matched) 相邻帧中不可见区域的端点误差(EPE -unmatched) 距离最近遮挡边缘在 0～10 像素的端点误差(d0-10 ) 速度在 0～10 像素的帧区域端点误差(s0-10) 速度在 10～40 像素的帧区域端点误差(s10-40)
FlyingChairs^[6]	2015	合成椅子与随机背景的组合的场景，利用仿射变换模拟运动场景，FlowNet基于此合成数据集训练	EPE,AEPE,fps,AE,AAE
Scene Flow Datasets^[38]	2016	包涵Driving、Monkaa、FlyingThings3D 3个数据集，其中FlyingThings3D组合不同物体与背景，模拟真实3维空间运动，以及动画Driving, Monkaa共同构成此数据集。此合成数据集除光流外还提供场景流	EPE,AEPE,fps,AE,AAE
HD1K^[39]	2016	城市自动驾驶真实数据集，涵盖了以前没有的、具有挑战性的情况，如光线不足或下雨，并带有像素级的不确定性	EPE,AEPE,fps,AE,AAE
AutoFlow^[40]	2021	基于学习的自动化数据集生成工具，由可学习的超参数控制数据集的生成，提供包括准确光流、数据增强、跨数据集泛化能力等，可以极大提高模型训练效果	在训练时使用本数据集，测试时利用其他数据集进行测试，评价指标同测试数据集
SHIFT^[41]	2022	目前最大合成数据集包含了雾天、雨天、一天的时间、车辆人群的密集和离散，且具有全面的传感器套件和注释，可以提供连续域数据	EPE,AEPE,fps,AE,AAE

下载: 导出CSV

表 4 部分模型在Sintel及KITTI数据集上的性能(截至2023年3月)

模型名称	MPI Sintel Clean(EPE-ALL)	MPI Sintel Clean排名	MPI Sintel Final(EPE-ALL)	MPI Sintel Final排名	KITTI 2012 (Avg-All)	KITTI 排名
GMFlow+	1.028	2	2.367	18	1.0	1
RATF	1.609	57	2.855	61	–	–
SMURF	3.152	143	4.183	121	1.4	20
PWC-Net	4.386	256	5.042	184	1.7	39

下载: 导出CSV

参考文献(47)

[1]	HORN B and SCHUNCK B G. Determining optical flow[C]. SPIE 0281, Techniques and Applications of Image Understanding, Washington, USA, 1981.
[2]	LUCAS B D and KANADE T. An iterative image registration technique with an application to stereo vision[C]. The 7th International Joint Conference on Artificial Intelligence, Vancouver, Canada, 1981: 674–679.
[3]	BOUGUET J Y. Pyramidal implementation of the lucas kanade feature tracker. Intel Corporation, Microprocessor Research Labs, 1999.
[4]	BERTSEKAS D P. Constrained Optimization and Lagrange Multiplier Methods[M]. Boston: Academic Press, 1982: 724.
[5]	SONG Xiaojing. A Kalman filter-integrated optical flow method for velocity sensing of mobile robots[J]. Journal of Intelligent & Robotic Systems, 2014, 77(1): 13–26.
[6]	DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: Learning optical flow with convolutional networks[C]. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015: 2758–2766.
[7]	XU Jia, RANFTL R, and KOLTUN V. Accurate optical flow via direct cost volume processing[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 5807–5815.
[8]	CHEN Qifeng and KOLTUN V. Full flow: Optical flow estimation by global optimization over regular grids[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4706–4714.
[9]	ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: Evolution of optical flow estimation with deep networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 1647–1655.
[10]	SUN Deqing, YANG Xiaodong, LIU Mingyu, et al. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8934–8943.
[11]	HUI T W, TANG Xiaoou, and LOY C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8981–8989.
[12]	HUR J and ROTH S. Iterative residual refinement for joint optical flow and occlusion estimation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 5747–5756.
[13]	TEED Z and DENG Jia. RAFT: Recurrent all-pairs field transforms for optical flow[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 402–419.
[14]	REVAUD J, WEINZAEPFEL P, HARCHAOUI Z, et al. EpicFlow: Edge-preserving interpolation of correspondences for optical flow[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1164–1172.
[15]	GADOT D and WOLF L. PatchBatch: A batch augmented loss for optical flow[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4236–4245.
[16]	THEWLIS J, ZHENG Shuai, TORR P H S, et al. Fully-trainable deep matching[C]. Proceedings of the British Machine Vision Conference 2016, York, UK, 2016.
[17]	WANNENWETSCH A S and ROTH S. Probabilistic pixel-adaptive refinement networks[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 11639–11648.
[18]	XU Haofei, ZHANG Jing, CAI Jianfei, et al. GMFlow: Learning optical flow via global matching[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 8111–8120.
[19]	JIANG Shihao, CAMPBELL D, LU Yao, et al. Learning to estimate hidden motions with global motion aggregation[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9752–9761.
[20]	JIANG Shihao, LU Yao, LI Hongdong, et al. Learning optical flow from a few matches[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 16587–16595.
[21]	LUO Ao, YANG Fan, LUO Kunming, et al. Learning optical flow with adaptive graph reasoning[C]. Thirty-Sixth AAAI Conference on Artificial Intelligence, Vancouver Canada, 2021.
[22]	ZHAO Shiyu, ZHAO Long, ZHANG Zhixing, et al. Global matching with overlapping attention for optical flow estimation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 17571–17580.
[23]	LONG Gucan, KNEIP L, ALVAREZ J M, et al. Learning image matching by simply watching video[C]. 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 434–450.
[24]	YU J J, HARLEY A W, and DERPANIS K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness[C]. European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 3–10.
[25]	MEISTER S, HUR J, and ROTH S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss[C]. Thirty-Sixth AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018.
[26]	LIU Pengpeng, KING I, LYU M R, et al. DDFlow: Learning optical flow with unlabeled data distillation[C]. Thirty-Sixth AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 8770–8777.
[27]	LIU Pengpeng, LYU M, KING I, et al. SelFlow: Self-supervised learning of optical flow[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 4566–4575.
[28]	JONSCHKOWSKI R, STONE A, BARRON J T, et al. What matters in unsupervised optical flow[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 557–572.
[29]	LUO Kunming, WANG Chuan, LIU Shuaicheng, et al. UPFlow: Upsampling pyramid for unsupervised optical flow learning[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 1045–1054.
[30]	STONE A, MAURER D, AYVACI A, et al. SMURF: Self-teaching multi-frame unsupervised RAFT with full-image warping[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 3886–3895.
[31]	WANG Yang, YANG Yi, YANG Zhenheng, et al. Occlusion aware unsupervised learning of optical flow[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4884–4893.
[32]	SHIBA S, AOKI Y, and GALLEGO G. Secrets of event-based optical flow[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022.
[33]	LAI Weisheng, HUANG Jiabin, and YANG M H. Semi-supervised learning for optical flow with generative adversarial networks[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2018: 892–896.
[34]	GEIGER A, LENZ P, and URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 3354–3361.
[35]	KENNEDY R and TAYLOR C J. Optical flow with geometric occlusion estimation and fusion of multiple frames[C]. The 10th International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Hong Kong, China, 2015: 364–377.
[36]	BUTLER D J, WULFF J, STANLEY G B, et al. A naturalistic open source movie for optical flow evaluation[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 611–625.
[37]	MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. The 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4040–4048.
[38]	SCHRÖDER G, SENST T, BOCHINSKI E, et al. Optical flow dataset and benchmark for visual crowd analysis[C]. 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 2018: 1–6.
[39]	KONDERMANN D, NAIR R, HONAUER K, et al. The HCI benchmark suite: Stereo and flow ground truth with uncertainties for urban autonomous driving[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, USA, 2016: 19–28.
[40]	SUN Deqing, VLASIC D, HERRMANN C, et al. AutoFlow: Learning a better training set for optical flow[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 10088–10097.
[41]	SUN Tao, SEGU M, POSTELS J, et al. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 21339–21350.
[42]	KALAL Z, MIKOLAJCZYK K, and MATAS J. Tracking-learning-detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(7): 1409–1422. doi: 10.1109/TPAMI.2011.239
[43]	YE Weicai, YU Xingyuan, LAN Xinyue, et al. DeFlowSLAM: Self-supervised scene motion decomposition for dynamic dense SLAM[EB/OL]. https://arxiv.org/abs/2207.08794, 2022.
[44]	SANES J R and MASLAND R H. The types of retinal ganglion cells: Current status and implications for neuronal classification[J]. Annual Review of Neuroscience, 2015, 38(1): 221–246. doi: 10.1146/annurev-neuro-071714-034120
[45]	GALLETTI C and FATTORI P. The dorsal visual stream revisited: Stable circuits or dynamic pathways?[J]. Cortex, 2018, 98: 203–217. doi: 10.1016/j.cortex.2017.01.009
[46]	WEI Wei. Neural mechanisms of motion processing in the mammalian retina[J]. Annual Review of Vision Science, 2018, 4: 165–192. doi: 10.1146/annurev-vision-091517-034048
[47]	ROSSI L F, HARRIS K D, and CARANDINI M. Spatial connectivity matches direction selectivity in visual cortex[J]. Nature, 2020, 588(7839): 648–652. doi: 10.1038/s41586-020-2894-4