3D Human Motion Prediction Based on Bi-directional Gated Recurrent Unit

Haifeng SANG, Zizhen CHEN

Citation: Haifeng SANG, Zizhen CHEN. 3D Human Motion Prediction Based on Bi-directional Gated Recurrent Unit[J]. Journal of Electronics & Information Technology, 2019, 41(9): 2256-2263. doi: 10.11999/JEIT180978


doi: 10.11999/JEIT180978
Details
    About the authors:

    Haifeng SANG: Male, born in 1978, Ph.D., professor. His research interests include visual inspection technology, image processing, and artificial intelligence

    Zizhen CHEN: Female, born in 1994, master's student. Her research interests include computer vision, image processing, and artificial intelligence

    Corresponding author:

    Zizhen CHEN, chenziz@126.com

  • CLC number: TP181


Funds: The National Natural Science Foundation of China (61773105), The Natural Science Foundation of Liaoning Province (20170540675), The Research Project of Liaoning Provincial Department of Education (LQGD2017023)
  • Abstract: In the field of machine vision, predicting human motion is essential for timely human-robot interaction and person tracking. To improve the performance of such tasks, this paper proposes an encoder-decoder model based on bidirectional Gated Recurrent Units (GRU), called EBiGRU-D, which learns 3D human motion and predicts it over a period of time. EBiGRU-D is a deep Recurrent Neural Network (RNN) in which the encoder is a bidirectional GRU (BiGRU) unit and the decoder is a unidirectional GRU unit. The BiGRU reads the raw data in the forward and backward directions simultaneously and encodes it into a single state vector, which is then passed to the decoder for decoding. Because the BiGRU associates the current output with the states of both the preceding and following time steps, the output fully exploits the features of both, making the prediction more accurate. Experiments on the human3.6m dataset show that EBiGRU-D not only greatly reduces the error of 3D human motion prediction but also substantially extends the time span over which predictions remain accurate.
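To make the encoder-decoder idea in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' code: a bidirectional GRU encoder reads the observed joint-angle sequence in both directions, its two final states are merged into one state vector, and a unidirectional GRU decoder rolls out future frames one at a time. The class name EBiGRUDSketch, the layer sizes, the linear merge layer, and the residual readout step are illustrative assumptions only.

import torch
import torch.nn as nn

class EBiGRUDSketch(nn.Module):
    def __init__(self, pose_dim=54, hidden_dim=1024):
        super().__init__()
        # Encoder: a bidirectional GRU reads the observed pose sequence in both directions.
        self.encoder = nn.GRU(pose_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Merge the forward and backward final states into one decoder state vector.
        self.merge = nn.Linear(2 * hidden_dim, hidden_dim)
        # Decoder: a unidirectional GRU cell rolled out one future frame at a time.
        self.decoder = nn.GRUCell(pose_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, pose_dim)

    def forward(self, observed, future_len):
        # observed: (batch, seq_len, pose_dim) joint-angle frames
        _, h_n = self.encoder(observed)              # h_n: (2, batch, hidden_dim)
        state = torch.tanh(self.merge(torch.cat([h_n[0], h_n[1]], dim=-1)))
        frame = observed[:, -1, :]                   # seed the decoder with the last observed frame
        predictions = []
        for _ in range(future_len):
            state = self.decoder(frame, state)
            frame = frame + self.readout(state)      # residual step to the next predicted pose
            predictions.append(frame)
        return torch.stack(predictions, dim=1)       # (batch, future_len, pose_dim)

A call such as model(observed, future_len=25) would then return 25 predicted frames (roughly 1 s at a 25 fps frame rate). Whether the paper seeds the decoder with the last observed frame and whether it uses a residual readout are assumptions of this sketch.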
  • Fig. 1  EBiGRU-D network structure

    Fig. 2  Internal structure of the GRU (see the standard update equations sketched after this figure list)

    Fig. 3  Partial structure of the BiGRU

    Fig. 4  Comparison of prediction performance for the walking action within 1 s

    Fig. 5  Comparison of prediction performance for the discussion action within 1 s

    Fig. 6  Comparison of prediction performance for the walking action within 2 s

    Fig. 7  Comparison of the EBiGRU-D and Res-GRU networks on complex actions within 2 s

    Fig. 8  Comparison of training time
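
As a plain-notation reference for the structures shown in Fig. 2 and Fig. 3, the standard GRU update can be written as follows (bias terms omitted; this follows the common formulation of Cho et al. [16] and is not taken from the paper itself):

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1})\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}

A BiGRU layer runs this update forward and backward over the sequence and concatenates the two hidden states, $h_t = [\overrightarrow{h}_t;\, \overleftarrow{h}_t]$.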

    Table 1  Comparison of the prediction errors of each model within 1 s on the human3.6m dataset (prediction horizon in ms)

    Prediction horizon (ms)    80     160    320    400    560    640    720    1000
    Walking
      ERD[10]                  0.77   0.90   1.12   1.25   1.44   1.45   1.46   1.49
      LSTM-3LR[10]             0.73   0.81   1.05   1.18   1.34   1.36   1.37   1.36
      Res-GRU[13]              0.39   0.68   0.99   1.15   1.35   1.37   1.37   1.32
      MHU[14]                  0.32   0.53   0.69   0.77   0.90   0.94   0.97   1.06
      EBiGRU-D                 0.31   0.31   0.33   0.35   0.35   0.36   0.36   0.37
    Greeting
      ERD[10]                  0.85   1.09   1.45   1.64   1.93   1.89   1.92   1.98
      LSTM-3LR[10]             0.80   0.99   1.37   1.54   1.81   1.76   1.79   1.85
      Res-GRU[13]              0.52   0.86   1.30   1.47   1.78   1.75   1.82   1.96
      MHU[14]                  0.54   0.87   1.27   1.45   1.75   1.71   1.74   1.87
      EBiGRU-D                 0.48   0.44   0.49   0.49   0.52   0.51   0.52   0.49
    Walkingdog
      ERD[10]                  0.91   1.07   1.39   1.53   1.81   1.85   1.90   2.03
      LSTM-3LR[10]             0.80   0.99   1.37   1.54   1.81   1.76   1.79   2.00
      Res-GRU[13]              0.56   0.95   1.33   1.48   1.78   1.81   1.88   1.96
      MHU[14]                  0.56   0.88   1.21   1.37   1.67   1.72   1.81   1.90
      EBiGRU-D                 0.51   0.64   0.61   0.62   0.62   0.59   0.61   0.60
    Discussion
      ERD[10]                  0.76   0.96   1.17   1.24   1.57   1.70   1.84   2.04
      LSTM-3LR[10]             0.71   0.84   1.02   1.11   1.49   1.62   1.76   1.99
      Res-GRU[13]              0.31   0.69   1.03   1.12   1.52   1.61   1.70   1.87
      MHU[14]                  0.31   0.67   0.93   1.00   1.37   1.56   1.66   1.88
      EBiGRU-D                 0.33   0.44   0.50   0.45   0.48   1.51   0.50   0.49

    Table 2  Comparison of the prediction errors of the EBiGRU-D and Res-GRU networks within 2 s on the human3.6m dataset (prediction horizon in ms)

    Prediction horizon (ms)    80     320    560    720    1000   1080   1320   1560   1720   2000
    Walking
      Res-GRU[13]              0.42   0.89   1.02   1.16   1.37   1.39   1.46   1.59   1.65   1.89
      EBiGRU-D                 0.36   0.35   0.36   0.39   0.41   0.41   0.44   0.47   0.48   0.48
    Greeting
      Res-GRU[13]              0.65   0.89   1.21   1.35   1.56   1.77   1.85   2.02   2.16   2.22
      EBiGRU-D                 0.45   0.46   0.50   0.51   0.55   0.54   0.56   0.56   0.55   0.56
    Walkingdog
      Res-GRU[13]              0.66   1.20   1.73   1.95   2.20   2.27   2.34   2.41   2.51   2.52
      EBiGRU-D                 0.49   0.58   0.60   0.60   0.61   0.60   0.59   0.60   0.61   0.61
    Discussion
      Res-GRU[13]              0.89   1.23   1.56   1.69   1.85   2.01   2.12   2.32   2.49   2.56
      EBiGRU-D                 0.42   0.43   0.43   0.45   0.49   0.50   0.55   0.55   0.54   0.56
  • References

    [1] FOKA A F and TRAHANIAS P E. Probabilistic autonomous robot navigation in dynamic environments with human motion prediction[J]. International Journal of Social Robotics, 2010, 2(1): 79–94. doi: 10.1007/s12369-009-0037-z
    [2] MAINPRICE J and BERENSON D. Human-robot collaborative manipulation planning using early prediction of human motion[C]. 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 2013: 299–306.
    [3] BÜTEPAGE J, BLACK M J, KRAGIC D, et al. Deep representation learning for human motion prediction and classification[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1591–1599.
    [4] TEKIN B, MÁRQUEZ-NEILA P, SALZMANN M, et al. Learning to fuse 2D and 3D image cues for monocular body pose estimation[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3961–3970.
    [5] YASIN H, IQBAL U, KRÜGER B, et al. A dual-source approach for 3D pose estimation from a single image[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 4948–4956.
    [6] XIAO Jun, ZHUANG Yueting, and WU Fei. Feature visualization and interactive segmentation of 3D human motion[J]. Journal of Software, 2008, 19(8): 1995–2003. (in Chinese)
    [7] PAN Hong, XIAO Jun, WU Fei, et al. 3D human motion retrieval based on key-frames[J]. Journal of Computer-Aided Design & Computer Graphics, 2009, 21(2): 214–222. (in Chinese)
    [8] LI Rui, LIU Zhenyu, and TAN Jianrong. Human motion segmentation using collaborative representations of 3D skeletal sequences[J]. IET Computer Vision, 2018, 12(4): 434–442. doi: 10.1049/iet-cvi.2016.0385
    [9] TAYLOR G W, HINTON G E, and ROWEIS S. Modeling human motion using binary latent variables[C]. The 19th International Conference on Neural Information Processing Systems, Hong Kong, China, 2006: 1345–1352.
    [10] FRAGKIADAKI K, LEVINE S, FELSEN P, et al. Recurrent network models for human dynamics[C]. The IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4346–4354.
    [11] HOLDEN D, SAITO J, and KOMURA T. A deep learning framework for character motion synthesis and editing[J]. ACM Transactions on Graphics, 2016, 35(4): 1–11.
    [12] JAIN A, ZAMIR A R, SAVARESE S, et al. Structural-RNN: Deep learning on spatio-temporal graphs[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 5308–5317.
    [13] MARTINEZ J, BLACK M J, and ROMERO J. On human motion prediction using recurrent neural networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4674–4683.
    [14] TANG Yongyi, MA Lin, LIU Wei, et al. Long-term human motion prediction by modeling motion context and enhancing motion dynamic[J/OL]. arXiv: 1805.02513. http://arxiv.org/abs/1805.02513, 2018.
    [15] ZHANG Yachao, LIU Kaipei, QIN Liang, et al. Deterministic and probabilistic interval prediction for short-term wind power generation based on variational mode decomposition and machine learning methods[J]. Energy Conversion and Management, 2016, 112: 208–219. doi: 10.1016/j.enconman.2016.01.023
    [16] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J/OL]. arXiv: 1406.1078, 2014.
Figures (8) / Tables (2)
Publication history
  • Received: 2018-10-19
  • Revised: 2019-03-08
  • Published online: 2019-04-09
  • Issue published: 2019-09-10
