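Local Destination Pooling Network for Pedestrian Trajectory Prediction of Condition Endpoint

MAO Lin (毛琳); XIE Yunjiao (解云娇); YANG Dawei (杨大伟); ZHANG Rubo (张汝波)

doi: 10.11999/JEIT210716

College of Mechanical and Electronic Engineering, Dalian Minzu University, Dalian 116600, China

Funds: The National Natural Science Foundation of China (61673084), The Natural Science Foundation of Liaoning Province (20180550866)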

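Author biographies:
MAO Lin: female, associate professor; research interests: target tracking and multi-sensor information fusion

XIE Yunjiao: female, master's student; research interests: target tracking and trajectory prediction

YANG Dawei: male, associate professor; research interests: computer vision processing technology

ZHANG Rubo: male, professor; research interests: intelligent robotics and intelligent information processing

Corresponding author:
XIE Yunjiao, xiexiaohaiisbest@163.com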

CLC number: TP391.4
Publication history
- Received: 2021-07-15
- Revised: 2021-12-17
- Accepted: 2021-12-29
- Published online: 2022-01-13
- Issue date: 2022-10-19


Abstract

Trajectory prediction is one of the core tasks of autonomous driving systems. Current deep-learning-based trajectory prediction algorithms involve the information representation, environment perception and motion reasoning of targets. Existing trajectory prediction models give insufficient consideration to pedestrians' social motivations during motion reasoning and cannot effectively anticipate the local destinations of pedestrians in a scene under different social conditions. To address this problem, a Conditional Endpoint local destination Pooling NETwork (CEPNET) is proposed. The network uses a conditional variational autoencoder to infer a latent trajectory distribution space, and thereby learns the probability distribution of historically observed trajectories in a specific scene. A conditional endpoint local feature inference algorithm is then constructed, which encodes similarity features using the conditional endpoints as local destination features. A social pooling network filters out interference signals in the scene, and a self-attention social mask is incorporated to strengthen each pedestrian's self-attention. To verify the reliability of each module, ablation experiments are conducted on the public datasets Walking pedestrians In busy scenarios from a BIrd eye view (BIWI) and University of CYprus multi-person trajectory (UCY), and CEPNET is compared with advanced trajectory prediction algorithms including the vanilla long short-term memory network (Vanilla), Socially acceptable trajectories with Generative Adversarial Networks (SGAN) and multimodal Trajectory forecasting using Bicycle-GAN and Graph Attention networks (S-BiGAT). Experimental results on the Trajnet++ benchmark show that CEPNET outperforms these existing algorithms; compared with the baseline Vanilla, it reduces the Average Displacement Error (ADE) by 22.52%, the Final Displacement Error (FDE) by 20%, the prediction collision rate Col-I by 9.75% and the ground-truth collision rate Col-II by 9.15%.
- Trajectory prediction /
- Conditional probability distribution /
- Destination pooling /
- Self-attention
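As the abstract describes, CEPNET uses a conditional variational autoencoder (CVAE) to learn a distribution over trajectories conditioned on the observed history. The NumPy sketch below shows only the generic CVAE ingredients, not the authors' implementation: the reparameterization trick used during training, the KL term that pulls the approximate posterior toward the prior, and test-time sampling of several candidate endpoints. The standard-normal prior, the linear decoder `w_dec`, the 16-dimensional history feature, the 8-dimensional latent space and k = 20 samples are all hypothetical choices for illustration.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over the last axis."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

def sample_endpoints(history_feat, w_dec, latent_dim, k, rng):
    """Test time: draw k latent codes from an assumed N(0, I) prior, concatenate
    each with the encoded observed history (the condition), and decode to an
    (x, y) endpoint. w_dec is a hypothetical linear decoder of shape
    (feat_dim + latent_dim, 2)."""
    z = rng.standard_normal((k, latent_dim))
    cond = np.repeat(history_feat[None, :], k, axis=0)
    return np.concatenate([cond, z], axis=1) @ w_dec

# Toy usage with random, untrained weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal(16)                  # stand-in for an encoded history
w_dec = 0.1 * rng.standard_normal((16 + 8, 2))  # hypothetical decoder weights
endpoints = sample_endpoints(feat, w_dec, latent_dim=8, k=20, rng=rng)
# During training, mu and log_var would come from a recognition network that also
# sees the true future; zeros are used here only to show the call.
z_train = reparameterize(np.zeros(8), np.zeros(8), rng)
```

Drawing several endpoints per pedestrian is what makes the prediction multimodal; a trained model would replace the random `w_dec` with a learned decoder network.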


Figure 1 Logical framework of CEPNET

Figure 2 Conditional variational autoencoder

Figure 3 Framework of the conditional endpoint local feature inference algorithm

Figure 4 Self-attention social pooling network
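The abstract says a social pooling network filters interference signals out of the scene. One common form of social pooling is the occupancy-grid pooling of Social LSTM, sketched below in NumPy purely as an illustration; this page does not specify CEPNET's pooling layout, and the 4 × 4 grid with 0.5 m cells is a hypothetical choice. Neighbours that fall outside the window contribute nothing, which is one simple way distant, irrelevant agents get filtered out.

```python
import numpy as np

def social_pooling(positions, hidden, grid_size=4, cell_size=0.5):
    """Social-LSTM-style grid pooling (illustrative, not the paper's layer).
    positions: (N, 2) current (x, y) of N pedestrians, in metres.
    hidden:    (N, D) per-pedestrian hidden states.
    Returns (N, grid_size, grid_size, D): for each pedestrian, the hidden
    states of neighbours summed into the grid cell they occupy."""
    n, d = hidden.shape
    pooled = np.zeros((n, grid_size, grid_size, d))
    half = grid_size * cell_size / 2.0  # half-width of the square window
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = positions[j] - positions[i]
            if abs(dx) >= half or abs(dy) >= half:
                continue  # outside the neighbourhood window: ignored
            cx = int((dx + half) // cell_size)
            cy = int((dy + half) // cell_size)
            pooled[i, cx, cy] += hidden[j]
    return pooled

# Toy usage: pedestrian 2 is far from the other two.
positions = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0]])
hidden = np.eye(3)  # toy 3-D hidden state per pedestrian
pooled = social_pooling(positions, hidden)
# Pedestrian 0 sees pedestrian 1 in cell (2, 2); pedestrian 2 sees no one.
```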

Figure 5 Self-attention social relation mask
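Figure 5 refers to a self-attention social relation mask. The sketch below is a generic masked scaled dot-product self-attention over the pedestrians in a scene, assuming (hypothetically) that the mask lets pedestrian i attend to pedestrian j only when they are within 2 m of each other; the page does not state how CEPNET builds its mask. Each row of the mask must keep its diagonal True, otherwise the softmax would have no finite entries. The projection matrices are random stand-ins for learned weights.

```python
import numpy as np

def masked_self_attention(x, mask, w_q, w_k, w_v):
    """Scaled dot-product self-attention across N pedestrians.
    x:    (N, D) per-pedestrian features.
    mask: (N, N) boolean; mask[i, j] is True if pedestrian i may attend to j.
          The diagonal must be True so every row has a finite score.
    w_q, w_k, w_v: (D, D_k) projection matrices (hypothetical stand-ins)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(mask, scores, -np.inf)        # block socially masked pairs
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)                        # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Toy usage: pedestrians 0 and 1 are close, pedestrian 2 is 6 m away.
rng = np.random.default_rng(0)
pos = np.array([[0.0, 0.0], [0.5, 0.0], [6.0, 0.0]])
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
social_mask = dist < 2.0  # hypothetical 2 m interaction radius; diagonal is True
x = rng.standard_normal((3, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
out, attn = masked_self_attention(x, social_mask, w_q, w_k, w_v)
```

In this toy scene the isolated pedestrian attends only to itself, while the two nearby pedestrians share attention between each other.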

Figure 6 Overall framework of the conditional endpoint local destination pooling network

Figure 7 Schematic of the training framework of each algorithm

Figure 8 Training and validation loss curves of different models

Figure 9 Visualization of the trajectories predicted by the five algorithms in different scenes

Figure 10 Visualization of predicted trajectories for different interaction types

Table 1 Learning-rate schedule by training iteration

Iterations	1–9	10–18	19–25
Learning rate	10^-3	10^-4	10^-5
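Table 1 describes a step schedule that divides the learning rate by 10 twice. A minimal sketch, assuming each "iteration" in the table is one training epoch numbered from 1:

```python
def learning_rate(epoch: int) -> float:
    """Step learning-rate schedule of Table 1 (epochs numbered 1..25, an assumption)."""
    if not 1 <= epoch <= 25:
        raise ValueError("Table 1 covers iterations 1-25 only")
    if epoch <= 9:
        return 1e-3
    if epoch <= 18:
        return 1e-4
    return 1e-5

schedule = [learning_rate(e) for e in range(1, 26)]
```

If the model were trained in PyTorch (the framework is not stated here), `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[9, 18], gamma=0.1)` stepped once per epoch, starting from 1e-3, would give the same schedule up to floating-point rounding.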


Table 2 Average running time per training iteration of each model (min)

	SGAN	S-BiGAT	Vanilla	CENET-I (ours)	CEPNET (ours)
Average running time	38.38	151.68	2.97	66.18	59.09


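Table 3 Quantitative results of CEPNET and other algorithms on the ETH and UCY datasets

Dataset	SGAN ADE/FDE (m)	SGAN Col-I (%)	SGAN Col-II (%)	S-BiGAT ADE/FDE (m)	S-BiGAT Col-I (%)	S-BiGAT Col-II (%)	Vanilla (baseline) ADE/FDE (m)	Vanilla (baseline) Col-I (%)	Vanilla (baseline) Col-II (%)	CENET-I (ours) ADE/FDE (m)	CENET-I (ours) Col-I (%)	CENET-I (ours) Col-II (%)	CEPNET (ours) ADE/FDE (m)	CEPNET (ours) Col-I (%)	CEPNET (ours) Col-II (%)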
ETH	0.66/1.30	7.40	8.98	0.96/1.79	7.31	11.62	0.99/1.89	12.06	12.59	1.22/2.47	9.77	10.92	0.66/1.34	9.42	8.45
Hotel	0.44/0.84	5.66	5.66	0.84/1.52	3.77	5.66	0.85/1.60	7.55	1.89	0.85/1.61	3.77	3.77	0.51/1.02	5.66	3.77
Univ	0.69/1.50	5.33	5.33	0.61/1.36	2.46	4.51	0.63/1.44	2.05	2.87	0.69/1.64	2.46	2.87	0.60/1.39	2.46	2.05
Zara1	0.43/0.90	4.20	8.39	0.46/0.98	0.7	11.89	0.42/0.98	8.39	8.39	0.41/0.88	7.69	9.79	0.39/0.84	6.99	7.69
Zara2	0.53/1.16	14.25	14.25	0.50/1.12	8.39	16.68	0.48/1.10	14.56	15.25	0.47/1.05	15.41	15.2	0.45/1.02	14.67	15.30
Mean	0.55/1.14	7.37	8.52	0.67/1.36	4.53	10.07	0.68/1.40	8.92	8.20	0.72/1.49	7.82	9.11	0.52/1.12	8.05	7.45
Note: Bold names in the Dataset column denote the test set that was not used for training; red marks the lowest error and blue the second-lowest.
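Table 3 reports four metrics. ADE is the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon, and FDE is that distance at the final step. Col-I (prediction collision) and Col-II (ground-truth collision), as named in the abstract, measure how often the predicted path of the primary pedestrian collides with the predicted or the ground-truth paths of its neighbours. The sketch below is a simplified illustration, not the Trajnet++ evaluation code: it checks distances only at the sampled time steps, reports collisions as a percentage of scenes, and uses a hypothetical 0.2 m threshold.

```python
import numpy as np

def ade_fde(pred, gt):
    """pred, gt: (T, 2) positions in metres over the prediction horizon."""
    err = np.linalg.norm(pred - gt, axis=-1)  # per-step Euclidean error
    return float(err.mean()), float(err[-1])  # ADE, FDE

def collides(primary, neighbours, threshold=0.2):
    """True if the primary path comes within `threshold` metres of any neighbour
    at the same time step. primary: (T, 2); neighbours: (M, T, 2).
    The threshold is a hypothetical value, not taken from the paper."""
    if len(neighbours) == 0:
        return False
    dist = np.linalg.norm(neighbours - primary[None], axis=-1)  # (M, T)
    return bool((dist < threshold).any())

def collision_rates(scenes, threshold=0.2):
    """scenes: list of dicts with 'pred' (T, 2), 'pred_nbrs' (M, T, 2), 'gt_nbrs' (M, T, 2).
    Col-I:  predicted primary vs. predicted neighbours.
    Col-II: predicted primary vs. ground-truth neighbours.
    Both are returned as percentages of scenes with at least one collision."""
    col1 = sum(collides(s["pred"], s["pred_nbrs"], threshold) for s in scenes)
    col2 = sum(collides(s["pred"], s["gt_nbrs"], threshold) for s in scenes)
    n = len(scenes)
    return 100.0 * col1 / n, 100.0 * col2 / n

# Toy usage: per-step errors are 0, 0.3 and 0.6 m.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 0.3], [2.0, 0.6]])
ade, fde = ade_fde(pred, gt)

far = np.array([[[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]])    # never close
near = np.array([[[2.0, 2.0], [1.0, 0.35], [9.0, 9.0]]])  # 0.05 m away at t = 1
scenes = [
    {"pred": pred, "pred_nbrs": far, "gt_nbrs": near},
    {"pred": pred, "pred_nbrs": far, "gt_nbrs": far},
]
print(collision_rates(scenes))  # (0.0, 50.0)
```

Here ADE is 0.3 m and FDE 0.6 m; the first scene collides only with a ground-truth neighbour, so Col-I is 0% and Col-II is 50%. The "Mean" row of Table 3 is the average of the five per-dataset rows.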


Table 4 Evaluation of predictions for four interaction types in different scenes

Type	Model	Scene No.	ADE (m)	FDE (m)	Col-I (%)	Col-II (%)
I	SGAN	102	0.20	0.41	11.76	6.68
I	S-BiGAT	102	0.22	0.47	8.82	10.78
I	Vanilla (baseline)	102	0.21	0.46	16.67	9.80
I	CENET-I (ours)	102	0.22	0.50	12.75	12.75
I	CEPNET (ours)	102	0.13 (↓38.1%)	0.30 (↓34.8%)	6.86 (↓59.9%)	6.68 (↓31.8%)
II	SGAN	779	0.40	0.80	11.81	11.42
II	S-BiGAT	779	0.46	0.91	7.75	11.68
II	Vanilla (baseline)	779	0.46	0.91	11.8	13.22
II	CENET-I (ours)	779	0.53	1.11	12.07	10.14
II	CEPNET (ours)	779	0.32 (↓30.4%)	0.69 (↓24.2%)	11.17 (↓5.6%)	9.50 (↓28.1%)
III	SGAN	1734	0.61	1.28	14.24	13.67
III	S-BiGAT	1734	0.72	1.49	9.63	15.63
III	Vanilla (baseline)	1734	0.74	1.54	15.92	16.03
III	CENET-I (ours)	1734	0.83	1.77	15.51	16.03
III	CEPNET (ours)	1734	0.61 (↓17.6%)	1.31 (↓14.9%)	15.4 (↓3.27%)	15.5 (↓3.30%)
IV	SGAN	660	0.71	1.50	4.85	5.91
IV	S-BiGAT	660	0.86	1.78	3.18	6.36
IV	Vanilla (baseline)	660	0.82	1.74	5.76	7.27
IV	CENET-I (ours)	660	0.84	1.79	5.00	7.42
IV	CEPNET (ours)	660	0.66 (↓19.5%)	1.44 (↓17.2%)	3.48 (↓39.6%)	6.36 (↓12.5%)
Note: Red marks the lowest error and blue the second-lowest.
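The ↓ percentages attached to CEPNET in Table 4 appear to be reductions relative to the Vanilla baseline of the same interaction type, computed as (baseline − CEPNET) / baseline × 100; the abstract's improvements over Vanilla appear to follow the same formula applied to the mean row of Table 3. A small check:

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` is lower than `baseline`
    (positive means better for error and collision metrics)."""
    return (baseline - model) / baseline * 100.0

# Type I rows of Table 4: (Vanilla baseline, CEPNET)
type_i = {"ADE": (0.21, 0.13), "FDE": (0.46, 0.30), "Col-II": (9.80, 6.68)}
for metric, (base, ours) in type_i.items():
    print(f"Type I {metric}: down {relative_reduction(base, ours):.1f}%")

# Mean row of Table 3: (Vanilla baseline, CEPNET)
table3_mean = {"FDE": (1.40, 1.12), "Col-I": (8.92, 8.05), "Col-II": (8.20, 7.45)}
for metric, (base, ours) in table3_mean.items():
    print(f"Mean {metric}: down {relative_reduction(base, ours):.2f}%")
```

This prints 38.1%, 34.8% and 31.8% for the Type I entries and 20.00%, 9.75% and 9.15% for the Table 3 means, matching the printed values. A few figures do not reproduce exactly from the rounded table entries: the Type I Col-I reduction comes out near 58.8% rather than 59.9%, and the mean ADE (0.68 to 0.52) gives about 23.5% rather than 22.52%.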

