6D Pose Estimation Network in Complex Point Cloud Scenes

CHEN Haiyong, LI Longteng, CHEN Peng, MENG Rui

CHEN Haiyong, LI Longteng, CHEN Peng, MENG Rui. 6D Pose Estimation Network in Complex Point Cloud Scenes[J]. Journal of Electronics & Information Technology, 2022, 44(5): 1591-1601. doi: 10.11999/JEIT211000


doi: 10.11999/JEIT211000
Details
    Author biographies:

    CHEN Haiyong: Male, born in 1980, Professor and Ph.D. supervisor. His research interests include image processing, machine vision, and pattern recognition

    LI Longteng: Male, born in 1996, M.S. candidate. His research interests include point cloud processing and 3D vision

    CHEN Peng: Male, born in 1981, Lecturer. His research interests include intelligent robotics, machine vision, and 3D environment perception

    MENG Rui: Female, born in 1997, M.S. candidate. Her research interests include machine vision, pattern recognition, and deep learning

    Corresponding author:

    CHEN Haiyong haiyong.chen@hebut.edu.cn

  • CLC number: TP391.4

6D Pose Estimation Network in Complex Point Cloud Scenes

Funds: The National Natural Science Foundation of China (U21A20482, 62073117), The Central Leading Local Science and Technology Development Fund Project (206Z1701G)
  • Abstract: To address robotic grasping of point-cloud targets in complex industrial scenes with weak textures and randomly scattered objects, this paper proposes a deep learning network for 6D pose estimation. First, the physical environment of point-cloud targets randomly placed in multiple poses is simulated to generate a dataset with ground-truth labels. A 6D pose estimation network is then designed: the proposed Multi-scale Point Cloud Segmentation Network (MPCS-Net) performs instance segmentation directly on the complete geometric point cloud, removing the dependence on RGB information and on point-cloud segmentation pre-processing. Next, the Multi-layer Feature Pose Estimation Network (MFPE-Net) is proposed, which effectively handles pose estimation for symmetric objects. Finally, experimental results and analysis confirm that, compared with traditional point-cloud registration methods and existing deep learning pose estimation methods that operate on segmented point clouds, the proposed method achieves higher accuracy and more stable performance, and is robust when estimating the poses of symmetric objects.
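The dataset-generation step described in the abstract, placing multiple copies of a workpiece model at random 6D poses and recording ground-truth labels, can be sketched in a few lines. This is a minimal numpy illustration under assumed conventions (uniform translations in a small workspace), not the paper's simulation pipeline; `make_scene` and `random_rotation` are hypothetical names:

```python
import numpy as np

def random_rotation(rng):
    # Random rotation via QR decomposition of a Gaussian matrix,
    # with signs fixed so the result is a proper rotation (det = +1).
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))        # make the factorization unique
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]          # flip one axis: reflection -> rotation
    return Q

def make_scene(model_points, n_objects, rng):
    """Scatter n_objects copies of a CAD-model point cloud with random
    6D poses; return the merged cloud, per-point instance labels, and
    the ground-truth (R, t) of every instance."""
    clouds, labels, poses = [], [], []
    for i in range(n_objects):
        R = random_rotation(rng)
        t = rng.uniform(-0.1, 0.1, size=3)   # assumed workspace scale
        clouds.append(model_points @ R.T + t)
        labels.append(np.full(len(model_points), i))
        poses.append((R, t))
    return np.vstack(clouds), np.concatenate(labels), poses
```

The recorded `(R, t)` pairs serve as regression targets for pose estimation, while `labels` supervises instance segmentation.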
  • Figure 1  Robotic grasping and assembly system

    Figure 2  Flowchart of dataset generation

    Figure 3  Workpiece CAD models and sample point clouds of simulated scenes

    Figure 4  Overall network architecture

    Figure 5  MPCS-Net architecture

    Figure 6  Flowchart of the feature clustering and sampling module

    Figure 7  MFPE-Net structure

    Figure 8  Structure of the pose feature extraction module

    Figure 9  Results of the point cloud instance segmentation network

    Figure 10  Examples of erroneous instance predictions

    Figure 11  Dimensionality reduction of high-dimensional instance features

    Figure 12  Pose estimation results for objects to be grasped

    Figure 13  Registration of object C

    Table 1  Basic training configuration

    | Configuration item | Value | Configuration item | Value |
    | --- | --- | --- | --- |
    | Total dataset size | 10000 | Average point spacing (horizontal) | 1 mm |
    | Objects per scene | 4–7 | Optimizer | SGD |
    | Training set size | 9000 | Training iterations | 500 |
    | Test set size | 1000 | Batch size | 16 |
    | Initial learning rate | 0.01 | Learning rate decay step | 50 |
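The settings in Table 1 map directly onto a standard training setup. A sketch assuming PyTorch (the page does not state the framework used; `net` is a stand-in module, not the paper's network):

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 64)  # stand-in for MPCS-Net / MFPE-Net

# SGD with initial learning rate 0.01 (Table 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
# decay the learning rate every 50 steps (StepLR's default gamma is 0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50)
# loader = DataLoader(train_set, batch_size=16, shuffle=True)  # batch size 16
# for epoch in range(500): ...                                 # 500 iterations
```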

    Table 2  Semantic segmentation accuracy (%) and average time (s)

    | Method | Overall accuracy (%) | Average time (s) | Object A | Object B | Object C | Object D | Object E | Object F | Object G |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PointNet++ | 82.93 | 0.286 | 86.74 | 80.23 | 83.33 | 78.53 | 83.51 | 85.74 | 88.73 |
    | MT-PNet | 89.79 | 0.305 | 89.74 | 87.97 | 84.69 | 92.42 | 88.05 | 87.21 | 95.50 |
    | MV-CRF | 91.03 | 2.973 | 91.27 | 92.03 | 89.65 | 89.02 | 92.78 | 89.95 | 94.47 |
    | Ours | 99.02 | 0.324 | 98.79 | 99.28 | 98.99 | 98.93 | 98.61 | 98.97 | 99.67 |

    Table 3  Instance segmentation accuracy (%) and average time (s)

    | Method | Overall accuracy (%) | Average time (s) | Object A | Object B | Object C | Object D | Object E | Object F | Object G |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | MT-PNet | 80.84 | 4.973 | 78.87 | 75.55 | 83.48 | 86.99 | 75.06 | 87.85 | 84.25 |
    | MV-CRF | 84.45 | 8.934 | 83.03 | 80.21 | 85.77 | 88.96 | 80.57 | 89.11 | 89.48 |
    | Ours | 94.35 | 5.312 | 92.74 | 96.85 | 93.53 | 95.06 | 94.67 | 93.83 | 93.51 |

    Table 4  Accuracy (%) of different instance clustering methods

    | Method | Overall accuracy (%) | Object A | Object B | Object C | Object D | Object E | Object F | Object G |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | HAC | 72.05 | 54.87 | 83.68 | 72.08 | 75.06 | 78.84 | 67.19 | 79.48 |
    | DBSCAN | 89.75 | 83.51 | 92.06 | 94.05 | 80.83 | 85.47 | 92.59 | 90.64 |
    | MeanShift | 94.35 | 92.74 | 96.85 | 93.53 | 95.06 | 94.67 | 93.83 | 93.51 |
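MeanShift, the best-performing clustering method in Table 4, groups points by letting each one climb to the nearest mode of the feature density, so the number of instances need not be known in advance. A minimal flat-kernel sketch (not the paper's implementation; `bandwidth` and `merge_tol` are illustrative parameters):

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50, merge_tol=None):
    """Flat-kernel mean shift: every point iteratively moves to the mean
    of its bandwidth-neighbourhood; converged modes closer than merge_tol
    are merged into a single cluster."""
    if merge_tol is None:
        merge_tol = bandwidth / 2
    modes = points.copy()
    for _ in range(n_iter):
        for i, m in enumerate(modes):
            near = points[np.linalg.norm(points - m, axis=1) < bandwidth]
            modes[i] = near.mean(axis=0)
    # merge converged modes into cluster centres and assign labels
    centres, labels = [], np.empty(len(points), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return np.array(centres), labels
```

Applied to the per-point instance embeddings produced by the segmentation network, each discovered mode corresponds to one object instance.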

    Table 5  Pose estimation accuracy (%)

    | Object | FPFH+ICP (AD / AD-S) | PPF+ICP (AD / AD-S) | CloudPose+ICP (AD / AD-S) | Ours+ICP (AD / AD-S) |
    | --- | --- | --- | --- | --- |
    | Object A | 88.13 / 99.88 | 97.72 / 99.77 | 88.53 / 97.21 | 98.32 / 100 |
    | Object B | 77.86 / 96.47 | 71.67 / 72.07 | 85.82 / 93.66 | 96.30 / 97.68 |
    | Object C | 61.02 / 96.36 | 93.17 / 99.80 | 71.86 / 96.73 | 96.51 / 98.91 |
    | Object D | 87.83 / 97.23 | 98.04 / 98.54 | 97.53 / 98.36 | 97.85 / 99.25 |
    | Object E | 3.72 / 94.82 | 10.89 / 99.02 | 12.54 / 96.73 | 12.24 / 99.08 |
    | Object F | 48.17 / 97.80 | 42.44 / 99.21 | 53.36 / 92.63 | 49.56 / 98.91 |
    | Object G | 28.04 / 96.54 | 23.82 / 96.76 | 32.02 / 91.36 | 17.07 / 97.25 |
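AD and AD-S in Table 5 are the standard average-distance criteria: AD averages the distance between corresponding model points under the ground-truth and estimated poses, while AD-S matches each point to its closest counterpart, which makes it insensitive to the pose ambiguity of symmetric objects (hence the large AD/AD-S gap for objects E–G). A minimal numpy version, assuming points are row vectors and a pose acts as x ↦ Rx + t:

```python
import numpy as np

def add_metric(model_pts, R_gt, t_gt, R_est, t_est):
    """AD: mean distance between model points under the ground-truth
    and estimated poses, using pointwise correspondence."""
    gt = model_pts @ R_gt.T + t_gt
    est = model_pts @ R_est.T + t_est
    return np.linalg.norm(gt - est, axis=1).mean()

def add_s_metric(model_pts, R_gt, t_gt, R_est, t_est):
    """AD-S: each ground-truth point is matched to its closest estimated
    point before averaging, tolerating symmetric pose ambiguity."""
    gt = model_pts @ R_gt.T + t_gt
    est = model_pts @ R_est.T + t_est
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

For a two-point model symmetric about the z-axis, a 180° rotation about z gives a large AD but zero AD-S, which is exactly the behaviour the symmetric objects in Table 5 exhibit.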

    Table 6  Recognition time per instance (s)

    | | FPFH+ICP | PPF+ICP | MPCS-Net+CloudPose+ICP | Ours+ICP |
    | --- | --- | --- | --- | --- |
    | Average computation time (per instance) | 3.72 | 4.43 | 0.62 | 0.58 |
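All methods timed in Table 6 share an ICP refinement stage. A minimal point-to-point ICP, brute-force nearest neighbours plus the closed-form Kabsch alignment, illustrates the idea; production implementations use k-d trees and convergence checks rather than a fixed iteration count:

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) mapping A onto B (Kabsch)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # reject reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(source, target, n_iter=30):
    """Point-to-point ICP: alternate nearest-neighbour matching and the
    closed-form rigid alignment above; returns the accumulated (R, t)."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(n_iter):
        # brute-force nearest neighbours (fine for small clouds)
        d = np.linalg.norm(src[:, None] - target[None, :], axis=2)
        matched = target[d.argmin(axis=1)]
        R, t = best_fit_transform(src, matched)
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```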
  • [1] ASTANIN S, ANTONELLI D, CHIABERT P, et al. Reflective workpiece detection and localization for flexible robotic cells[J]. Robotics and Computer-Integrated Manufacturing, 2017, 44: 190–198. doi: 10.1016/j.rcim.2016.09.001
    [2] RUSU R B, BLODOW N, and BEETZ M. Fast point feature histograms (FPFH) for 3D registration[C]. 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 2009: 3212–3217.
    [3] SALTI S, TOMBARI F, and DI STEFANO L. SHOT: Unique signatures of histograms for surface and texture description[J]. Computer Vision and Image Understanding, 2014, 125(8): 251–264. doi: 10.1016/j.cviu.2014.04.011
    [4] DROST B, ULRICH M, NAVAB N, et al. Model globally, match locally: Efficient and robust 3D object recognition[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 998–1005.
    [5] BIRDAL T and ILIC S. Point pair features based object detection and pose estimation revisited[C]. 2015 International Conference on 3D Vision, Lyon, France, 2015: 527–535.
    [6] TANG Keke, SONG Peng, and CHEN Xiaoping. 3D object recognition in cluttered scenes with robust shape description and correspondence selection[J]. IEEE Access, 2017, 5: 1833–1845. doi: 10.1109/ACCESS.2017.2658681
    [7] HOLZ D, NIEUWENHUISEN M, DROESCHEL D, et al. Active Recognition and Manipulation for Mobile Robot Bin Picking[M]. RÖHRBEIN F, VEIGA G, NATALE C. Gearing Up and Accelerating Cross‐Fertilization Between Academic and Industrial Robotics Research in Europe. Cham: Springer, 2014: 133–153.
    [8] WU Chenghei, JIANG S Y, and SONG Kaitai. CAD-based pose estimation for random bin-picking of multiple objects using a RGB-D camera[C]. 2015 15th International Conference on Control, Automation and Systems (ICCAS), Busan, Korea (South), 2015: 1645–1649.
    [9] GAO Xuemei. Research on objects recognition and grasping position planning for robot automatic assembly[D]. [Master dissertation], Harbin Institute of Technology, 2018.
    [10] LYU Yecheng, HUANG Xinming, and ZHANG Ziming. Learning to segment 3D point clouds in 2D image space[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12255–12264.
    [11] ZHOU Yin and TUZEL O. VoxelNet: End-to-end learning for point cloud based 3D object detection[C]. The 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4490–4499.
    [12] QI C R, LIU Wei, WU Chenxia, et al. Frustum pointnets for 3D object detection from RGB-D data[C]. The 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 918–927.
    [13] PHAM Q H, NGUYEN T, HUA B S, et al. JSIS3D: Joint semantic-instance segmentation of 3D point clouds with multi-task pointwise networks and multi-value conditional random fields[C].The 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8819–8828.
    [14] QI C R, SU Hao, MO Kaichun, et al. PointNet: Deep learning on point sets for 3D classification and segmentation[C]. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 77–85.
    [15] GAO Ge, LAURI M, WANG Yulong, et al. 6D object pose regression via supervised learning on point clouds[C]. 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020: 3643–3649.
    [16] DU Guoguang, WANG Kai, LIAN Shiguo, et al. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: A review[J]. Artificial Intelligence Review, 2021, 54(3): 1677–1734. doi: 10.1007/s10462-020-09888-5
    [17] GSCHWANDTNER M, KWITT R, UHL A, et al. BlenSor: Blender sensor simulation toolbox[C]. International Symposium on Visual Computing, Las Vegas, USA, 2011: 199–208.
    [18] LU Qingkai, CHENNA K, SUNDARALINGAM B, et al. Planning Multi-fingered Grasps as Probabilistic Inference in a Learned Deep Network[M]. AMATO N, HAGER, G, THOMAS S, et al. Robotics Research. Cham: Springer, 2020: 455–472.
    [19] DE BRABANDERE B, NEVEN D, and VAN GOOL L. Semantic instance segmentation with a discriminative loss function[J]. arXiv preprint arXiv: 1708.02551, 2017.
    [20] KUHN H W. The Hungarian method for the assignment problem[J]. Naval Research Logistics, 2005, 52(1): 7–21. doi: 10.1002/nav.20053
    [21] LIU Liyuan, JIANG Haoming, HE Pengcheng, et al. On the variance of the adaptive learning rate and beyond[J]. arXiv preprint arXiv: 1908.03265v1, 2019.
    [22] GAO Ge, LAURI M, ZHANG Jianwei, et al. Occlusion Resistant Object Rotation Regression from Point Cloud Segments[M]. LEAL-TAIXÉ L and ROTH S. European Conference on Computer Vision. Cham: Springer, 2018: 716–729.
    [23] HINTERSTOISSER S, LEPETIT V, ILIC S, et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes[C]. 11th Asian Conference on Computer Vision, Berlin, Germany, 2012: 548–562.
Publication history
  • Received: 2021-09-18
  • Revised: 2022-04-06
  • Accepted: 2022-04-08
  • Available online: 2022-04-10
  • Published: 2022-05-25
