Tracklet Generation Method by Submodular Optimization for Multi-Object Tracking
Abstract: As the basis of many intelligent visual tasks, Multi-Object Tracking (MOT) has long been one of the challenging problems in computer vision. Occlusion is a major factor affecting tracking accuracy. To handle occlusion, this paper adopts the tracking-by-detection strategy and obtains the complete trajectory of each target by associating tracklets. To improve tracking robustness, the tracklet generation problem is cast as the facility location problem from operations research, and a tracklet generation method based on submodular optimization is proposed. The method fuses two complementary features, Histograms of Oriented Gradients (HOG) and Color Names (CN), to describe target appearance, and derives a weighting coefficient from motion information to improve matching accuracy. Finally, a constrained submodular maximization algorithm performs global data association, selecting targets to form tracklets. Comparative experiments on several benchmark datasets show that the proposed method handles occlusion effectively while maintaining competitive tracking performance.
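The abstract casts tracklet generation as a facility location problem solved by constrained submodular maximization. The paper's own objective and constraints are defined in equations not reproduced in this section, so the sketch below only illustrates the textbook ingredients, under stated assumptions: the facility location function $F(A)=\sum_i \max_{j \in A} s_{ij}$ over a non-negative similarity matrix (rows are "clients", columns are candidate "facilities"; this layout and the `budget` parameter are assumptions for illustration), and the standard greedy algorithm that repeatedly adds the element with the largest marginal gain. With non-negative similarities $F$ is monotone and submodular, and greedy selection under a cardinality budget is guaranteed to reach at least $(1-1/e)$ of the optimum.

```python
from typing import List, Optional, Sequence, Set


def facility_location(sim: Sequence[Sequence[float]], selected: Set[int]) -> float:
    """F(A) = sum over clients i of max_{j in A} sim[i][j], with F(empty set) = 0.

    Assumes sim[i][j] >= 0, which makes F monotone and submodular.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_facility_location(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedy maximization of F subject to the cardinality constraint |A| <= budget."""
    n_facilities = len(sim[0]) if sim else 0
    chosen: List[int] = []
    chosen_set: Set[int] = set()
    current = 0.0
    for _ in range(budget):
        best_j: Optional[int] = None
        best_gain = 0.0
        for j in range(n_facilities):
            if j in chosen_set:
                continue
            # Marginal gain F(A ∪ {j}) - F(A)
            gain = facility_location(sim, chosen_set | {j}) - current
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:  # no remaining element adds positive value; stop early
            break
        chosen.append(best_j)
        chosen_set.add(best_j)
        current += best_gain
    return chosen
```

For `sim = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.3], [0.0, 0.8, 0.7]]` and `budget=2`, greedy first picks facility 0 ($F = 1.9$) and then facility 1 ($F = 2.7$), because facility 1 covers the third client better than facility 2 does.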
Keywords:
- Multi-Object Tracking (MOT)
- Tracklet
- Data association
- Submodular optimization
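Step (3) of Algorithm 1 scores pairs of detections with Eq. (14), which combines the HOG and CN descriptors with a motion-based weighting coefficient. Eq. (14) is not reproduced in this section, so the following is only a hypothetical sketch of such a score. It assumes cosine similarity for each descriptor, a linear fusion weight `alpha` (the default 0.85 borrows the $\alpha$ initialized in Algorithm 1, whose role is not stated here), and a Gaussian penalty on the per-frame displacement of box centres with an assumed scale `sigma`; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def motion_weight(center_a: Point, center_b: Point, frame_gap: int, sigma: float = 20.0) -> float:
    """Hypothetical Gaussian weight on the per-frame displacement of box centres (pixels)."""
    dx = (center_b[0] - center_a[0]) / frame_gap
    dy = (center_b[1] - center_a[1]) / frame_gap
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def fused_similarity(
    hog_a: Sequence[float], cn_a: Sequence[float], center_a: Point,
    hog_b: Sequence[float], cn_b: Sequence[float], center_b: Point,
    frame_gap: int, alpha: float = 0.85, sigma: float = 20.0,
) -> float:
    """Assumed form: motion weight times a convex combination of HOG and CN similarities."""
    appearance = alpha * cosine(hog_a, hog_b) + (1.0 - alpha) * cosine(cn_a, cn_b)
    return motion_weight(center_a, center_b, frame_gap, sigma) * appearance
```

Dividing the displacement by `frame_gap` lets a candidate several frames away move proportionally farther before it is penalized, which matters because Algorithm 1 compares the initial target with candidates from all later frames.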
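Algorithm 1 Tracklet generation based on submodular optimization
Input: video segment $V_m$ containing $K$ image frames
Output: set of generated tracklets $T_m = \{ t_m^1, t_m^2, \cdots \}$
Initialization: $i = 1$, $j = 1$, $k = 1$, $\alpha = 0.85$; $T_m = \varnothing$; detected target set $D_m = S_m \cup R_m$, where the initial target set is $S_m = \{ d_{mk}^1, d_{mk}^2, \cdots, d_{mk}^{n_k} \}$ and the candidate target set is $R_m = \{ d_{m(k+1)}^1, \cdots, d_{m(k+1)}^{n_{k+1}}, \cdots, d_{mK}^1, \cdots, d_{mK}^{n_K} \}$
Procedure:
(1) Extract the HOG and CN features of every detected target in $D_m$;
(2) while $k \le K$
(3)   Compute the similarities between the targets in the initial set $S_m$ and those in the candidate set $R_m$ according to Eq. (14);
(4)   $i = 1$; while $i \le n_k$
(5)     $t_m^i = \varnothing$; $j = 1$;
(6)     while $j < K$
(7)       Select from $R_m$ the target $d_{mr}^p$ with the largest similarity to the initial target $d_{mk}^i$, and denote this similarity by $s_{ip}$;
(8)       if $s_{ip} > w$
(9)         $t_m^i \leftarrow \{ d_{mr}^p \} \cup t_m^i$;
(10)        Remove $d_{mr}^p$ and the other targets of frame $r$ (the frame containing $d_{mr}^p$) from the candidate set;
(11)        $j$++;
(12)      else exit the inner while loop; end if
(13)    end while
(14)    $T_m \leftarrow \{ t_m^i \} \cup T_m$;
(15)    $i$++;
(16)  end while
(17)  $k = k + 1$;
(18)  Form the new initial set $S_m = \{ d_{mk}^1, d_{mk}^2, \cdots, d_{mk}^{n_k} \}$ from the targets in frame $k$ that have not been matched or associated;
(19) end while
Table 1 Comparison of tracking performance on the PETS09-S2L1 and TUD datasets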
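
| Dataset | Method | MOTA(%)↑ | MOTP(%)↑ | MT(%)↑ | ML(%)↓ | IDS↓ |
|---|---|---|---|---|---|---|
| PETS09-S2L1 | Intra Track[29] | 81.6 | 79.4 | – | – | 684 |
| | R1TA Track[30] | 96.0 | 82.0 | 100.0 | 0 | 14 |
| | DSC[31] | 90.0 | 56.8 | 89.5 | 0 | 15 |
| | Proposed | 96.3 | 72.3 | 96.2 | 0 | 12 |
| TUD-Stadtmitte | GMMCP[12] | 82.4 | 73.9 | – | 0 | 3 |
| | CNNTCM[27] | 80.8 | – | 90.0 | 0 | – |
| | R1TA Track[30] | 84.8 | 89.6 | 70.0 | – | – |
| | DSC[31] | 72.4 | 52.6 | 60.0 | 0 | 10 |
| | Proposed | 90.6 | 87.6 | 90.0 | 0 | 0 |
| TUD-Crossing | GMMCP[12] | 91.9 | 70.0 | – | 0 | 7 |
| | SUBM[16] | 60.2 | 77.2 | 15.4 | 7.7 | 32 |
| | Proposed | 92.4 | 75.6 | 18.6 | 0 | 2 |

Table 2 Comparison of tracking performance on the MOT17 dataset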
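
| Method | MOTA(%)↑ | IDF1↑ | MOTP(%)↑ | MT(%)↑ | ML(%)↓ | IDS↓ |
|---|---|---|---|---|---|---|
| IOU[11] | 45.5 | 39.4 | 76.9 | 15.7 | 40.5 | 5988 |
| EDMT[32] | 50.0 | 51.3 | 77.3 | 21.6 | 36.3 | 2264 |
| jCC[33] | 51.2 | 54.5 | 75.9 | 20.9 | 37.0 | 1802 |
| LPT[9] | 57.3 | 57.7 | – | 23.3 | 36.9 | 1424 |
| MPNTrack[34] | 58.8 | 61.7 | – | 28.8 | 33.5 | 1185 |
| JBNOT[35] | 52.6 | 50.8 | 77.1 | 19.7 | 35.8 | 3050 |
| TT17[36] | 54.9 | 63.1 | – | 24.4 | 38.1 | 1088 |
| Deep-TAMA[37] | 50.3 | 53.5 | – | 19.2 | 37.5 | 2192 |
| Proposed | 56.4 | 58.2 | 78.1 | 21.1 | 32.8 | 1097 |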
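The frame-by-frame loop of Algorithm 1 above can be sketched in Python as follows. This is a simplified reading under stated assumptions, not the authors' implementation: detections are identified by hypothetical `(frame, index)` pairs with 0-indexed frames, `sim` stands in for the Eq. (14) similarity, `w` is the acceptance threshold of step (8), each seed detection is included in its own tracklet, and a detection already assigned to one tracklet is treated as unavailable to later ones (consistent with step (18), which reseeds only from unmatched targets). The global constrained submodular maximization described in the abstract is not modelled here.

```python
import math
from typing import Callable, List, Set, Tuple

# Hypothetical detection id: (frame index, detection index within that frame).
Detection = Tuple[int, int]


def generate_tracklets(
    n_per_frame: List[int],
    sim: Callable[[Detection, Detection], float],
    w: float,
) -> List[List[Detection]]:
    """Greedy, threshold-based tracklet generation over one video segment."""
    num_frames = len(n_per_frame)
    used: Set[Detection] = set()
    tracklets: List[List[Detection]] = []
    for k in range(num_frames):
        # Initial set S_m: detections of frame k not yet assigned to any tracklet.
        seeds = [(k, i) for i in range(n_per_frame[k]) if (k, i) not in used]
        for seed in seeds:
            used.add(seed)
            tracklet = [seed]
            # Candidate set R_m: unassigned detections in later frames.
            candidates = {
                (r, p)
                for r in range(k + 1, num_frames)
                for p in range(n_per_frame[r])
                if (r, p) not in used
            }
            while candidates:
                # Step (7): candidate most similar to the seed (sorted so ties break deterministically).
                best = max(sorted(candidates), key=lambda d: sim(seed, d))
                if sim(seed, best) <= w:
                    break  # the best candidate fails the threshold, so every other one does too
                tracklet.append(best)
                used.add(best)
                # Step (10): at most one detection per frame, so drop the whole chosen frame.
                candidates = {d for d in candidates if d[0] != best[0]}
            tracklet.sort()  # chronological order
            tracklets.append(tracklet)
    return tracklets


# Toy example: two people walking on a line; 1-D positions keyed by detection id.
positions = {(0, 0): 0.0, (0, 1): 100.0, (1, 0): 2.0, (1, 1): 98.0, (2, 0): 4.0, (2, 1): 96.0}
example = generate_tracklets(
    [2, 2, 2],
    lambda a, b: math.exp(-abs(positions[a] - positions[b]) / 10.0),
    w=0.5,
)
```

Here `example` is `[[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]`: each person's three detections form one tracklet. Raising `w` to 0.9 rejects every link in this toy data and yields six single-detection tracklets, which shows how the threshold trades tracklet length against the risk of wrong associations.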
