高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

集成全局局部特征交互与角动量机制的端到端多目标跟踪算法

计忠平 王相威 何志伟 杜晨杰 金冉 柴本成

计忠平, 王相威, 何志伟, 杜晨杰, 金冉, 柴本成. 集成全局局部特征交互与角动量机制的端到端多目标跟踪算法[J]. 电子与信息学报, 2024, 46(9): 3703-3712. doi: 10.11999/JEIT240277
引用本文: 计忠平, 王相威, 何志伟, 杜晨杰, 金冉, 柴本成. 集成全局局部特征交互与角动量机制的端到端多目标跟踪算法[J]. 电子与信息学报, 2024, 46(9): 3703-3712. doi: 10.11999/JEIT240277
JI Zhongping, WANG Xiangwei, HE Zhiwei, DU Chenjie, JIN Ran, CHAI Bencheng. End-to-end Multi-Object Tracking Algorithm Integrating Global Local Feature Interaction and Angular Momentum Mechanism[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3703-3712. doi: 10.11999/JEIT240277
Citation: JI Zhongping, WANG Xiangwei, HE Zhiwei, DU Chenjie, JIN Ran, CHAI Bencheng. End-to-end Multi-Object Tracking Algorithm Integrating Global Local Feature Interaction and Angular Momentum Mechanism[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3703-3712. doi: 10.11999/JEIT240277

集成全局局部特征交互与角动量机制的端到端多目标跟踪算法

doi: 10.11999/JEIT240277
基金项目: 国家自然科学基金(61671192),中国博士后科学基金(2017M114),浙江省自然科学基金(LY22F020025),浙江省教育厅一般项目(Y202351320)
详细信息
    作者简介:

    计忠平:男,教授,研究方向为计算机视觉、机器学习

    王相威:男,硕士,研究方向为计算机视觉、目标跟踪

    何志伟:男,教授,研究方向为计算机视觉、汽车电子技术

    杜晨杰:男,讲师,研究方向为计算机视觉、目标跟踪

    金冉:男,教授,研究方向为跨媒体分析、多模态信息处理

    柴本成:男,副教授,研究方向为目标跟踪、智能信息处理

    通讯作者:

    杜晨杰 ducj@hdu.edu.cn

  • 中图分类号: TN911.73;TP391.41

End-to-end Multi-Object Tracking Algorithm Integrating Global Local Feature Interaction and Angular Momentum Mechanism

Funds: The National Natural Science Foundation of China (61671192), China Postdoctoral Science Foundation (2017M114), The Natural Foundation of Zhejiang Province (LY22F020025), The General Project of the Zhejiang Provincial Department of Education (Y202351320)
  • 摘要: 针对多目标跟踪(MOT)算法性能对于检测准确度和数据关联策略的依赖性问题,该文提出一种新的端到端算法。在检测方面,首先基于特征金字塔网络,提出空间残差特征金字塔模块(SRFPN),以提升特征融合和信息传递的效率。随后,引入全局局部特征交互模块(GLFIM)来平衡局部细节和全局上下文信息,增强多尺度特征的专注度,提高模型对目标尺度变化的适应性。在关联方面,引入角动量机制(AMM),充分考虑目标运动方向,以提升连续帧之间目标匹配的精确性。在MOT17和UAVDT数据集上进行实验验证,所提跟踪器的检测性能和关联性能均显著提升,并且在目标遮挡、尺度变化和杂乱背景等复杂场景下表现出良好的鲁棒性。
  • 图  1  本文跟踪算法框图

    图  2  空间残差特征金字塔网络

    图  3  全局局部特征交互模块

    图  4  SRFPN消融结果

    图  5  GLFIM消融结果

    图  6  AMM修正代价矩阵

    图  7  AMM消融结果

    图  8  MOT17数据集与UAVDT数据上的可视化结果

    表  1  MOT17数据集上的实验结果对比

    算法 MOTA↑(%) IDF1↑(%) MOTP↑ MT↑(%) ML↓(%) IDs↓ FN↓ FP↓ Hz↑
    MPNTrack[19] 58.8 61.7 28.8 33.5 1 185 213 594 17 413 6.5
    CRF-RNN[20] 53.1 53.7 76.1 24.2 30.7 2 518 234 991 27 194 1.4
    DSORT[5] 60.3 61.2 79.1 31.5 20.3 2 442 185 301 36 111 20.0
    Tracktor++[21] 53.5 52.3 78.0 19.5 36.6 4 611 248 047 12 201 1.5
    CSE-FF[22] 67.7 56.1 77.9 33.7 22.8 5 706 30.0
    CTracker[11] 66.6 57.4 78.2 32.2 24.2 5 529 160 491 22 284 34.4
    本文 68.1 58.7 78.6 34.0 21.2 5 535 159 474 15 104 29.8
    注:粗体为每列最优,斜体为每列次优
    下载: 导出CSV

    表  2  UAVDT数据集上的实验结果对比

    算法 MOTA↑(%) IDF1↑(%) MOTP↑ MT↑(%) ML↓(%) IDs↓ FN↓ FP↓
    IOUT[23] 36.6 23.7 72.1 534 357 9 938 163 881 42 245
    SORT[3] 39 43.7 74.3 484 400 2 350 172 628 33 037
    DAN[4] 41.6 29.7 72.5 648 367 12 902
    ByteTrack[24] 39.1 44.7 74.3 2 341
    CTracker[11] 40.3 49.2 78.3 296 449 4 280 152 568 50 292
    本文 41.4 50.3 78.4 301 441 3 841 154 938 44 330
    下载: 导出CSV

    表  3  SRFPN不同感受野扩展方法的性能对比

    算法 MOTA↑(%) IDF1↑(%) Para↓ Hz↑
    RFB[25] 67.4 57.0 13.8 31.5
    DAPPM[26] 67.2 57.5 16.7 26.3
    SPPF(本文) 67.5 57.2 13.5 31.7
    下载: 导出CSV

    表  4  不同算法在上下文信息利用上的性能对比

    算法 MOTA↑(%) IDF1↑(%) Para↓ Hz↑
    DANet[27] 66.8 57.6 45.6 31.7
    ERGM[28] 67.0 58.0 45.4 32.3
    GLFIM(本文) 67.1 58.3 45.7 32.9
    下载: 导出CSV

    表  5  MOT17数据集上的消融实验效果

    算法模块 评价指标
    SRFPN GLFIM AMM MOTA↑(%) IDF1↑(%) MOTP↑ MT↑(%) ML↓(%) IDs↓ FN↓ FP↓ Hz↑
    66.6 57.4 78.2 32.2 24.2 5 529 160 491 22 284 34.4
    67.5 57.2 78.7 33.9 20.7 5 489 152 579 22 169 31.7
    67.1 58.3 78.3 33.3 21.5 5 360 154 915 15 118 32.9
    66.6 57.7 78.1 32.5 24.0 5 093 160 497 22 298 33.5
    67.3 58.3 78.5 33.6 21.5 5 499 154 426 21 452 30.9
    67.1 58.5 78.4 33.5 21.5 5 365 154 936 21 610 32.4
    67.3 57.8 78.5 33.7 21.5 5 415 153 268 23 473 30.6
    68.1 58.7 78.6 34.0 21.2 5 535 159 474 15 104 29.8
    下载: 导出CSV

    表  6  GLFIM引入位置消融实验结果

    层级 MOTA↑(%) IDF1↑(%) Para↓ Hz↑
    Baseline 66.6 57.4 43.6 34.4
    P7 66.8 57.6 44.0 34.1
    P7,P6 66.8 57.9 44.4 33.9
    P7,P6,P5 67.2 57.0 44.9 33.6
    P7,P6,P5,P4 67.0 57.8 45.3 33.4
    All(本文) 67.1 58.3 45.7 32.9
    下载: 导出CSV
  • [1] 张红颖, 贺鹏艺. 基于卷积注意力模块和无锚框检测网络的行人跟踪算法[J]. 电子与信息学报, 2022, 44(9): 3299–3307. doi: 10.11999/JEIT210634.

    ZHANG Hongying and HE Pengyi. Pedestrian tracking algorithm based on convolutional block attention module and anchor-free detection network[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3299–3307. doi: 10.11999/JEIT210634.
    [2] 伍瀚, 聂佳浩, 张照娓, 等. 基于深度学习的视觉多目标跟踪研究综述[J]. 计算机科学, 2023, 50(4): 77–87. doi: 10.11896/jsjkx.220300173.

    WU Han, NIE Jiahao, ZHANG Zhaowei, et al. Deep learning-based visual multiple object tracking: A review[J]. Computer Science, 2023, 50(4): 77–87. doi: 10.11896/jsjkx.220300173.
    [3] BEWLEY A, GE Zongyuan, OTT L, et al. Simple online and realtime tracking[C]. 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, USA, 2016: 3464–3468. doi: 10.1109/ICIP.2016.7533003.
    [4] SUN Shijie, AKHTAR N, SONG Huansheng, et al. Deep affinity network for multiple object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 104–119. doi: 10.1109/TPAMI.2019.2929520.
    [5] WOJKE N, BEWLEY A, and PAULUS D. Simple online and realtime tracking with a deep association metric[C]. 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017: 3645–3649. doi: 10.1109/ICIP.2017.8296962.
    [6] WANG Zhongdao, ZHENG Liang, LIU Yixuan, et al. Towards real-time multi-object tracking[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 107–122. doi: 10.1007/978-3-030-58621-8_7.
    [7] ZHANG Yifu, WANG Chunyu, WANG Xinggang, et al. FairMOT: On the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129(11): 3069–3087. doi: 10.1007/s11263-021-01513-4.
    [8] YU En, LI Zhuoling, HAN Shoudong, et al. RelationTrack: Relation-aware multiple object tracking with decoupled representation[J]. IEEE Transactions on Multimedia, 2023, 25: 2686–2697. doi: 10.1109/TMM.2022.3150169.
    [9] CHU Peng, WANG Jiang, YOU Quanzeng, et al. TransMOT: Spatial-temporal graph transformer for multiple object tracking[C]. Proceedings of 2023 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 4859–4869. doi: 10.1109/WACV56688.2023.00485.
    [10] XU Yihong, BAN Yutong, DELORME G, et al. TransCenter: Transformers with dense representations for multiple-object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 7820–7835. doi: 10.1109/TPAMI.2022.3225078.
    [11] PENG Jinlong, WANG Changan, WAN Fangbin, et al. Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 145–161. doi: 10.1007/978-3-030-58548-8_9.
    [12] PANG Bo, LI Yizhuo, ZHANG Yifan, et al. TubeTK: Adopting tubes to track multi-object in a one-step training model[C]. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6307–6317. doi: 10.1109/CVPR42600.2020.00634.
    [13] ZHANG Chuang, ZHENG Sifa, WU Haoran, et al. AttentionTrack: Multiple object tracking in traffic scenarios using features attention[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(2): 1661–1674. doi: 10.1109/TITS.2023.3315222.
    [14] OGAWA T, SHIBATA T, and HOSOI T. FRoG-MOT: Fast and robust generic multiple-object tracking by IoU and motion-state associations[C]. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2024: 6549–6558. doi: 10.1109/WACV57701.2024.00643.
    [15] HUANG Huimin, XIE Shiao, LIN Lanfen, et al. ScaleFormer: Revisiting the transformer-based backbones from a scale-wise perspective for medical image segmentation[C]. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 2022: 964–971. doi: 10.24963/ijcai.2022/135.
    [16] MILAN A, LEAL-TAIXÉ L, REID I, et al. MOT16: A benchmark for multi-object tracking[EB/OL]. https://arxiv.org/abs/1603.00831, 2016.
    [17] DU Dawei, QI Yuankai, YU Hongyang, et al. The unmanned aerial vehicle benchmark: Object detection and tracking[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 375–391. doi: 10.1007/978-3-030-01249-6_23.
    [18] LUO Yutong, ZHONG Xinyue, ZENG Minchen, et al. CGLF-Net: Image emotion recognition network by combining global self-attention features and local multiscale features[J]. IEEE Transactions on Multimedia, 2024, 26: 1894–1908. doi: 10.1109/TMM.2023.3289762.
    [19] BRASÓ G and LEAL-TAIXÉ L. Learning a neural solver for multiple object tracking[C]. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6246–6256. doi: 10.1109/CVPR42600.2020.00628.
    [20] XIANG Jun, XU Guohan, MA Chao, et al. End-to-end learning deep CRF models for multi-object tracking deep CRF models[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(1): 275–288. doi: 10.1109/TCSVT.2020.2975842.
    [21] BERGMANN P, MEINHARDT T, and LEAL-TAIXÉ L. Tracking without bells and whistles[C]. Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 941–951. doi: 10.1109/ICCV.2019.00103.
    [22] ZHOU Yan, CHEN Junyu, WANG Dongli, et al. Multi-object tracking using context-sensitive enhancement via feature fusion[J]. Multimedia Tools and Applications, 2024, 83(7): 19465–19484. doi: 10.1007/s11042-023-16027-z.
    [23] BOCHINSKI E, EISELEIN V, and SIKORA T. High-speed tracking-by-detection without using image information[C]. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 2017: 1–6. doi: 10.1109/AVSS.2017.8078516.
    [24] ZHANG Yifu, SUN Peize, JIANG Yi, et al. ByteTrack: Multi-object tracking by associating every detection box[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 1–21. doi: 10.1007/978-3-031-20047-2_1.
    [25] LIU Songtao, HUANG Di, and WANG Yunhong. Receptive field block net for accurate and fast object detection[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 404–419. doi: 10.1007/978-3-030-01252-6_24.
    [26] PAN Huihui, HONG Yuanduo, SUN Weichao, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(3): 3448–3460. doi: 10.1109/TITS.2022.3228042.
    [27] FU Jun, LIU Jing, TIAN Haijie, et al. Dual attention network for scene segmentation[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3141–3149. doi: 10.1109/CVPR.2019.00326.
    [28] XIE Yakun, ZHU Jun, LAI Jianbo, et al. An enhanced relation-aware global-local attention network for escaping human detection in indoor smoke scenarios[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 186: 140–156. doi: 10.1016/j.isprsjprs.2022.02.006.
  • 加载中
图(8) / 表(6)
计量
  • 文章访问数:  112
  • HTML全文浏览量:  37
  • PDF下载量:  21
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-04-15
  • 修回日期:  2024-08-25
  • 网络出版日期:  2024-08-30
  • 刊出日期:  2024-09-26

目录

    /

    返回文章
    返回