Video Object Segmentation Algorithm Based on Multi-scale Feature Enhancement and Global-Local Feature Aggregation

HOU Zhiqiang, DONG Jiale, MA Sugang, WANG Chenxu, YANG Xiaobao, WANG Yunchen

Citation: HOU Zhiqiang, DONG Jiale, MA Sugang, WANG Chenxu, YANG Xiaobao, WANG Yunchen. Video Object Segmentation Algorithm Based on Multi-scale Feature Enhancement and Global-Local Feature Aggregation[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT231394

doi: 10.11999/JEIT231394
Funds: The National Natural Science Foundation of China (62072370), The Natural Science Foundation of Shaanxi Province (2023-JC-YB-598)
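Details
    Author biographies:

    HOU Zhiqiang: Male, PhD, Professor. Research interests: computer vision, object tracking, etc.

    DONG Jiale: Male, master's student. Research interests: computer vision, video object segmentation, etc.

    MA Sugang: Male, PhD, Professor. Research interests: computer vision, machine learning, etc.

    Corresponding author:

    DONG Jiale, djl112299@163.com

  • CLC number: TN911.73; TP391.41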

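  • Abstract: To address the insufficient multi-scale feature representation and the underuse of shallow features in memory-network algorithms, this paper proposes a Video Object Segmentation (VOS) algorithm based on multi-scale feature enhancement and global-local feature aggregation. First, a multi-scale feature enhancement module fuses feature information at different scales from the reference mask branch and the reference RGB branch, strengthening the multi-scale feature representation. Second, a global-local feature aggregation module extracts features using convolutions with receptive fields of different sizes and adaptively fuses the features of global and local regions through a feature aggregation module; this fusion better captures both the global features and the fine details of the target, improving segmentation accuracy. Finally, a cross-layer fusion module uses the spatial detail in shallow features to improve the precision of the segmentation mask; fusing shallow features with deep features better captures the details and edges of the target. Experimental results on the public datasets DAVIS2016, DAVIS2017, and YouTube-2018 show that the overall performance of the proposed algorithm reaches 91.8%, 84.5%, and 83.0%, respectively, and that it runs in real time on both single-object and multi-object segmentation tasks.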
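The adaptive fusion of features from receptive fields of different sizes can be illustrated with a toy sketch. It is not the paper's module: the box filters stand in for learned convolutions, the kernel sizes 3 and 7 and the softmax gate over mean responses are assumptions made purely for illustration, and the names `box_filter` and `aggregate` are hypothetical.

```python
import numpy as np


def box_filter(x, k):
    """Mean filter with a k x k window (k odd) and edge padding.

    Stand-in for a learned convolution with a k x k receptive field.
    """
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    # Sum the k*k shifted views of the padded map, then average.
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + h, dx:dx + w]
    return out / (k * k)


def aggregate(x, k_local=3, k_global=7):
    """Fuse a small- and a large-receptive-field branch with softmax weights."""
    local_feat = box_filter(x, k_local)
    global_feat = box_filter(x, k_global)
    # Assumed gating rule: softmax over each branch's mean absolute response.
    scores = np.array([np.abs(local_feat).mean(), np.abs(global_feat).mean()])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = weights[0] * local_feat + weights[1] * global_feat
    return fused, weights


# Usage on a random single-channel feature map.
feature_map = np.random.default_rng(0).random((16, 16))
fused, weights = aggregate(feature_map)
print(fused.shape, weights)
```

In a trained network the branch weights would come from learned parameters rather than from a fixed statistic; the sketch only shows the shape of the computation.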
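  • Figure 1  Overall framework of the video object segmentation algorithm based on multi-scale feature enhancement and global-local feature aggregation

    Figure 2  Multi-scale feature enhancement module

    Figure 3  Global-local feature aggregation module

    Figure 4  Cross-layer fusion module

    Figure 5  Performance and speed comparison between the proposed algorithm and recent algorithms on the DAVIS2016 and DAVIS2017 validation sets

    Figure 6  Comparison of selected segmentation results of the proposed algorithm and competing algorithms on the DAVIS2017 dataset

    Figure 7  Selected qualitative results of the proposed algorithm on the DAVIS2017 and YouTube-2018 datasets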

    Table 1  Performance comparison of different algorithms on the DAVIS2016 and DAVIS2017 validation sets

    Algorithm     Source                             DAVIS2016                            DAVIS2017
                                                     J&F    J      F      FPS    Time(s)  J&F    J      F      FPS    Time(s)
    OSVOS[5]      CVPR 2017                          80.2   79.8   80.6   0.10   10.00    60.3   56.6   63.9   0.1    10.00
    OnAVOS[7]     CVPRW 2017                         85.5   86.1   84.9   0.08   12.50    63.6   61.0   66.1   22.0   0.05
    OSVOS-S[25]   TPAMI 2018                         86.6   85.6   87.5   0.20   5.00     68.0   64.7   71.3   0.1    10.00
    OSNM[26]      CVPR 2018                          73.5   74.0   72.9   7.70   0.13     54.8   52.5   57.1   7.0    0.14
    FAVOS[27]     CVPR 2018                          82.4   79.5   80.9   0.60   1.67     58.2   54.6   61.8   5.6    0.18
    AGAME[14]     CVPR 2019                          82.1   82.0   82.2   14.00  0.07     70.0   67.4   72.6   14.0   0.07
    RANet[28]     ICCV 2019                          85.5   85.5   85.4   33.00  0.03     65.7   63.2   68.2   33.0   0.03
    FTMU[29]      CVPR 2020                          78.9   77.5   80.3   11.00  0.09     70.6   69.1   72.1   11.0   0.09
    SSM[19]       T-CSVT 2021                        85.9   86.2   85.6   37.00  0.03     77.6   75.3   79.9   --     --
    TMO[20]       TCSVT 2023                         86.1   85.6   86.6   43.20  0.02     72.3   69.9   74.7   37.0   0.03
    STM[11]       ICCV 2019                          89.3   88.7   89.9   10.30  0.10     81.8   79.2   84.3   8.8    0.11
    FRTM[21]      CVPR 2020                          83.6   83.7   83.4   13.00  0.08     76.7   73.8   79.6   21.9   0.05
    GC[15]        ECCV 2020                          86.6   87.6   85.7   25.00  0.04     71.4   69.3   73.5   --     --
    KMN[16]       ECCV 2020                          90.5   89.5   83.6   9.00   0.11     82.8   80.0   85.6   8.0    0.13
    TransVOS[22]  CVPR 2021                          90.5   89.8   91.2   --     --       83.9   81.4   86.4   --     --
    MTMFI[23]     Neurocomputing 2022                85.2   84.9   85.5   13.70  0.07     77.6   74.6   80.6   13.7   0.07
    ILTR[24]      Chinese Journal of Computers 2022  84.6   84.9   84.3   18.00  0.06     72.9   70.0   75.8   --     --
    KMNM[17]      TPAMI 2023                         91.2   90.2   92.1   8.00   0.13     83.5   80.9   86.1   8.0    0.13
    LLB[30]       AAAI 2023                          --     --     --     --     --       84.6   81.5   87.7   8.3    0.12
    MGLAS         This paper                         91.8   90.6   93.0   33.45  0.03     84.5   81.6   87.3   26.6   0.04
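The columns of Table 1 are linked by simple arithmetic: on DAVIS, J&F is the average of region similarity J and contour accuracy F, and the per-frame time is the reciprocal of FPS. The short check below applies both relations to the MGLAS row; the function names are hypothetical and the input values are copied from the table.

```python
def jf_mean(j, f):
    """J&F score: the average of region similarity J and contour accuracy F."""
    return (j + f) / 2


def seconds_per_frame(fps):
    """Per-frame processing time in seconds for a given frame rate."""
    return 1.0 / fps


# MGLAS values taken from Table 1.
davis2016_jf = jf_mean(90.6, 93.0)   # table reports 91.8
davis2017_jf = jf_mean(81.6, 87.3)   # table reports 84.5 (84.45 before rounding)
time_2016 = seconds_per_frame(33.45)  # table reports 0.03 s
time_2017 = seconds_per_frame(26.6)   # table reports 0.04 s

print(f"{davis2016_jf:.2f} {davis2017_jf:.2f}")
print(f"{time_2016:.4f} {time_2017:.4f}")
```

The same two relations can be used to sanity-check any other row that reports all five columns.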

    Table 2  Performance comparison of different algorithms on the YouTube-2018 validation set

    Algorithm     Source                             G      Js     Ju     Fs     Fu
    MSK[13]       CVPR 2017                          53.1   59.9   45.0   59.5   47.9
    OnAVOS[7]     CVPRW 2017                         55.2   60.1   46.6   62.7   51.4
    OSVOS[5]      CVPR 2017                          58.8   59.8   54.2   60.5   60.7
    OSNM[26]      CVPR 2018                          51.2   60.0   40.6   60.1   44.0
    RGMP[8]       CVPR 2018                          53.8   59.5   45.2   --     --
    AGAME[14]     CVPR 2019                          66.0   66.9   61.2   --     --
    STM[11]       ICCV 2019                          78.9   78.6   73.3   82.8   80.9
    FRTM[21]      CVPR 2020                          65.7   68.6   58.4   71.3   64.5
    SSM[19]       T-CSVT 2021                        66.5   72.3   57.8   73.3   62.6
    TransVOS[22]  CVPR 2021                          81.8   82.0   75.0   86.7   83.4
    ILTR[24]      Chinese Journal of Computers 2022  73.8   73.9   67.5   77.9   75.7
    KMNM[17]      TPAMI 2023                         81.4   81.4   75.3   85.6   83.3
    LLB[30]       AAAI 2023                          83.8   82.1   79.1   87.0   87.0
    MGLAS         This paper                         83.0   81.9   77.9   86.5   85.7
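On the YouTube-VOS benchmark, the overall score G is the average of J and F over seen (s) and unseen (u) categories. The minimal check below recomputes G for two rows of Table 2; the function name is hypothetical and the inputs are copied from the table.

```python
def overall_g(j_seen, j_unseen, f_seen, f_unseen):
    """Overall score G: the mean of J and F over seen and unseen categories."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# Rows taken from Table 2 (Js, Ju, Fs, Fu).
mglas_g = overall_g(81.9, 77.9, 86.5, 85.7)  # table reports 83.0
llb_g = overall_g(82.1, 79.1, 87.0, 87.0)    # table reports 83.8

print(f"MGLAS G = {mglas_g:.1f}, LLB G = {llb_g:.1f}")
```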

    Table 3  Ablation study of the proposed algorithm on the DAVIS2017 validation set

    Baseline  MFEM  GLFAM  CFM  J&F   J     F
                                81.8  79.2  84.3
                                83.2  79.9  86.5
                                83.5  80.6  86.4
                                83.5  80.0  86.9
                                84.5  81.6  87.3
  • [1] ERDÉLYI A, BARÁT T, VALET P, et al. Adaptive cartooning for privacy protection in camera networks[C]. 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Seoul, Korea (South), 2014: 44–49. doi: 10.1109/AVSS.2014.6918642.
    [2] WANG Wenguan, SHEN Jianbing, PORIKLI F, et al. Semi-supervised video object segmentation with super-trajectories[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(4): 985–998. doi: 10.1109/TPAMI.2018.2819173.
    [3] SALEH K, HOSSNY M, and NAHAVANDI S. Kangaroo vehicle collision detection using deep semantic segmentation convolutional neural network[C]. 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 2016: 1–7. doi: 10.1109/DICTA.2016.7797057.
    [4] LU Xiankai, WANG Wenguan, SHEN Jianbing, et al. Learning video object segmentation from unlabeled videos[C]. Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8957–8967. doi: 10.1109/CVPR42600.2020.00898.
    [5] CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5320–5329. doi: 10.1109/CVPR.2017.565.
    [6] CHENG H K, TAI Y W, and TANG C K. Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 5555–5564. doi: 10.1109/CVPR46437.2021.00551.
    [7] VOIGTLAENDER P and LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]. British Machine Vision Conference 2017, London, UK, 2017.
    [8] OH S W, LEE J Y, SUNKAVALLI K, et al. Fast video object segmentation by reference-guided mask propagation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7376–7385. doi: 10.1109/CVPR.2018.00770.
    [9] XU Jindong, ZHAO Tianyu, FENG Guozheng, et al. Image segmentation algorithm based on context fuzzy C-means clustering[J]. Journal of Electronics & Information Technology, 2021, 43(7): 2079–2086. doi: 10.11999/JEIT200263.
    [10] HANG Hao, HUANG Yingping, ZHANG Xurui, et al. Design of swin transformer for semantic segmentation of road scenes[J]. Opto-Electronic Engineering, 2024, 51(1): 230304. doi: 10.12086/oee.2024.230304.
    [11] OH S W, LEE J Y, XU Ning, et al. Video object segmentation using space-time memory networks[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 9225–9234. doi: 10.1109/ICCV.2019.00932.
    [12] LUITEN J, VOIGTLAENDER P, and LEIBE B. PReMVOS: Proposal-generation, refinement and merging for video object segmentation[C]. 14th Asian Conference on Computer Vision, Perth, Australia, 2019: 565–580. doi: 10.1007/978-3-030-20870-7_35.
    [13] PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3491–3500. doi: 10.1109/CVPR.2017.372.
    [14] JOHNANDER J, DANELLJAN M, BRISSMAN E, et al. A generative appearance model for end-to-end video object segmentation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8945–8954. doi: 10.1109/CVPR.2019.00916.
    [15] LI Yu, SHEN Zhuoran, and SHAN Ying. Fast video object segmentation using the global context module[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 735–750. doi: 10.1007/978-3-030-58607-2_43.
    [16] SEONG H, HYUN J, and KIM E. Kernelized memory network for video object segmentation[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 629–645. doi: 10.1007/978-3-030-58542-6_38.
    [17] SEONG H, HYUN J, and KIM E. Video object segmentation using kernelized memory network with multiple kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2595–2612. doi: 10.1109/TPAMI.2022.3163375.
    [18] KINGMA D P and BA J. Adam: A method for stochastic optimization[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015.
    [19] ZHU Wencheng, LI Jiahao, LU Jiwen, et al. Separable structure modeling for semi-supervised video object segmentation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(1): 330–344. doi: 10.1109/TCSVT.2021.3060015.
    [20] CHO S, LEE M, LEE S, et al. Treating motion as option to reduce motion dependency in unsupervised video object segmentation[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 5129–5138. doi: 10.1109/WACV56688.2023.00511.
    [21] ROBINSON A, LAWIN F J, DANELLJAN M, et al. Learning fast and robust target models for video object segmentation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 7404–7413. doi: 10.1109/CVPR42600.2020.00743.
    [22] MEI Jianbiao, WANG Mengmeng, LIN Yeneng, et al. TransVOS: Video object segmentation with transformers[J]. arXiv: 2106.00588, 2021. doi: 10.48550/arXiv.2106.00588.
    [23] GAO Bocong, ZHAO Yuqian, ZHANG Fan, et al. Video object segmentation based on multi-level target models and feature integration[J]. Neurocomputing, 2022, 492: 396–407. doi: 10.1016/j.neucom.2022.04.042.
    [24] XU Kai, LI Guorong, HONG Dexiang, et al. A fast video object segmentation method based on inductive learning and transductive reasoning[J]. Chinese Journal of Computers, 2022, 45(10): 2117–2132. doi: 10.11897/SP.J.1016.2022.02117.
    [25] MANINIS K K, CAELLES S, CHEN Yuhua, et al. Video object segmentation without temporal information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(6): 1515–1530. doi: 10.1109/TPAMI.2018.2838670.
    [26] YANG Linjie, WANG Yanran, XIONG Xuehan, et al. Efficient video object segmentation via network modulation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6499–6507. doi: 10.1109/CVPR.2018.00680.
    [27] CHENG Jingchun, TSAI Y H, HUNG W C, et al. Fast and accurate online video object segmentation via tracking parts[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7415–7424. doi: 10.1109/CVPR.2018.00774.
    [28] WANG Ziqin, XU Jun, LIU Li, et al. RANet: Ranking attention network for fast video object segmentation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 3977–3986. doi: 10.1109/ICCV.2019.00408.
    [29] SUN Mingjie, XIAO Jimin, LIM E G, et al. Fast template matching and update for video object tracking and segmentation[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10788–10796. doi: 10.1109/CVPR42600.2020.01080.
    [30] LAN Meng, ZHANG Jing, ZHANG Lefei, et al. Learning to learn better for video object segmentation[C]. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 1205–1212. doi: 10.1609/aaai.v37i1.25203.
Publication history
  • Received:  2023-12-18
  • Revised:  2024-09-25
  • Published online:  2024-09-30
