A Cross-Precision Motion Compensation Technique for Security Surveillance Video Coding
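Summary: In modern security surveillance, high-altitude dome cameras are mounted in locations exposed to external disturbance, so the captured video suffers from jitter and blur, which seriously degrades monitoring quality and the accuracy of later analysis. In video compression, high-precision motion compensation is essential to coding efficiency. The current Ultimate Motion Vector Expression (UMVE) technique falls short in precision and in adaptive adjustment, while tools such as the Registration-Based Coding Mode (RCM) achieve high-precision motion compensation only at excessive computational load and cost. To address these problems, this study proposes a UMVE technique that supports cross-precision motion compensation (UMVE_CPMC). It combines the base motion vector (BaseMV) with a refined fine-tuning motion vector (MMV) to construct an extended up-precision motion vector (UPMV) that raises motion compensation precision, and it provides incremental candidates only at the 1/8 precision level to balance computational complexity against compression efficiency. For adaptive step-size adjustment, an improved scheme with six modes is proposed, among which the encoder can switch flexibly by scene to suit different application needs. Experiments show that UMVE_CPMC delivers significant coding gains in Class A high-definition motion scenes: gains reach 1%–2% for some sequences when other high-precision motion compensation tools are also enabled, and exceed 10% for some sequences when they are not. In Class B low-definition scenes, a frame-level adaptive adjustment interface maintains the original gain. The technique also balances computational efficiency and resource usage well, offering a new and effective way to address video coding for high-altitude dome cameras.

Abstract: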
Objective In modern security surveillance, high-altitude dome cameras are often deployed at exposed locations such as bridges and tower tops, where external interference causes jitter and blur in the captured video and makes it hard to code efficiently. High-precision motion compensation is key to coding efficiency in video compression, but the existing Ultimate Motion Vector Expression (UMVE) technique lacks precision and offers little flexibility for adaptive adjustment. High-precision tools such as the Registration-Based Coding Mode (RCM) and Affine Motion Compensation Prediction (AFFINE) improve compensation accuracy, yet their computational complexity and hardware cost are high, so they struggle to meet the combined demands on coding efficiency, power consumption and real-time performance in high-altitude surveillance. This study therefore aims to design an optimized UMVE scheme that combines high-precision motion compensation, low computational complexity and scene adaptability, improving coding efficiency while keeping resource consumption in check.

Methods This study proposes an Ultimate Motion Vector Expression technique supporting Cross-Precision Motion Compensation (UMVE_CPMC). Its core is an extended Up-Precision Motion Vector (UPMV), defined as UPMV = BaseMV + MMV(p, angle), where BaseMV is the base motion vector obtained by the existing UMVE method and MMV is a refined fine-tuning motion vector determined by a precision p and an angle. Incremental candidates are provided only at the 1/8 precision level to balance computational complexity against compression efficiency.
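As a rough illustration of UPMV = BaseMV + MMV(p, angle), the sketch below enumerates refined candidates around one base vector. Everything beyond the formula itself is an assumption made for illustration: motion vectors are stored as integers in 1/8-pel units, the angles are the four axis directions, and the names `mmv` and `upmv_candidates` are hypothetical rather than taken from any reference software.

```python
from fractions import Fraction

UNITS_PER_PEL = 8  # assumption: MVs are stored as integers in 1/8-pel units

# Assumption: four axis-aligned angles (degrees) mapped to unit offsets.
DIRECTIONS = {0: (1, 0), 180: (-1, 0), 90: (0, 1), 270: (0, -1)}


def mmv(p, angle):
    """Fine-tuning vector of length p pel (a Fraction) along `angle` degrees."""
    scaled = p * UNITS_PER_PEL
    if scaled.denominator != 1:
        raise ValueError("precision is finer than the storage unit")
    dx, dy = DIRECTIONS[angle]
    return (dx * int(scaled), dy * int(scaled))


def upmv_candidates(base_mv, p=Fraction(1, 8)):
    """Return UPMV = BaseMV + MMV(p, angle) for every assumed angle."""
    bx, by = base_mv
    return [(bx + mx, by + my) for mx, my in (mmv(p, a) for a in DIRECTIONS)]


# A BaseMV of (2.5, -1) pel expressed in 1/8-pel units.
base = (20, -8)
print(upmv_candidates(base))  # [(21, -8), (19, -8), (20, -7), (20, -9)]
```

Keeping the extra candidates at a single 1/8-pel step means each base vector contributes only a handful of additional positions to evaluate, which is the complexity trade-off the abstract describes.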
For step-size adaptive adjustment, an improved scheme with six modes is proposed: enhanced UMVE, conventional UMVE and four precision-improved modes. The encoder can switch among them according to scene characteristics. The average image gradient is adopted as an objective index of image definition. Test scenes are divided into Class A (high-definition motion scenes) and Class B (low-definition scenes), and coding gain and computational efficiency are compared across modes under different coding configurations, sequences and parameters.

Results and Discussions Experiments show that UMVE_CPMC improves performance across the tested scenes and modes (negative values denote a coding gain). In Class A high-definition motion scenes, with both the adaptive strategy and RCM disabled, Fusion Mode 1 achieves average gains of -2.912%, -1.656% and -1.654% on the Y, U and V components, with average coding time reduced to 94.55% of the baseline; Independent Mode 1 achieves an average Y-component gain of -2.925%, with coding time reduced to 91.91% of the baseline. With RCM enabled and the other tools working together, enabling CPMC Independent Mode 1 yields an average Y-component gain of -1.310%, compared with -0.276% for traditional UMVE, a markedly better return for the cost. In Class B low-definition scenes, enabling adaptive adjustment sharply reduces the loss: the average Y-component losses of Fusion Mode 1 and Fusion Mode 0 are held to 0.071% and 0.108% respectively, which preserves the original coding gain. In multi-scene comprehensive tests with RCM and AFFINE disabled, 9 of the 10 test sequences show a Y-component gain in adaptive Fusion Mode 1, including -10.691% for the yuxuedaolu sequence and -11.400% for the BQTerrace sequence.
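The abstract names the average image gradient as its definition index but does not give a formula. One common definition, assumed here, averages sqrt((dx² + dy²)/2) over forward differences of the luma plane; the function name `average_gradient` is illustrative.

```python
import math


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences.

    `img` is a list of equal-length rows of luma values.
    Assumption: this is one common definition; the source gives no formula.
    """
    h, w = len(img), len(img[0])
    if h < 2 or w < 2:
        raise ValueError("need at least a 2x2 image")
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = img[y][x + 1] - img[y][x]  # horizontal difference
            dy = img[y + 1][x] - img[y][x]  # vertical difference
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((h - 1) * (w - 1))


flat = [[128] * 4 for _ in range(4)]                   # no detail at all
ramp = [[10 * x for x in range(4)] for _ in range(4)]  # steady horizontal ramp
print(average_gradient(flat))  # 0.0
print(average_gradient(ramp))
```

A blurred or low-detail frame gives a small value under this definition, which is the sense in which the index separates Class A from Class B material.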
When all existing coding tools are enabled, the Y-component gains of the dianjing, yuxuedaolu and BQTerrace sequences reach -1.29%, -2.05% and -1.21% respectively, with coding time reduced to 94%–96% of the baseline. Correlation analysis reveals a significant positive correlation between average image gradient and the size of the gain: images with a high average gradient (high definition) gain more from UMVE_CPMC, while those with a low average gradient (low definition) hardly benefit. The reason is that pixel values change gently in low-definition images, so high-precision interpolation produces few effectively new pixel values and the compensation effect is insignificant. Performance differences among the modes track their computational complexity: the fusion modes balance gain and stability, while the independent modes further reduce computation. Together, the six step-size adaptive modes can meet the real-time and precision requirements of different scenes.

Conclusions By integrating cross-precision motion compensation into the UMVE algorithm, the proposed UMVE_CPMC technique addresses both the insufficient precision of traditional UMVE and the high computational complexity of high-precision coding tools, balancing coding efficiency, computational complexity and scene adaptability. In Class A high-definition motion scenes, gains exceed 10% for some sequences when no other high-precision compensation tool is enabled and reach 1%–2% alongside such tools. In Class B low-definition scenes, the original coding gain is maintained through frame-level adaptive adjustment interfaces. The fusion modes do not increase hardware complexity, and the independent modes significantly reduce coding time, which suits encoder designs with limited resources or simplified requirements.
UMVE_CPMC offers a new and effective approach to the low coding efficiency caused by jitter and blur in high-altitude dome camera video, enriches the video coding toolset, and gives practical guidance for optimizing video coding in security surveillance. Future work can refine the adaptive strategy, explore integration with other advanced coding technologies, develop scene-specific coding schemes, and improve performance in complex scenarios.
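The relationship between average gradient and gain described in the abstract can be checked against the input-gradient and gain columns of Table 2. The sketch below computes a plain Pearson coefficient over those twelve rows; the choice of Pearson correlation and the helper name `pearson` are assumptions, since the abstract does not say which correlation measure was used.

```python
import math

# (average gradient of the input image, CPMC gain in %), copied from Table 2
TABLE2 = {
    "dianjing": (13.6, -1.29),
    "yuxuedaolu": (23.8, -2.05),
    "BQTerrace": (13.1, -1.21),
    "qiaoxialuduan1": (18.3, -0.55),
    "qiaoxialuduan2": (10.1, -0.49),
    "tingchechang": (10.7, -0.23),
    "beihaihumian": (7.8, 0.05),
    "MarketPlace": (6.2, 0.13),
    "Cactus": (10.4, -0.09),
    "huochezhan": (9.3, 0.00),
    "lijiaoqiao": (8.3, 0.00),
    "DaylightRoad": (8.8, 0.00),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


gradients = [g for g, _ in TABLE2.values()]
gains = [b for _, b in TABLE2.values()]
r = pearson(gradients, gains)
print(f"Pearson r between gradient and gain: {r:.2f}")
```

Because gains are reported as negative percentages, a negative r on these rows corresponds to the positive correlation between gradient and the size of the gain described above.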
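Table 1 Improved scheme for mode-based adaptive step-size adjustment

| Mode | Existing scheme: step range | Existing scheme: total candidates | Improved scheme: step range | Improved scheme: total candidates |
|---|---|---|---|---|
| Enhanced UMVE | [1/4, 32] | 64 | [1/4, 32] | 64 |
| Conventional UMVE | [1/4, 4] | 40 | [1/4, 4] | 40 |
| Precision-improved mode (Fusion Mode 1) | Not supported | Not supported | [1/8, 4] | 40 |
| Precision-improved mode (Fusion Mode 0) | Not supported | Not supported | [1/8, 2] | 40 |
| Precision-improved mode (Independent Mode 1) | Not supported | Not supported | [1/8, 1/8] | 16 |
| Precision-improved mode (Independent Mode 0) | Not supported | Not supported | [1/8, 1/8] | 8 |

Table 2 Validity of the average gradient as an index of image definition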
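| Sequence | Image average gradient (input) | Reconstructed, rec(q33) | Reconstructed, rec(q40) | Reconstructed, rec(q47) | Reconstructed, rec(q52) | Reconstructed, mean | CPMC gain (%) |
|---|---|---|---|---|---|---|---|
| dianjing | 13.6 | 11.6 | 10.7 | 9.7 | 8.8 | 10.2 | -1.29 |
| yuxuedaolu | 23.8 | 20.2 | 18.3 | 15.7 | 13.4 | 16.9 | -2.05 |
| BQTerrace | 13.1 | 8.1 | 7.5 | 7 | 6.6 | 7.3 | -1.21 |
| qiaoxialuduan1 | 18.3 | 14.6 | 13 | 11.2 | 9.9 | 12.2 | -0.55 |
| qiaoxialuduan2 | 10.1 | 6.8 | 6.3 | 5.6 | 5 | 5.9 | -0.49 |
| tingchechang | 10.7 | 7.7 | 7.1 | 6.6 | 6.2 | 6.9 | -0.23 |
| beihaihumian | 7.8 | 4.4 | 3.6 | 2.6 | 1.9 | 3.1 | 0.05 |
| MarketPlace | 6.2 | 3.3 | 2.9 | 2.4 | 2.1 | 2.7 | 0.13 |
| Cactus | 10.4 | 5.7 | 5.2 | 4.6 | 4.1 | 4.9 | -0.09 |
| huochezhan | 9.3 | 5.4 | 4.7 | 4.2 | 3.8 | 4.5 | 0.00 |
| lijiaoqiao | 8.3 | 5.4 | 4.7 | 4.2 | 3.8 | 4.5 | 0.00 |
| DaylightRoad | 8.8 | 3.1 | 2.9 | 2.7 | 2.4 | 2.8 | 0.00 |

Table 3 Gain from adding CPMC to SD5.0 (Fusion Mode 1) (%)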
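| RCM off | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -1.622 | -1.571 | -3.208 | 97.59 |
| tingchechang | -1.823 | -1.759 | -2.229 | 93.75 |
| dianjing | -2.502 | -1.318 | -1.710 | 93.67 |
| yuxuedaolu | -4.910 | -2.113 | -0.847 | 92.70 |
| BQTerrace | -3.700 | -1.520 | -0.275 | 95.05 |
| average | -2.912 | -1.656 | -1.654 | 94.55 |

Table 4 Gain from adding CPMC to SD5.0 (Fusion Mode 0) (%)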
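| RCM off | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -1.427 | -1.714 | -2.705 | 95.17 |
| tingchechang | -1.504 | -1.358 | -1.870 | 111.30 |
| dianjing | -2.368 | -1.327 | -1.819 | 86.07 |
| yuxuedaolu | -3.822 | -1.735 | -0.350 | 94.47 |
| BQTerrace | -2.715 | -2.199 | -0.194 | 98.97 |
| average | -2.367 | -1.667 | -1.388 | 97.20 |

Table 5 Gain from adding CPMC to SD5.0 (Independent Mode 1) (%)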
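| RCM off | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -1.778 | -1.661 | -3.277 | 88.84 |
| tingchechang | -1.768 | -2.058 | -2.170 | 98.69 |
| dianjing | -2.346 | -1.359 | -1.431 | 84.40 |
| yuxuedaolu | -4.894 | -2.258 | -1.639 | 91.38 |
| BQTerrace | -3.840 | -0.645 | 0.007 | 96.26 |
| average | -2.925 | -1.596 | -1.702 | 91.91 |

Table 6 Gain from adding CPMC to SD5.0 (Independent Mode 1) (%)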
| RCM on | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -0.479 | -0.998 | -1.307 | 94.778 |
| tingchechang | -0.440 | -0.229 | 0.421 | 93.640 |
| dianjing | -1.463 | -0.618 | -0.907 | 94.938 |
| yuxuedaolu | -2.484 | -1.140 | -1.673 | 90.148 |
| BQTerrace | -1.683 | -1.065 | 1.613 | 90.759 |
| average | -1.310 | -0.810 | -0.371 | 92.853 |

Table 7 Gain from adding CPMC to SD5.0 (Independent Mode 0) (%)
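| RCM off | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -1.495 | -1.103 | -2.421 | 84.68 |
| tingchechang | -1.532 | -1.659 | -2.288 | 101.56 |
| dianjing | -2.309 | -1.789 | -1.610 | 85.84 |
| yuxuedaolu | -3.774 | -1.767 | -2.438 | 75.24 |
| BQTerrace | -2.806 | -1.156 | -1.116 | 111.89 |
| average | -2.383 | -1.495 | -1.975 | 91.84 |

Table 8 Gain from adding CPMC to SD5.0 (Independent Mode 0) (%)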
| RCM on | Y | U | V | Enc_time |
|---|---|---|---|---|
| Qiaoxia1 | -0.434 | -0.909 | -0.378 | 96.226 |
| tingchechang | -0.200 | 0.150 | -0.004 | 95.931 |
| dianjing | -1.463 | -0.618 | -0.907 | 94.938 |
| yuxuedaolu | -1.887 | -1.192 | -1.341 | 93.063 |
| BQTerrace | -1.398 | -2.266 | -0.984 | 93.708 |
| average | -1.076 | -0.967 | -0.723 | 94.773 |

Table 9 Gain from adding UMVE to SD5.0 (%)
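| RCM on | Y | U | V |
|---|---|---|---|
| Qiaoxia1 | -0.234 | 0.103 | 0.527 |
| tingchechang | -0.639 | -1.211 | -0.669 |
| dianjing | -0.315 | -0.909 | -0.708 |
| yuxuedaolu | -0.073 | 0.024 | -0.538 |
| BQTerrace | -0.119 | 1.029 | -0.680 |
| average | -0.276 | -0.193 | -0.414 |

Table 10 CPMC gain with adaptive adjustment disabled (%)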
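| Sequence | Non-adaptive Independent Mode 1: Y | U | V | Non-adaptive Independent Mode 0: Y | U | V |
|---|---|---|---|---|---|---|
| beihaihumian | 0.605 | 0.868 | 0.861 | 0.427 | 0.163 | 0.686 |
| market | 0.938 | 0.131 | 1.146 | 0.795 | 0.264 | 0.462 |
| average | 0.772 | 0.499 | 1.004 | 0.611 | 0.214 | 0.574 |

Table 11 CPMC gain with adaptive adjustment enabled (%)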
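| Sequence | Adaptive Fusion Mode 1: Y | U | V | Adaptive Fusion Mode 0: Y | U | V |
|---|---|---|---|---|---|---|
| beihaihumian | -0.018 | 0.354 | 1.232 | 0.079 | -0.043 | 0.481 |
| market | 0.160 | 0.705 | 0.438 | 0.136 | -0.060 | -0.361 |
| average | 0.071 | 0.529 | 0.835 | 0.108 | -0.051 | 0.060 |

Table 12 Gain of adaptive Fusion Mode 1 with RCM and AFFINE disabled (%)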
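| Sequence | Y | U | V |
|---|---|---|---|
| Beihaihumian | -0.422 | 0.718 | 1.017 |
| Qiaoxia1 | -2.162 | -2.187 | -0.884 |
| Qiaoxia2 | -0.757 | -1.703 | -0.543 |
| tingchechang | -2.737 | -3.025 | -2.741 |
| Cactus | -0.221 | -0.304 | 0.087 |
| MarketPlace | 0.271 | -0.215 | 1.309 |
| NightTraffic3 | -0.654 | -1.273 | -0.868 |
| dianjing | -4.131 | -3.834 | 3.751 |
| yuxuedaolu | -10.691 | -12.265 | 6.169 |
| BQTerrace | -11.400 | -12.248 | -11.008 |
| average | -3.290 | -3.634 | -0.371 |

Table 13 Multi-scene comprehensive test results (%)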
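| Class | Sequence | V1.1: Y | U | V | Enc_time | V1.2: Y | U | V | Enc_time |
|---|---|---|---|---|---|---|---|---|---|
| 4K | huochezhan | 0.25 | -0.56 | -0.03 | - | 0.00 | 0.00 | 0.00 | |
| 4K | lijiaoqiao | 0.28 | 0.41 | 0.72 | - | 0.00 | 0.00 | 0.00 | |
| 4K | DaylightRoad | -0.07 | -0.38 | -0.53 | - | 0.00 | 0.00 | 0.00 | |
| 1080P | beihaihumian | 0.01 | -0.11 | -0.91 | - | 0.05 | -0.18 | -0.85 | 100 |
| 1080P | qiaoxialuduan1 | -0.54 | -0.62 | 0.79 | - | -0.55 | -0.98 | 0.11 | 95 |
| 1080P | qiaoxialuduan2 | -0.54 | -0.46 | -0.41 | - | -0.49 | -0.37 | -0.72 | 95 |
| 1080P | tingchechang | -0.26 | -0.47 | -0.19 | - | -0.23 | -0.54 | 0.08 | 96 |
| 1080P | Cactus | -0.08 | 0.25 | -0.24 | - | -0.09 | 0.48 | -0.37 | 98 |
| 1080P | MarketPlace | 0.01 | 0.33 | 0.54 | - | 0.13 | 0.30 | 0.17 | 98 |
| 1080P | NightTraffic3 | -0.21 | -0.09 | -0.24 | - | -0.43 | -0.28 | -0.53 | 94 |
| Supplementary | dianjing | -1.29 | -1.14 | 2.84 | 93 | -1.29 | -1.30 | -0.76 | 95 |
| Supplementary | yuxuedaolu | -2.04 | -0.40 | 9.22 | 94 | -2.05 | -0.35 | -1.07 | 93 |
| Supplementary | BQTerrace | -1.19 | -1.01 | -1.57 | 94 | -1.21 | -0.56 | -1.87 | 95 |
| | average | -0.44 | -0.33 | 0.77 | 94 | -0.47 | -0.29 | -0.45 | 96 |

1 Example SATD calculation program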
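As a generic illustration of the sum of absolute transformed differences (SATD) named in this caption, the sketch below applies an unnormalized 4×4 Hadamard transform to the residual between an original and a predicted block and sums the absolute coefficients. It is not the paper's listing: the 4×4 block size, the lack of normalization and the function names `hadamard4` and `satd4x4` are all assumptions.

```python
def hadamard4(v):
    """Unnormalized 4-point Hadamard transform of a length-4 sequence."""
    a, b, c, d = v
    s0, s1 = a + b, c + d
    d0, d1 = a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(org, pred):
    """SATD of two 4x4 blocks given as lists of four rows of four samples."""
    # Residual between the original block and its prediction.
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(org, pred)]
    # Separable 2-D transform: rows first, then columns.
    rows = [hadamard4(r) for r in diff]
    cols = [hadamard4(c) for c in zip(*rows)]
    return sum(abs(v) for col in cols for v in col)


zeros = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
print(satd4x4(ones, ones))   # 0: identical blocks leave no residual
print(satd4x4(ones, zeros))  # 16: a flat residual lands in one coefficient
```

Encoders commonly use a cost of this kind to rank fractional-precision candidates because it tracks post-transform bit cost more closely than a plain sum of absolute differences.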

2 Example call to com_mc_cu_UMVE_CPMC

3 Implementation logic of com_mc_cu_UMVE_CPMC

4 Implementation logic of the encode_umve_idx function
