A High-Throughput Hardware Design for AV1 Rough Mode Decision

SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang

Citation: SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang. A High-Throughput Hardware Design for AV1 Rough Mode Decision[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240823

doi: 10.11999/JEIT240823
Funds: The National Key R&D Program of China (2023YFB4502804)
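Details
    Author biographies:

    SHENG Qinghua: male, associate professor. Research interests include video coding, FPGA hardware acceleration, and electronic system integration.

    TAO Zehao: male, master's student. Research interests include video coding and decoding and FPGA hardware acceleration.

    HUANG Xiaofang: female, lecturer. Research interests include video coding and embedded applications.

    LAI Changcai: male, senior engineer. Research interests include image and video compression, intelligent processing, and their software and hardware acceleration.

    HUANG Xiaofeng: male, associate professor. Research interests include video coding and decoding and chip architecture design.

    YIN Haibin: male, professor. Research interests include digital video coding and decoding, multimedia signal processing, and chip architecture design and verification.

    DONG Zhekang: male, associate professor. Research interests include memristors and memristive systems and artificial neural networks.

    Corresponding author:

    HUANG Xiaofang 20221016@hdu.edu.cn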

  • CLC number: TN919.8

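  • Abstract: As video coding standards continue to evolve, the Alliance for Open Media (AOM) has released its latest video coding standard, AOMedia Video 1 (AV1). Its intra coding uses a richer set of prediction modes to improve prediction efficiency, expanding the number of prediction modes from 10 in VP9 to 61. To cope with this increase and to raise hardware processing throughput, this paper proposes a fully pipelined hardware architecture for AV1 rough mode decision (RMD). At the algorithm level, the 4×4 block is the minimum processing unit, and RMD is performed in Z-order on prediction units (PUs) of different sizes within a 64×64 coding tree unit (CTU). A cost-accumulation approximation based on 1:1 PUs is used to compute the costs of 1:2, 1:4, 2:1, and 4:1 PUs, which reduces computational complexity. At the hardware level, a single RMD circuit compatible with PU sizes from 4×4 to 32×32 replaces separate circuits for each PU size, effectively reducing idle logic resources. Experimental results show that, under the All-Intra (AI) configuration, the proposed algorithm saves 45.78% of the encoding time on average compared with the standard AV1 algorithm, with a BD-Rate increase of 1.94%. The proposed hardware architecture completes RMD for a 64×64 CTU within 1057 clock cycles. Synthesized with Synopsys Design Compiler 2016 and the UMC 28 nm process library, the design can process 8k@50.6fps video in real time at an operating frequency of 432.7 MHz.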
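As a rough illustration of the two algorithm-level ideas in the abstract, the Python sketch below enumerates the 4×4 blocks of a 64×64 CTU in Z-order and approximates the per-mode cost of a rectangular PU by summing the costs of the square (1:1) PUs that tile it. The abstract does not state the exact accumulation rule, so the per-mode summation, the function names (`z_order_blocks`, `accumulate_costs`), and the cost values in the usage note are assumptions made for illustration, not the authors' implementation.

```python
def z_order_index(bx: int, by: int, bits: int = 4) -> int:
    """Interleave the bits of block coordinates (bx, by) into a Z-order (Morton) index."""
    idx = 0
    for i in range(bits):
        idx |= ((bx >> i) & 1) << (2 * i)       # x bits go to even positions
        idx |= ((by >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return idx


def z_order_blocks(ctu_size: int = 64, block: int = 4):
    """Return the (bx, by) coordinates of all blocks in a CTU, visited in Z-order."""
    n = ctu_size // block  # 16 blocks per side for a 64x64 CTU with 4x4 blocks
    coords = [(bx, by) for by in range(n) for bx in range(n)]
    return sorted(coords, key=lambda c: z_order_index(c[0], c[1]))


def accumulate_costs(sub_pu_costs):
    """Approximate a rectangular PU's per-mode cost as the sum over its square sub-PUs.

    sub_pu_costs: list of dicts mapping mode name -> cost, one dict per square PU.
    This summation rule is an assumption; the source only names the idea.
    """
    modes = sub_pu_costs[0].keys()
    return {m: sum(costs[m] for costs in sub_pu_costs) for m in modes}
```

For example, `accumulate_costs([{"DC": 10, "PAETH": 7}, {"DC": 4, "PAETH": 9}])` yields a combined cost per mode, and `min(result, key=result.get)` would then pick the cheapest mode for the rectangular PU without running a separate prediction pass for it.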
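  • Figure 1  Overall RMD hardware architecture

    Figure 2  Flowchart of the hardware RMD implementation

    Figure 3  Space-time diagram of the overall architecture

    Figure 4  Reference pixel padding for a 4×4 PU

    Figure 5  Input order

    Figure 6  Hardware design for directional modes

    Figure 7  Hardware design for DC mode

    Figure 8  Hardware design for smooth modes

    Figure 9  PMCM hardware design for smooth-mode weights

    Figure 10  Hardware design for Paeth mode

    Figure 11  Hardware design of SATD cost computation for a 4×4 PU

    Figure 12  Example of bitonic sorting of an unordered sequence of length 8

    Figure 13  Bitonic sort hardware design for an input sequence of length 8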
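The last two figure captions refer to bitonic sorting of length-8 sequences. For readers unfamiliar with the technique, the following is a minimal software sketch of a generic bitonic sorting network. It is not the circuit from the paper, and the function name `bitonic_sort` is illustrative. Each pass of the inner `for` loop corresponds to one stage of independent compare-exchange operations, which is what makes the network attractive for pipelined hardware.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence in ascending order with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements in this stage
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0  # sort direction of this sub-block
                    # Exchange when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For a length-8 input the network has 6 compare-exchange stages of 4 comparators each, and the comparison pattern is fixed regardless of the data.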

    Table 1  Performance comparison between the improved algorithm and the standard AV1 algorithm (%)

    Test sequence    BD-Rate    TS
    A1 (UHD 4K)      2.21       49.2
    A2 (UHD 4K)      1.77       46.4
    B (1080P)        1.93       48.1
    C (480P)         2.23       38.4
    E (720P)         1.56       46.8
    Average          1.94       45.78

    Table 2  Comparison of the improved algorithm with existing work (%)

    Reference    BD-Rate    TS
    [33]         1.28       29.80
    [34]         7.41       50.19
    [35]         0.60       15.36
    This work    1.94       45.78

    Table 3  Comparison of ASIC-based RMD-related hardware designs

    Metric                        Ref. [36]     Ref. [37]     Ref. [38]     Ref. [39]     This work
    Process                       TSMC 40 nm    TSMC 40 nm    TSMC 40 nm    TSMC 40 nm    UMC 28 nm
    Gate count (Kgates)           455.8         821.8         584.8         128.5         1011.3
    Operating frequency (MHz)     1296          1902          1296          648           432.7
    Clock cycles                  7104          7104          7104          7104          1057
    Power (mW)                    40.9          1613.3        4110.0        65.5          1891.6
    Throughput                    4k@60fps      4k@60fps      4k@60fps      4k@30fps      8k@50.6fps
    Throughput/area (px/gate)     1091.85       605.55        850.93        1936.44       1660.03
    Non-directional prediction    ×             ×             ×             ✓             ✓
    Directional prediction        ✓             ✓             ✓             ×             ✓
    Mode decision                 ×             ×             ×             ×             ✓
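The throughput rows of Table 3 can be cross-checked with simple arithmetic. The sketch below assumes that "8k" means 7680×4320 and "4k" means 3840×2160, that Kgates means thousands of gates, and that the px/gate figure is pixels per second divided by gate count. None of these conventions is stated on this page, and the function names are illustrative.

```python
# Frame sizes are assumptions: the table only says "4k" and "8k".
RESOLUTIONS = {"4k": (3840, 2160), "8k": (7680, 4320)}


def fps_from_cycles(freq_mhz, cycles_per_ctu, resolution, ctu_size=64):
    """Frame rate sustained when every CTU takes a fixed number of clock cycles."""
    width, height = RESOLUTIONS[resolution]
    ctus_per_frame = (width * height) / (ctu_size * ctu_size)
    ctus_per_second = freq_mhz * 1e6 / cycles_per_ctu
    return ctus_per_second / ctus_per_frame


def throughput_per_gate(resolution, fps, kgates):
    """Pixels processed per second, divided by the gate count."""
    width, height = RESOLUTIONS[resolution]
    return width * height * fps / (kgates * 1e3)


if __name__ == "__main__":
    print(fps_from_cycles(432.7, 1057, "8k"))
    print(throughput_per_gate("8k", 50.6, 1011.3))
```

Under these assumptions, `throughput_per_gate("8k", 50.6, 1011.3)` reproduces the 1660.03 px/gate entry for this work, while `fps_from_cycles(432.7, 1057, "8k")` gives roughly 50.5 fps, slightly below the reported 50.6 fps. The small gap may come from rounding or from details not given here.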
  • [1] BENDER I, BORGES A, AGOSTINI L, et al. Complexity and compression efficiency analysis of libaom AV1 video codec[J]. Journal of Real-Time Image Processing, 2023, 20(3): 50. doi: 10.1007/s11554-023-01308-5.
    [2] REN Huiwen, WANG Shanshe, MA Siwei, et al. SVT-AVS3: An open-source high-performance AVS3 encoder with scalable video technology[J]. IEEE Transactions on Multimedia, 2024, 26: 3291–3301. doi: 10.1109/TMM.2023.3309549.
    [3] LEE M, SONG H J, PARK J, et al. Overview of versatile video coding (H.266/VVC) and its coding performance analysis[J]. IEIE Transactions on Smart Processing & Computing, 2023, 12(2): 122–154. doi: 10.5573/IEIESPC.2023.12.2.122.
    [4] MUKHERJEE D, HAN Jingning, BANKOSKI J, et al. A technical overview of VP9—the latest open-source video codec[J]. SMPTE Motion Imaging Journal, 2015, 124(1): 44–54. doi: 10.5594/j18499.
    [5] LIN Hao and RAO Feng. Analysis on the development trend of AV1 video coding standard in China[J]. Radio & Television Information, 2023, 30(2): 62–64. doi: 10.16045/j.cnki.rti.2023.02.022.
    [6] DU Hongqing. Research on high efficiency intra prediction algorithm for next generation video coding[D]. [Master dissertation], Xidian University, 2023. doi: 10.27389/d.cnki.gxadu.2023.001917.
    [7] GROIS D, GILADI A, CHOI K, et al. Performance comparison of emerging EVC and VVC video coding standards with HEVC and AV1[J]. SMPTE Motion Imaging Journal, 2021, 130(4): 1–12. doi: 10.5594/JMI.2021.3065442.
    [8] UHRINA M, SEVCIK L, BIENIK J, et al. Performance comparison of VVC, AV1, HEVC, and AVC for high resolutions[J]. Electronics, 2024, 13(5): 953. doi: 10.3390/electronics13050953.
    [9] LIU Chang, JIA Kebin, and LIU Pengyu. Fast partition algorithm in depth map intra-frame coding unit based on multi-branch network[J]. Journal of Electronics & Information Technology, 2022, 44(12): 4357–4366. doi: 10.11999/JEIT211010.
    [10] WANG Yizhao, ZHANG Chaobo, and SUN Songlin. Intra prediction fast algorithm in AVS3 based on image texture characteristics[C]. 2021 20th International Symposium on Communications and Information Technologies, Tottori, Japan, 2021: 6–10. doi: 10.1109/ISCIT52804.2021.9590620.
    [11] ZHANG Yongfei, LI Zhe, and LI Bo. Gradient-based fast decision for intra prediction in HEVC[C]. 2012 Visual Communications and Image Processing, San Diego, USA, 2012: 1–6. doi: 10.1109/VCIP.2012.6410739.
    [12] ZHU Linwei, ZHANG Yun, LI Na, et al. Deep learning-based intra mode derivation for versatile video coding[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2s): 96. doi: 10.1145/356369.
    [13] DUARTE A, ZATT B, CORREA G, et al. Fast intra mode decision using machine learning for the versatile video coding standard[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181769.
    [14] STORCH I, ROMA N, PALOMINO D, et al. GPU acceleration of MIP intra prediction in VVC[C]. 2023 31st European Signal Processing Conference, Helsinki, Finland, 2023: 600–604. doi: 10.23919/EUSIPCO58844.2023.10290037.
    [15] HAN Xu, WANG Shanshe, MA Siwei, et al. Optimization of motion compensation based on GPU and CPU for VVC decoding[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1196–1200. doi: 10.1109/ICIP40778.2020.9190708.
    [16] CORRÊA M, WASKOW B, ZATT B, et al. High throughput hardware design for AV1 Paeth and smooth intra modes[C]. 2019 IEEE International Symposium on Circuits and Systems, Sapporo, Japan, 2019: 1–5. doi: 10.1109/ISCAS.2019.8702258.
    [17] CAI Zhanyuan and GAO Wei. Efficient fast algorithm and parallel hardware architecture for intra prediction of AVS3[C]. 2021 IEEE International Symposium on Circuits and Systems, Daegu, South Korea, 2021: 1–5. doi: 10.1109/ISCAS51556.2021.9401121.
    [18] HUANG Xiaofeng, JIA Huizhu, CAI Binbin, et al. Fast algorithms and VLSI architecture design for HEVC intra-mode decision[J]. Journal of Real-Time Image Processing, 2016, 12(2): 285–302. doi: 10.1007/s11554-015-0549-8.
    [19] CORRÊA M, WASKOW B, GOEBEL J, et al. A high throughput hardware architecture targeting the AV1 Paeth intra predictor[C]. 2019 IEEE 10th Latin American Symposium on Circuits & Systems, Armenia, Colombia, 2019: 93–96. doi: 10.1109/LASCAS.2019.8667544.
    [20] LIU Pengyu, ZHANG Yue, JIA Kebin, et al. Adaptive video frame type decision algorithm based on local luminance histogram[J]. Journal of Electronics & Information Technology, 2023, 45(1): 300–307. doi: 10.11999/JEIT211199.
    [21] SU Weitong, XIANG Guoqing, HUANG Xiaofeng, et al. Fast algorithm and VLSI architecture design of rough mode decision for AVS3[C]. 2023 IEEE International Conference on Consumer Electronics, Las Vegas, USA, 2023: 1–4. doi: 10.1109/ICCE56470.2023.10043565.
    [22] QI Meibin, CHEN Xiuli, YANG Yanfang, et al. Fast coding unit splitting algorithm for high efficiency video coding intra prediction[J]. Journal of Electronics & Information Technology, 2014, 36(7): 1699–1705. doi: 10.3724/SP.J.1146.2013.01148.
    [23] CHEN Yue, MUKHERJEE D, HAN Jingning, et al. An overview of coding tools in AV1: The first video codec from the alliance for open media[J]. APSIPA Transactions on Signal and Information Processing, 2020, 9(1): e6. doi: 10.1017/ATSIP.2020.2.
    [24] HAKKENNES E A and VASSILIADIS S. Hardwired Paeth codec for portable network graphics (PNG)[C]. Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium, Milan, Italy, 1999: 318–325. doi: 10.1109/EURMIC.1999.794796.
    [25] PAETH A W. Image file compression made easy[M]. ARVO J. Graphics Gems II. Amsterdam: Elsevier, 1991: 93–100. doi: 10.1016/B978-0-08-050754-5.50029-3.
    [26] STORCH I, ROMA N, PALOMINO D, et al. Alternative reference samples to improve coding efficiency for parallel intra prediction solutions[C]. 2024 IEEE 15th Latin America Symposium on Circuits and Systems, Punta del Este, Uruguay, 2024: 1–5. doi: 10.1109/LASCAS60203.2024.10506142.
    [27] KUMM M. Multiple Constant Multiplication Optimizations for Field Programmable Gate Arrays[M]. Wiesbaden: Springer, 2016. doi: 10.1007/978-3-658-13323-8.
    [28] LIACHA A, OUDJIDA A K, BAKIRI M, et al. Radix-2r recoding with common subexpression elimination for multiple constant multiplication[J]. IET Circuits, Devices & Systems, 2020, 14(7): 990–994. doi: 10.1049/iet-cds.2020.0213.
    [29] MOHAMED H, ELLIETHY A, ABDELAZIZ A, et al. Real-time motion estimation based video steganography with preserved consistency and local optimality[J]. Multimedia Tools and Applications, 2024: 1–24. doi: 10.1007/s11042-024-18651-9.
    [30] CHEN Shushi, HUANG Leilei, LIU Jiahao, et al. An error-surface-based fractional motion estimation algorithm and hardware implementation for VVC[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10182170.
    [31] YANG Mouzhi, ZHANG Peng, FANG Jianbin, et al. thSORT: An efficient parallel sorting algorithm on multi-core DSPs[J]. CCF Transactions on High Performance Computing, 2024, 6(5): 503–518. doi: 10.1007/s42514-023-00175-7.
    [32] ESMAILI-DOKHT P, GUIOT M, RADOJKOVIĆ P, et al. O(n) key–value sort with active compute memory[J]. IEEE Transactions on Computers, 2024, 73(5): 1341–1356. doi: 10.1109/TC.2024.3371773.
    [33] CORRÊA M M. Heuristic-based algorithms and hardware designs for fast intra-picture prediction in AV1 video coding[D]. [Ph. D. dissertation], Universidade Federal de Pelotas, 2023.
    [34] ROSA P, PALOMINO D, PORTO M, et al. GM-RF: An AV1 intra-frame fast decision based on random forest[C]. 2022 IEEE International Conference on Image Processing, Bordeaux, France, 2022: 3556–3560. doi: 10.1109/ICIP46576.2022.9897488.
    [35] CORRÊA M, ROMA N, PALOMINO D, et al. Mode-adaptive subsampling of SAD/SSE operations for intra prediction cost reduction[C]. 2022 IEEE International Symposium on Circuits and Systems, Austin, USA, 2022: 1808–1812. doi: 10.1109/ISCAS48785.2022.9937507.
    [36] CORRÊA M, NETO L, PALOMINO D, et al. ASIC solution for the directional intra prediction of the AV1 encoder targeting UHD 4K videos[C]. 2020 IEEE International Symposium on Circuits and Systems, Seville, Spain, 2020: 1–5. doi: 10.1109/ISCAS45731.2020.9180526.
    [37] NETO L, CORRÊA M, PALOMINO D, et al. Directional intra frame prediction architecture with edge filter and upsampling for AV1 video coding[C]. 2020 33rd Symposium on Integrated Circuits and Systems Design, Campinas, Brazil, 2020: 1–6. doi: 10.1109/SBCCI50935.2020.9189902.
    [38] NETO L, CORRÊA M, PALOMINO D, et al. Exploring operation sharing in directional intra frame prediction of AV1 video coding[C]. 2021 IEEE 12th Latin America Symposium on Circuits and Systems, Arequipa, Peru, 2021: 1–4. doi: 10.1109/LASCAS51355.2021.9459136.
    [39] CORRÊA M M, WASKOW B H, GOEBEL J W, et al. A high-throughput hardware architecture for AV1 non-directional intra modes[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(5): 1481–1494. doi: 10.1109/TCSI.2020.2973031.
Publication history
  • Received:  2024-09-27
  • Revised:  2025-01-02
  • Published online:  2025-01-09
