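A High-Throughput Hardware Design for AV1 Rough Mode Decision

SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang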

Citation: SHENG Qinghua, TAO Zehao, HUANG Xiaofang, LAI Changcai, HUANG Xiaofeng, YIN Haibin, DONG Zhekang. A High-Throughput Hardware Design for AV1 Rough Mode Decision[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240823

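doi: 10.11999/JEIT240823
Funds: The National Key R&D Program of China (2023YFB4502804)
Details
    About the authors:

    SHENG Qinghua: male, associate professor. Research interests: video coding, FPGA hardware acceleration, and electronic system integration.

    TAO Zehao: male, master's student. Research interests: video encoding and decoding, and FPGA hardware acceleration.

    HUANG Xiaofang: female, lecturer. Research interests: video coding and embedded applications.

    LAI Changcai: male, senior engineer. Research interests: image and video compression, intelligent processing, and their hardware/software acceleration.

    HUANG Xiaofeng: male, associate professor. Research interests: video encoding and decoding, and chip architecture design.

    YIN Haibin: male, professor. Research interests: digital video encoding and decoding, multimedia signal processing, and chip architecture design and verification.

    DONG Zhekang: male, associate professor. Research interests: memristors and memristive systems, and artificial neural networks.

    Corresponding author:

    HUANG Xiaofang 20221016@hdu.edu.cn

  • CLC number: TN919.8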

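  • Abstract: As video coding standards continue to evolve, the Alliance for Open Media (AOM) has released its latest video coding standard, AOMedia Video 1 (AV1). Its intra coding adopts a richer set of prediction modes to improve prediction efficiency, expanding the number of prediction modes from 10 in VP9 to 61. To cope with this increase and to raise hardware throughput, this paper proposes a fully pipelined hardware architecture for AV1 rough mode decision (RMD). At the algorithm level, the 4×4 block is taken as the minimum processing unit, and rough mode decision is performed in Z-order on prediction units (PUs) of different sizes within a 64×64 coding tree unit (CTU). A cost-accumulation approximation based on 1:1 PUs is used to compute the costs of 1:2, 1:4, 2:1, and 4:1 PUs, which reduces computational complexity. At the hardware level, a single rough mode decision circuit compatible with PU sizes from 4×4 to 32×32 replaces the approach of designing a separate circuit for each PU size, effectively reducing idle logic resources. Experimental results show that, under the all-intra (AI) configuration, the improved algorithm saves 45.78% of the time on average compared with the AV1 standard algorithm, with a 1.94% increase in BD-Rate. The proposed hardware architecture completes rough mode decision for a 64×64 CTU in 1057 clock cycles. Synthesized with Synopsys Design Compiler 2016 and the UMC 28 nm process library, the design can process 8k@50.6fps video in real time at an operating frequency of 432.7 MHz.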
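
The two algorithm-level ideas in the abstract, Z-order traversal of 4×4 blocks and approximating rectangular-PU costs by accumulating 1:1 (square) PU costs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `z_order` and `approx_rect_cost`, the single-mode cost grid, and the reading of the approximation as a plain sum of neighbouring square-PU costs are all assumptions made for illustration.

```python
def z_order(n):
    """Yield (row, col) block coordinates of an n x n grid in Z (Morton) order.

    n must be a power of two, e.g. n = 16 for the 4x4 blocks of a 64x64 CTU.
    """
    bits = n.bit_length() - 1
    for idx in range(n * n):
        row = col = 0
        for b in range(bits):
            col |= ((idx >> (2 * b)) & 1) << b      # even bits of idx -> column
            row |= ((idx >> (2 * b + 1)) & 1) << b  # odd bits of idx  -> row
        yield row, col


def approx_rect_cost(square_costs, row, col, rows, cols):
    """Approximate a rectangular PU's cost by summing the square-PU costs it covers.

    Hypothetical helper: assumes one cost per square PU for a single mode.
    """
    return sum(square_costs[r][c]
               for r in range(row, row + rows)
               for c in range(col, col + cols))


# Illustrative values only: costs of four neighbouring square PUs for one mode.
costs = [[10, 20],
         [30, 40]]
first_blocks = list(z_order(16))[:5]   # first five 4x4 blocks of a 64x64 CTU
tall_cost = approx_rect_cost(costs, 0, 0, rows=2, cols=1)  # two squares stacked
wide_cost = approx_rect_cost(costs, 0, 0, rows=1, cols=2)  # two squares side by side
```

Under this reading, a rectangular PU needs no prediction or cost pass of its own, which is the kind of complexity saving the abstract attributes to the approximation.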
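  • Fig. 1  Overall RMD hardware architecture

    Fig. 2  Flowchart of the hardware RMD implementation

    Fig. 3  Space-time diagram of the overall architecture

    Fig. 4  Reference pixel padding for 4×4 PUs

    Fig. 5  Input order

    Fig. 6  Hardware design of the directional modes

    Fig. 7  Hardware design of the DC mode

    Fig. 8  Hardware design of the smooth modes

    Fig. 9  PMCM hardware design for the smooth-mode weights

    Fig. 10  Hardware design of the Paeth mode

    Fig. 11  Hardware design of the SATD cost calculation for 4×4 PUs

    Fig. 12  Example of bitonic sorting of an unordered sequence of length 8

    Fig. 13  Hardware design of bitonic sorting for an input sequence of length 8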
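
The figure list names two building blocks, a SATD cost for 4×4 PUs and a bitonic sort over 8 inputs. The sketch below shows one common software formulation of each, under stated assumptions: the SATD uses an unnormalized 4×4 Hadamard transform (encoders differ in how they scale it), the sort is ascending, and the function names are hypothetical. It does not describe the circuits shown in the figures.

```python
def hadamard4(v):
    """4-point Hadamard transform (unnormalized) using two butterfly stages."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    resid = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    rows = [hadamard4(r) for r in resid]          # transform each row
    cols = [hadamard4(c) for c in zip(*rows)]     # then each column
    return sum(abs(x) for col in cols for x in col)


def bitonic_sort(values):
    """Sort a power-of-two-length sequence in ascending order with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # compare-exchange distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


# Illustrative data only: a flat original block and a prediction that is off by one.
orig_block = [[5] * 4 for _ in range(4)]
pred_block = [[4] * 4 for _ in range(4)]
cost = satd4x4(orig_block, pred_block)
ranked = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
```

Every compare-exchange in a given stage of the network touches a disjoint pair of elements, so a stage can be evaluated in parallel; that property is what makes bitonic sorting a frequent choice for fixed-length hardware sorters.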

    Table 1  Performance comparison between the improved algorithm and the AV1 standard algorithm (%)

    Test sequence    BD-Rate    TS
    A1 (UHD 4K)      2.21       49.2
    A2 (UHD 4K)      1.77       46.4
    B (1080P)        1.93       48.1
    C (480P)         2.23       38.4
    E (720P)         1.56       46.8
    Average          1.94       45.78

    Table 2  Comparison of the improved algorithm with existing work (%)

    Work             BD-Rate    TS
    Work [33]        1.28       29.80
    Work [34]        7.41       50.19
    Work [35]        0.60       15.36
    This work        1.94       45.78

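    Table 3  Comparison of ASIC-based RMD-related hardware designs

    Metric                        Work [36]    Work [37]    Work [38]    Work [39]    This work
    Process                       TSMC 40 nm   TSMC 40 nm   TSMC 40 nm   TSMC 40 nm   UMC 28 nm
    Gate count (Kgates)           455.8        821.8        584.8        128.5        1011.3
    Operating frequency (MHz)     1296         1902         1296         648          432.7
    Clock cycles (Cycle)          7104         7104         7104         7104         1057
    Power (mW)                    40.91613.34110.065.51891.6
    Throughput                    4k@60fps     4k@60fps     4k@60fps     4k@30fps     8k@50.6fps
    Throughput/area (px/gate)     1091.85      605.55       850.93       1936.44      1660.03
    Non-directional prediction    ×            ×            ×            ✓            ✓
    Directional prediction        ✓            ✓            ✓            ×            ✓
    Mode decision                 ×            ×            ×            ×            ✓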
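
The throughput/area figures in Table 3 appear to be pixels processed per second divided by gate count. The following Python sketch checks that reading for two columns. The definition itself, the frame sizes assumed for "8k" (7680×4320) and "4k" (3840×2160), and the helper name `pixels_per_gate` are assumptions, not statements from the paper.

```python
def pixels_per_gate(width, height, fps, kgates):
    """Pixels processed per second divided by gate count (assumed definition)."""
    return width * height * fps / (kgates * 1000)


# Inputs taken from Table 3; frame dimensions are assumed, not given in the table.
this_work = pixels_per_gate(7680, 4320, 50.6, 1011.3)  # 8k@50.6fps, 1011.3 Kgates
work_39 = pixels_per_gate(3840, 2160, 30, 128.5)       # 4k@30fps, 128.5 Kgates
```

Rounded to two decimals, these give 1660.03 and 1936.44 px/gate, matching the corresponding table entries under the stated assumptions.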
  • [1] BENDER I, BORGES A, AGOSTINI L, et al. Complexity and compression efficiency analysis of libaom AV1 video codec[J]. Journal of Real-Time Image Processing, 2023, 20(3): 50. doi: 10.1007/s11554-023-01308-5.
    [2] REN Huiwen, WANG Shanshe, MA Siwei, et al. SVT-AVS3: An open-source high-performance AVS3 encoder with scalable video technology[J]. IEEE Transactions on Multimedia, 2024, 26: 3291–3301. doi: 10.1109/TMM.2023.3309549.
    [3] LEE M, SONG H J, PARK J, et al. Overview of versatile video coding (H.266/VVC) and its coding performance analysis[J]. IEIE Transactions on Smart Processing & Computing, 2023, 12(2): 122–154. doi: 10.5573/IEIESPC.2023.12.2.122.
    [4] MUKHERJEE D, HAN Jingning, BANKOSKI J, et al. A technical overview of VP9—the latest open-source video codec[J]. SMPTE Motion Imaging Journal, 2015, 124(1): 44–54. doi: 10.5594/j18499.
    [5] LIN Hao and RAO Feng. Analysis on the development trend of AV1 video coding standard in China[J]. Radio & Television Information, 2023, 30(2): 62–64. doi: 10.16045/j.cnki.rti.2023.02.022.
    [6] DU Hongqing. Research on high efficiency intra prediction algorithm for next generation video coding[D]. [Master dissertation], Xidian University, 2023. doi: 10.27389/d.cnki.gxadu.2023.001917.
    [7] GROIS D, GILADI A, CHOI K, et al. Performance comparison of emerging EVC and VVC video coding standards with HEVC and AV1[J]. SMPTE Motion Imaging Journal, 2021, 130(4): 1–12. doi: 10.5594/JMI.2021.3065442.
    [8] UHRINA M, SEVCIK L, BIENIK J, et al. Performance comparison of VVC, AV1, HEVC, and AVC for high resolutions[J]. Electronics, 2024, 13(5): 953. doi: 10.3390/electronics13050953.
    [9] LIU Chang, JIA Kebin, and LIU Pengyu. Fast partition algorithm in depth map intra-frame coding unit based on multi-branch network[J]. Journal of Electronics & Information Technology, 2022, 44(12): 4357–4366. doi: 10.11999/JEIT211010.
    [10] WANG Yizhao, ZHANG Chaobo, and SUN Songlin. Intra prediction fast algorithm in AVS3 based on image texture characteristics[C]. 2021 20th International Symposium on Communications and Information Technologies, Tottori, Japan, 2021: 6–10. doi: 10.1109/ISCIT52804.2021.9590620.
    [11] ZHANG Yongfei, LI Zhe, and LI Bo. Gradient-based fast decision for intra prediction in HEVC[C]. 2012 Visual Communications and Image Processing, San Diego, USA, 2012: 1–6. doi: 10.1109/VCIP.2012.6410739.
    [12] ZHU Linwei, ZHANG Yun, LI Na, et al. Deep learning-based intra mode derivation for versatile video coding[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2s): 96. doi: 10.1145/356369.
    [13] DUARTE A, ZATT B, CORREA G, et al. Fast intra mode decision using machine learning for the versatile video coding standard[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10181769.
    [14] STORCH I, ROMA N, PALOMINO D, et al. GPU acceleration of MIP intra prediction in VVC[C]. 2023 31st European Signal Processing Conference, Helsinki, Finland, 2023: 600–604. doi: 10.23919/EUSIPCO58844.2023.10290037.
    [15] HAN Xu, WANG Shanshe, MA Siwei, et al. Optimization of motion compensation based on GPU and CPU for VVC decoding[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1196–1200. doi: 10.1109/ICIP40778.2020.9190708.
    [16] CORRÊA M, WASKOW B, ZATT B, et al. High throughput hardware design for AV1 Paeth and smooth intra modes[C]. 2019 IEEE International Symposium on Circuits and Systems, Sapporo, Japan, 2019: 1–5. doi: 10.1109/ISCAS.2019.8702258.
    [17] CAI Zhanyuan and GAO Wei. Efficient fast algorithm and parallel hardware architecture for intra prediction of AVS3[C]. 2021 IEEE International Symposium on Circuits and Systems, Daegu, South Korea, 2021: 1–5. doi: 10.1109/ISCAS51556.2021.9401121.
    [18] HUANG Xiaofeng, JIA Huizhu, CAI Binbin, et al. Fast algorithms and VLSI architecture design for HEVC intra-mode decision[J]. Journal of Real-Time Image Processing, 2016, 12(2): 285–302. doi: 10.1007/s11554-015-0549-8.
    [19] CORRÊA M, WASKOW B, GOEBEL J, et al. A high throughput hardware architecture targeting the AV1 Paeth intra predictor[C]. 2019 IEEE 10th Latin American Symposium on Circuits & Systems, Armenia, Colombia, 2019: 93–96. doi: 10.1109/LASCAS.2019.8667544.
    [20] LIU Pengyu, ZHANG Yue, JIA Kebin, et al. Adaptive video frame type decision algorithm based on local luminance histogram[J]. Journal of Electronics & Information Technology, 2023, 45(1): 300–307. doi: 10.11999/JEIT211199.
    [21] SU Weitong, XIANG Guoqing, HUANG Xiaofeng, et al. Fast algorithm and VLSI architecture design of rough mode decision for AVS3[C]. 2023 IEEE International Conference on Consumer Electronics, Las Vegas, USA, 2023: 1–4. doi: 10.1109/ICCE56470.2023.10043565.
    [22] QI Meibin, CHEN Xiuli, YANG Yanfang, et al. Fast coding unit splitting algorithm for high efficiency video coding intra prediction[J]. Journal of Electronics & Information Technology, 2014, 36(7): 1699–1705. doi: 10.3724/SP.J.1146.2013.01148.
    [23] CHEN Yue, MUKHERJEE D, HAN Jingning, et al. An overview of coding tools in AV1: The first video codec from the alliance for open media[J]. APSIPA Transactions on Signal and Information Processing, 2020, 9(1): e6. doi: 10.1017/ATSIP.2020.2.
    [24] HAKKENNES E A and VASSILIADIS S. Hardwired Paeth codec for portable network graphics (PNG)[C]. Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium, Milan, Italy, 1999: 318–325. doi: 10.1109/EURMIC.1999.794796.
    [25] PAETH A W. Image file compression made easy[M]. ARVO J. Graphics Gems II. Amsterdam: Elsevier, 1991: 93–100. doi: 10.1016/B978-0-08-050754-5.50029-3.
    [26] STORCH I, ROMA N, PALOMINO D, et al. Alternative reference samples to improve coding efficiency for parallel intra prediction solutions[C]. 2024 IEEE 15th Latin America Symposium on Circuits and Systems, Punta del Este, Uruguay, 2024: 1–5. doi: 10.1109/LASCAS60203.2024.10506142.
    [27] KUMM M. Multiple Constant Multiplication Optimizations for Field Programmable Gate Arrays[M]. Wiesbaden: Springer, 2016. doi: 10.1007/978-3-658-13323-8.
    [28] LIACHA A, OUDJIDA A K, BAKIRI M, et al. Radix-2r recoding with common subexpression elimination for multiple constant multiplication[J]. IET Circuits, Devices & Systems, 2020, 14(7): 990–994. doi: 10.1049/iet-cds.2020.0213.
    [29] MOHAMED H, ELLIETHY A, ABDELAZIZ A, et al. Real-time motion estimation based video steganography with preserved consistency and local optimality[J]. Multimedia Tools and Applications, 2024: 1–24. doi: 10.1007/s11042-024-18651-9.
    [30] CHEN Shushi, HUANG Leilei, LIU Jiahao, et al. An error-surface-based fractional motion estimation algorithm and hardware implementation for VVC[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10182170.
    [31] YANG Mouzhi, ZHANG Peng, FANG Jianbin, et al. thSORT: An efficient parallel sorting algorithm on multi-core DSPs[J]. CCF Transactions on High Performance Computing, 2024, 6(5): 503–518. doi: 10.1007/s42514-023-00175-7.
    [32] ESMAILI-DOKHT P, GUIOT M, RADOJKOVIĆ P, et al. O(n) key–value sort with active compute memory[J]. IEEE Transactions on Computers, 2024, 73(5): 1341–1356. doi: 10.1109/TC.2024.3371773.
    [33] CORRÊA M M. Heuristic-based algorithms and hardware designs for fast intra-picture prediction in AV1 video coding[D]. [Ph.D. dissertation], Universidade Federal de Pelotas, 2023.
    [34] ROSA P, PALOMINO D, PORTO M, et al. GM-RF: An AV1 intra-frame fast decision based on random forest[C]. 2022 IEEE International Conference on Image Processing, Bordeaux, France, 2022: 3556–3560. doi: 10.1109/ICIP46576.2022.9897488.
    [35] CORRÊA M, ROMA N, PALOMINO D, et al. Mode-adaptive subsampling of SAD/SSE operations for intra prediction cost reduction[C]. 2022 IEEE International Symposium on Circuits and Systems, Austin, USA, 2022: 1808–1812. doi: 10.1109/ISCAS48785.2022.9937507.
    [36] CORRÊA M, NETO L, PALOMINO D, et al. ASIC solution for the directional intra prediction of the AV1 encoder targeting UHD 4K videos[C]. 2020 IEEE International Symposium on Circuits and Systems, Seville, Spain, 2020: 1–5. doi: 10.1109/ISCAS45731.2020.9180526.
    [37] NETO L, CORRÊA M, PALOMINO D, et al. Directional intra frame prediction architecture with edge filter and upsampling for AV1 video coding[C]. 2020 33rd Symposium on Integrated Circuits and Systems Design, Campinas, Brazil, 2020: 1–6. doi: 10.1109/SBCCI50935.2020.9189902.
    [38] NETO L, CORRÊA M, PALOMINO D, et al. Exploring operation sharing in directional intra frame prediction of AV1 video coding[C]. 2021 IEEE 12th Latin America Symposium on Circuits and Systems, Arequipa, Peru, 2021: 1–4. doi: 10.1109/LASCAS51355.2021.9459136.
    [39] CORRÊA M M, WASKOW B H, GOEBEL J W, et al. A high-throughput hardware architecture for AV1 non-directional intra modes[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(5): 1481–1494. doi: 10.1109/TCSI.2020.2973031.
Publication history
  • Received:  2024-09-27
  • Revised:  2025-01-02
  • Published online:  2025-01-09
