| Citation: | ZENG Huanqiang, KONG Qingwei, CHEN Jing, ZHU Jianqing, SHI Yifan, HOU Junhui. Overview of Immersive Video Coding[J]. Journal of Electronics & Information Technology, 2024, 46(2): 602-614. doi: 10.11999/JEIT230097 | 
 
| [1] | BOYCE J M, DORÉ R, DZIEMBOWSKI A, et al. MPEG immersive video coding standard[J]. Proceedings of the IEEE, 2021, 109(9): 1521–1536. doi: 10.1109/JPROC.2021.3062590. | 
| [2] | IEEE. 1857.9-2021 IEEE standard for immersive visual content coding[S]. New York: The Institute of Electrical and Electronics Engineers, 2022. doi: 10.1109/IEEESTD.2022.9726138. | 
| [3] | CHEN Zhenzhong, LI Yiming, and ZHANG Yingxue. Recent advances in omnidirectional video coding for virtual reality: Projection and evaluation[J]. Signal Processing, 2018, 146: 66–78. doi: 10.1016/j.sigpro.2018.01.004. | 
| [4] | 叶成英, 李建微, 陈思喜. VR全景视频传输研究进展[J]. 计算机应用研究, 2022, 39(6): 1601–1607, 1621. doi: 10.19734/j.issn.1001-3695.2021.11.0623. YE Chengying, LI Jianwei, and CHEN Sixi. Research progress of VR panoramic video transmission[J]. Application Research of Computers, 2022, 39(6): 1601–1607, 1621. doi: 10.19734/j.issn.1001-3695.2021.11.0623. | 
| [5] | YU M, LAKSHMAN H, and GIROD B. Content adaptive representations of omnidirectional videos for cinematic virtual reality[C]. The 3rd International Workshop on Immersive Media Experiences, Brisbane, Australia, 2015: 1–6. doi: 10.1145/2814347.2814348. | 
| [6] | LI Jisheng, WEN Ziyu, LI Sihan, et al. Novel tile segmentation scheme for omnidirectional video[C]. 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 370–374. doi: 10.1109/ICIP.2016.7532381. | 
| [7] | ZHANG C, LU Y, LI J, et al. AHG8: Segmented sphere projection (SSP) for 360-degree video content[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 4th Meeting, Geneva, Switzerland, 2016. | 
| [8] | 兰诚栋, 饶迎节, 宋彩霞, 等. 基于强化学习的立体全景视频自适应流[J]. 电子与信息学报, 2022, 44(4): 1461–1468. doi: 10.11999/JEIT200908. LAN Chengdong, RAO Yingjie, SONG Caixia, et al. Adaptive streaming of stereoscopic panoramic video based on reinforcement learning[J]. Journal of Electronics & Information Technology, 2022, 44(4): 1461–1468. doi: 10.11999/JEIT200908. | 
| [9] | GREENE N. Environment mapping and other applications of world projections[J]. IEEE Computer Graphics and Applications, 1986, 6(11): 21–29. doi: 10.1109/MCG.1986.276658. | 
| [10] | FU C W, WAN Liang, WONG T T, et al. The rhombic dodecahedron map: An efficient scheme for encoding panoramic video[J]. IEEE Transactions on Multimedia, 2009, 11(4): 634–644. doi: 10.1109/TMM.2009.2017626. | 
| [11] | LIN H C, LI C Y, LIN Jianliang, et al. AHG8: An efficient compact layout for octahedron format[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Chengdu, China, 2016. | 
| [12] | LIN H C, HUANG C C, LI C Y, et al. AHG8: An improvement on the compact OHP layout[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 5th Meeting, Geneva, Switzerland, 2017. | 
| [13] | AKULA S N, SINGH A, KK R, et al. AHG8: Efficient frame packing method for icosahedral projection (ISP)[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting, Torino, Italy, 2017: JVET-G0156. | 
| [14] | COBAN M, AUWERA G V D, and KARCZEWICZ M. AHG8: Adjusted cubemap projection for 360-degree video[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 6th Meeting, Hobart, Australia, 2017: JVET-F0025. | 
| [15] | ZHOU M. AHG8: A study on equi-angular cubemap projection (EAC)[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Geneva, Switzerland, 2017. | 
| [16] | LIN Jianliang, LEE Y H, SHIH C H, et al. Efficient projection and coding tools for 360° video[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(1): 84–97. doi: 10.1109/JETCAS.2019.2899660. | 
| [17] | HE Yuwen, XIU Xiaoyu, HANHART P, et al. Content-adaptive 360-degree video coding using hybrid cubemap projection[C]. 2018 Picture Coding Symposium, San Francisco, USA, 2018: 313–317. doi: 10.1109/PCS.2018.8456280. | 
| [18] | PI Jinyong, ZHANG Yun, ZHU Linwei, et al. Texture-aware spherical rotation for high efficiency omnidirectional intra video coding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(12): 8768–8780. doi: 10.1109/TCSVT.2022.3192665. | 
| [19] | SU Yuchuan and GRAUMAN K. Learning compressible 360° video isomers[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7824–7833. doi: 10.1109/CVPR.2018.00816. | 
| [20] | HUANG Han, WOODS J W, ZHAO Yao, et al. Control-point representation and differential coding affine-motion compensation[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2013, 23(10): 1651–1660. doi: 10.1109/TCSVT.2013.2254977. | 
| [21] | DE SIMONE F, FROSSARD P, BIRKBECK N, et al. Deformable block-based motion estimation in omnidirectional image sequences[C]. 2017 IEEE 19th International Workshop on Multimedia Signal Processing, Luton, United Kingdom, 2017: 1–6. doi: 10.1109/MMSP.2017.8122254. | 
| [22] | MARIE A, BIDGOLI N M, MAUGEY T, et al. Rate-distortion optimized motion estimation for on-the-sphere compression of 360 videos[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 2021: 1570–1574. doi: 10.1109/ICASSP39728.2021.9413681. | 
| [23] | VISHWANATH B, NANJUNDASWAMY T, and ROSE K. A geodesic translation model for spherical video compression[J]. IEEE Transactions on Image Processing, 2022, 31: 2136–2147. doi: 10.1109/TIP.2022.3152059. | 
| [24] | VISHWANATH B, NANJUNDASWAMY T, and ROSE K. Rotational motion model for temporal prediction in 360 video coding[C]. 2017 IEEE 19th International Workshop on Multimedia Signal Processing, Luton, United Kingdom, 2017: 1–6. doi: 10.1109/MMSP.2017.8122231. | 
| [25] | VISHWANATH B, ROSE K, HE Yuwen, et al. Rotational motion compensated prediction in HEVC based omnidirectional video coding[C]. 2018 Picture Coding Symposium, San Francisco, USA, 2018: 323–327. doi: 10.1109/PCS.2018.8456296. | 
| [26] | WANG Yefei, LIU Dong, MA Siwei, et al. Spherical coordinates transform-based motion model for panoramic video coding[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(1): 98–109. doi: 10.1109/JETCAS.2019.2896265. | 
| [27] | VISHWANATH B and ROSE K. Spherical video coding with geometry and region adaptive transform domain temporal prediction[C]. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 2020: 2043–2047. doi: 10.1109/ICASSP40776.2020.9054211. | 
| [28] | VISHWANATH B, NANJUNDASWAMY T, and ROSE K. Effective prediction modes design for adaptive compression with application in video coding[J]. IEEE Transactions on Image Processing, 2022, 31: 636–647. doi: 10.1109/TIP.2021.3134454. | 
| [29] | ISO/IEC 23090-2: 2019 Information technology — Coded representation of immersive media — Part 2: Omnidirectional media format[S]. 2019. | 
| [30] | HERRE J, HILPERT J, KUNTZ A, et al. MPEG-H 3D audio—The new standard for coding of immersive spatial audio[J]. IEEE Journal of Selected Topics in Signal Processing, 2015, 9(5): 770–779. doi: 10.1109/JSTSP.2015.2411578. | 
| [31] | YE Yan, ALSHINA E, and BOYCE J M. Algorithm descriptions of projection format conversion and video quality metrics in 360Lib[C]. Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 8th Meeting, Macao, China, 2017. | 
| [32] | ITU-T Rec. H.265 and ISO/IEC 23008-2. High efficiency video coding[S]. 2020. | 
| [33] | 朱秀昌, 唐贵进. H.266/VVC: 新一代通用视频编码国际标准[J]. 南京邮电大学学报:自然科学版, 2021, 41(2): 1–11. doi: 10.14132/j.cnki.1673-5439.2021.02.001. ZHU Xiuchang and TANG Guijin. H.266/VVC: Versatile video coding international standard[J]. Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition, 2021, 41(2): 1–11. doi: 10.14132/j.cnki.1673-5439.2021.02.001. | 
| [34] | ITU-T. ITU-T Rec. H.266 and ISO/IEC 23090-3. Versatile video coding[S]. 2021. | 
| [35] | HE Y, BOYCE J, CHOI K, et al. JVET common test conditions and evaluation procedures for 360° video[C]. Joint Video Exploration Team of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29, 2021: JVET-U2012. | 
| [36] | YU M, LAKSHMAN H, and GIROD B. A framework to evaluate omnidirectional video coding schemes[C]. 2015 IEEE International Symposium on Mixed and Augmented Reality, Fukuoka, Japan, 2015: 31–36. doi: 10.1109/ISMAR.2015.12. | 
| [37] | SUN Yule, LU Ang, and YU Lu. Weighted-to-spherically-uniform quality evaluation for omnidirectional video[J]. IEEE Signal Processing Letters, 2017, 24(9): 1408–1412. doi: 10.1109/LSP.2017.2720693. | 
| [38] | ZAKHARCHENKO V, CHOI K P, and PARK J H. Quality metric for spherical panoramic video[C]. SPIE 9970, Optics and Photonics for Information Processing X, San Diego, USA, 2016. doi: 10.1117/12.2235885. | 
| [39] | ADELSON E H and BERGEN J R. The plenoptic function and the elements of early vision[J]. Computational Models of Visual Processing, 1991, 1: 43–54. | 
| [40] | MCMILLAN L and BISHOP G. Plenoptic modeling: An image-based rendering system[C]. The 22nd Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, USA, 1995: 39–46. doi: 10.1145/218380.218398. | 
| [41] | LEVOY M and HANRAHAN P. Light field rendering[C]. The 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, USA, 1996: 31–42. doi: 10.1145/237170.237199. | 
| [42] | MING Yue, MENG Xuyang, FAN Chunxiao, et al. Deep learning for monocular depth estimation: A review[J]. Neurocomputing, 2021, 438: 14–33. doi: 10.1016/j.neucom.2020.12.089. | 
| [43] | TEWARI A, THIES J, MILDENHALL B, et al. Advances in neural rendering[J]. Computer Graphics Forum, 2022, 41(2): 703–735. doi: 10.1111/cgf.14507. | 
| [44] | LV Chenlei, LIN Weisi, and ZHAO Baoquan. Voxel structure-based mesh reconstruction from a 3D point cloud[J]. IEEE Transactions on Multimedia, 2022, 24: 1815–1829. doi: 10.1109/TMM.2021.3073265. | 
| [45] | XU Yusheng, TONG Xiaohua, and STILLA U. Voxel-based representation of 3D point clouds: Methods, applications, and its potential use in the construction industry[J]. Automation in Construction, 2021, 126: 103675. doi: 10.1016/j.autcon.2021.103675. | 
| [46] | CHAN S C, SHUM H Y, and NG K T. Image-based rendering and synthesis[J]. IEEE Signal Processing Magazine, 2007, 24(6): 22–33. doi: 10.1109/MSP.2007.905702. | 
| [47] | TIAN Shishun, ZHANG Lu, ZOU Wenbin, et al. Quality assessment of DIBR-synthesized views: An overview[J]. Neurocomputing, 2021, 423: 158–178. doi: 10.1016/j.neucom.2020.09.062. | 
| [48] | HEDMAN P, PHILIP J, PRICE T, et al. Deep blending for free-viewpoint image-based rendering[J]. ACM Transactions on Graphics, 2018, 37(6): 257. doi: 10.1145/3272127.3275084. | 
| [49] | NGUYEN-PHUOC T, LI Chuan, BALABAN S, et al. RenderNet: A deep convolutional network for differentiable rendering from 3D shapes[C]. The 31st International Conference on Neural Information Processing Systems, Montréal, Canada, 2018. | 
| [50] | TEWARI A, FRIED O, THIES J, et al. State of the art on neural rendering[J]. Computer Graphics Forum, 2020, 39(2): 701–727. doi: 10.1111/cgf.14022. | 
| [51] | OECHSLE M, MESCHEDER L, NIEMEYER M, et al. Texture fields: Learning texture representations in function space[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 4531–4540. doi: 10.1109/ICCV.2019.00463. | 
| [52] | SITZMANN V, THIES J, HEIDE F, et al. DeepVoxels: Learning persistent 3D feature embeddings[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 2437–2446. doi: 10.1109/CVPR.2019.00254. | 
| [53] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2022, 65(1): 99–106. doi: 10.1145/3503250. | 
| [54] | PUMAROLA A, CORONA E, PONS-MOLL G, et al. D-NeRF: Neural radiance fields for dynamic scenes[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10318–10327. doi: 10.1109/CVPR46437.2021.01018. | 
| [55] | XIE Yiheng, TAKIKAWA T, SAITO S, et al. Neural fields in visual computing and beyond[J]. Computer Graphics Forum, 2022, 41(2): 641–676. doi: 10.1111/cgf.14505. | 
| [56] | QUACH M, PANG Jiahao, TIAN Dong, et al. Survey on deep learning-based point cloud compression[J]. Frontiers in Signal Processing, 2022, 2: 846972. doi: 10.3389/frsip.2022.846972. | 
| [57] | SCHAEFER R. Call for proposals for point cloud compression V2[C]. ISO/IEC JTC1 SC29/WG11 MPEG, 117th Meeting, Hobart, TAS, 2017: document N16763. | 
| [58] | ISO/IEC 23090-9: 2023 Information technology — Coded representation of immersive media — Part 9: Geometry-based point cloud compression[S]. International Organization for Standardization, 2023. | 
| [59] | ISO/IEC 23090-5 Video-based point cloud compression[S]. International Organization for Standardization, 2021. | 
| [60] | CONTI C, SOARES L D, and NUNES P. Dense light field coding: A survey[J]. IEEE Access, 2020, 8: 49244–49284. doi: 10.1109/ACCESS.2020.2977767. | 
| [61] | 刘宇洋, 朱策, 郭红伟. 光场数据压缩研究综述[J]. 中国图象图形学报, 2019, 24(11): 1842–1859. doi: 10.11834/jig.190035. LIU Yuyang, ZHU Ce, and GUO Hongwei. Survey of light field data compression[J]. Journal of Image and Graphics, 2019, 24(11): 1842–1859. doi: 10.11834/jig.190035. | 
| [62] | PERRA C, MAHMOUDPOUR S, and PAGLIARI C. JPEG pleno light field: Current standard and future directions[C]. SPIE 12138, Optics, Photonics and Digital Technologies for Imaging Applications VII, Strasbourg, France, 2022: 153–156. doi: 10.1117/12.2624083. | 
| [63] | TECH G, CHEN Ying, MÜLLER K, et al. Overview of the multiview and 3D extensions of high efficiency video coding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(1): 35–49. doi: 10.1109/TCSVT.2015.2477935. | 
| [64] | SALAHIEH B, JUNG J, and DZIEMBOWSKI A. Test model for immersive video[C]. ISO/IEC JTC1 SC29/WG11 MPEG, 136th Meeting, 2021: document N0142. | 
| [65] | JUNG J and KROON B. Common test conditions for MPEG immersive video[C]. ISO/IEC JTC 1/SC 29/WG 04 MPEG, 137th Meeting, 2022: document N0169. | 
| [66] | DZIEMBOWSKI A, MIELOCH D, STANKOWSKI J, et al. IV-PSNR—the objective quality metric for immersive video applications[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7575–7591. doi: 10.1109/TCSVT.2022.3179575. | 
| [67] | ISO/IEC 23090-10: 2022 Information technology — Coded representation of immersive media — Part 10: Carriage of visual volumetric video-based coding data[S]. International Organization for Standardization, 2022. | 
| [68] | ISO/IEC FDIS 23090-12: 2023 Information technology — Coded representation of immersive media — Part 12: MPEG immersive video[S]. 2023. | 
| [69] | MILOVANOVIĆ M, HENRY F, CAGNAZZO M, et al. Patch decoder-side depth estimation in MPEG immersive video[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 2021: 1945–1949. doi: 10.1109/ICASSP39728.2021.9414056. | 
| [70] | BROSS B, WANG Yekui, YE Yan, et al. Overview of the Versatile Video Coding (VVC) standard and its applications[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(10): 3736–3764. doi: 10.1109/TCSVT.2021.3101953. | 
| [71] | WIECKOWSKI A, BRANDENBURG J, HINZ T, et al. VVenC: An open and optimized VVC encoder implementation[C]. 2021 IEEE International Conference on Multimedia & Expo Workshops, Shenzhen, China, 2021: 1–2. doi: 10.1109/ICMEW53276.2021.9455944. | 
| [72] | Reference view synthesizer (RVS) manual[C]. ISO/IEC JTC 1/SC 29/WG 04 MPEG, 124th Meeting, Macao, China, 2018: N18068. | 
