Wireless Multimodal Communications for 6G

REN Chao; DING Siying; ZHANG Xiaoqi; ZHANG Haijun

doi:10.11999/JEIT231201

Volume 46 Issue 5

May 2024

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2024 > 46(5): 1658-1671

REN Chao, DING Siying, ZHANG Xiaoqi, ZHANG Haijun. Wireless Multimodal Communications for 6G[J]. Journal of Electronics & Information Technology, 2024, 46(5): 1658-1671. doi: 10.11999/JEIT231201

Citation:

REN Chao, DING Siying, ZHANG Xiaoqi, ZHANG Haijun. Wireless Multimodal Communications for 6G[J]. Journal of Electronics & Information Technology, 2024, 46(5): 1658-1671. doi: 10.11999/JEIT231201

REN Chao, DING Siying, ZHANG Xiaoqi, ZHANG Haijun. Wireless Multimodal Communications for 6G[J]. Journal of Electronics & Information Technology, 2024, 46(5): 1658-1671. doi: 10.11999/JEIT231201

Citation:

REN Chao, DING Siying, ZHANG Xiaoqi, ZHANG Haijun. Wireless Multimodal Communications for 6G[J]. Journal of Electronics & Information Technology, 2024, 46(5): 1658-1671. doi: 10.11999/JEIT231201

PDF( 2721 KB)

Wireless Multimodal Communications for 6G

doi: 10.11999/JEIT231201 cstr: 32379.14.JEIT231201

Beijing University of Science and Technology, Beijing 100083, China

Funds: The National Natural Science Foundation of China (62201034, U22B2003, 62341103), Beijing Municipal Natural Science Foundation (L212004-03, L212004)

Received Date: 2023-11-01
Rev Recd Date: 2024-03-01

Available Online: 2024-03-11

Publish Date: 2024-05-30

Abstract

Abstract

An overview of multimodal communication as an important information transfer mode that can simultaneously interact with multiple modal forms in different application scenarios is proposed in this paper. The future development prospects of multimodal communication in 6G wireless communication technology is also discussed. Firstly, multimodal communication is classified into three categories, and its key roles in these fields are explored. Furthermore, a deep analysis is conducted on the communication, sensation, computation, and storage resource limitations, as well as cross-domain resource management issues that 6G wireless communication systems may face. It points out that future 6G wireless multimodal communication will achieve deep integration of communication perception, computation, and storage, as well as enhance communication capabilities. In the process of implementing multimodal communication, various aspects must be considered, including multi-transmitter processing, transmission technology, and receiver processing, in order to address challenges in multimodal corpus construction, multimodal information compression, transmission, interference handling, noise reduction, alignment, fusion, and expansion, as well as resource management issues. Finally, the importance of cross-domain multimodal information transfer, complementarity, and collaboration in the 6G network is emphasized. This will better integrate and apply a massive amount of heterogeneous information to meet the future communication demands of high-speed, low-latency, and intelligent interconnection.
- Multimodal communication,
- Multimedia information,
- Wireless communications

FullText(HTML)

References(63)

References

[1]	CHEN Guanghui and ZENG Xiaoping. Multi-modal emotion recognition by fusing correlation features of speech-visual[J]. IEEE Signal Processing Letters, 2021, 28: 533–537. doi: 10.1109/LSP.2021.3055755.
[2]	LIU Shuang, DUAN Linlin, ZHANG Zhong, et al. Multimodal ground-based remote sensing cloud classification via learning heterogeneous deep features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(11): 7790–7800. doi: 10.1109/TGRS.2020.2984265.
[3]	CHEN Haifeng, JIANG Dongmei, and SAHLI H. Transformer encoder with multi-modal multi-head attention for continuous affect recognition[J]. IEEE Transactions on Multimedia, 2021, 23: 4171–4183. doi: 10.1109/TMM.2020.3037496.
[4]	DENG Li, DU Xi and SHEN Ji zhong. Web page classification based on heterogeneous features and a combination of multiple classifiers[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(217): 995–1004. doi: 10.1631/FITEE.1900240.
[5]	FINOMORE V JR, POPIK D JR, CASTLE C JR, et al. Effects of a network-centric multi-modal communication tool on a communication monitoring task[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2010, 54(25): 2125–2129. doi: 10.1177/154193121005402501.
[6]	PORWOL L and OJO A. VR-participation: The feasibility of the virtual reality-driven multi-modal communication technology facilitating e-Participation[C]. The 11th International Conference on Theory and Practice of Electronic Governance, Galway, Ireland, 2018: 269–278. doi: 10.1145/3209415.3209515.
[7]	徐建博, 魏昕, 周亮. 面向跨模态通信的信息恢复技术[J]. 电子学报, 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945. XU Jianbo, WEI Xin, and ZHOU Liang. Information recovery technology for cross-modal communications[J]. Acta Electronica Sinica, 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945.
[8]	XIE Jiayu, JIN Xin, and CAO Hongkun. SMRD: A local feature descriptor for multi-modal image registration[C]. 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 2021: 1–5. doi: 10.1109/VCIP53242.2021.9675401.
[9]	GERMINIAN J F and TRICYA ESTERINA WIDAGDO ST. Utilizing postGIS extension to process spatial data stored in Neo4j database[C]. IEEE International Conference on Data and Software Engineering (ICoDSE), Toba, Indonesia, 2023: 250–255. doi: 10.1109/ICoDSE59534.2023.10291400.
[10]	王佩瑾, 闫志远, 容雪娥, 等. 数据受限条件下的多模态处理技术综述[J]. 中国图象图形学报, 2022, 27(10): 2803–2834. WANG Peijin, YAN Zhiyuan, RONG Xuee, et al. Review of multimodal data processing techniques with limited data[J]. Journal of Image and Graphics, 2022, 27(10): 2803–2834.
[11]	BHANDARI D and PAUL S. Perception net: A multimodal deep neural network for machine perception[C]. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 2018: 1419–1426. doi: 10.1109/SSCI.2018.8628711.
[12]	LACKEY S, BARBER D, REINERMAN L, et al. Defining next-generation multi-modal communication in human robot interaction[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2011, 55(1): 461–464. doi: 10.1177/1071181311551095.
[13]	AI Tianfang. Application of human-computer interaction technology in mobile interface design for digital media[C]. 2022 International Conference on Electronics and Devices, Computational Science (ICEDCS), Marseille, France, 2022: 152–155. doi: 10.1109/ICEDCS57360.2022.00040.
[14]	SHU Beibei, SZIEBIG G, and PIETERS R. Architecture for safe human-robot collaboration: Multi-modal communication in virtual reality for efficient task execution[C]. 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, Canada, 2019: 2297–2302. doi: 10.1109/ISIE.2019.8781372.
[15]	TODA Y and KUBOTA N. Computational intelligence for human-friendly robot partners based on multi-modal communication[C]. The 1st IEEE Global Conference on Consumer Electronics 2012, Tokyo, Japan, 2012: 309–313. doi: 10.1109/GCCE.2012.6379611.
[16]	CHU Ziyan, WANG Jiacheng, JIANG Xinyue, et al. mind-vr: a utility approach of human-computer interaction in virtual space based on autonomous consciousness[C]. 2022 International Conference on Virtual Reality, Human-Computer Interaction and Artificial Intelligence (VRHCIAI), Changsha, China, 2022: 134–138. doi: 10.1109/VRHCIAI57205.2022.00030.
[17]	WANG Dayan, QU Jue, WANG Wei, et al. The verification system for interface intelligent perception of human-computer interaction[C]. 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI), Sanya, China, 2020: 43–48. doi: 10.1109/ICHCI51889.2020.00017.
[18]	LIU Weiwei, PANG Yalong and LUAN Shenshen, et al. Building a Spaceborne integrated high-performance processing and computing platform based on SpaceVPX[C]. 2022 International Conference on Computing, Communication, Perception and Quantum Technology (CCPQT), Xiamen, China, 2022: 287-293. doi: 10.1109/CCPQT56151.2022.00056.
[19]	REN Chao and LIU Luchuan. Toward full passive internet of things: Symbiotic localization and ambient backscatter communication[J]. IEEE Internet of Things Journal, 2023, 10(22): 19495–19506. doi: 10.1109/JIOT.2023.3262779.
[20]	REN Chao, HE Zongrui, LI Yingqi, et al. Priority aggregation network with integrated computation and sensation for ultra dense artificial intelligence of things[J]. IEEE Wireless Communications Letters, 2024, 13(2): 270–273. doi: 10.1109/LWC.2023.3323421.
[21]	REN Chao, LIU Luchuan, and ZHANG Haijun. Multimodal interference compatible passive UAV network based on location-aware flexibility[J]. IEEE Wireless Communications Letters, 2023, 12(4): 640–643. doi: 10.1109/LWC.2023.3237637.
[22]	REN Chao, GONG Chao, and LIU Luchuan. Task-oriented multimodal communication based on cloud-edge-UAV collaboration[J]. IEEE Internet of Things Journal, 2024, 11(1): 125–136. doi: 10.1109/JIOT.2023.3295650.
[23]	REN Chao, GONG Chao, CAO Difei, et al. Enhancing reliability in multimodal UAV communication based on opportunistic task space[J]. IEEE Wireless Communications Letters, 2024, 13(2): 284–287. doi: 10.1109/LWC.2023.3326130.
[24]	KASHEVNIK A, LASHKOV I, AXYONOV A, et al. Multimodal corpus design for audio-visual speech recognition in vehicle cabin[J]. IEEE Access, 2021, 9: 34986–35003. doi: 10.1109/ACCESS.2021.3062752.
[25]	KARPOV A, RONZHIN A, and KIPYATKOVA I. Designing a multimodal corpus of audio-visual speech using a high-speed camera[C]. 2012 IEEE 11th International Conference on Signal Processing, Beijing, China, 2012: 519–522. doi: 10.1109/ICoSP.2012.6491539.
[26]	MONTORSI G and BENEDETTO S. Design of spatially coupled turbo product codes for optical communications[C]. 2021 11th International Symposium on Topics in Coding (ISTC), Montreal, Canada, 2021: 1–5. doi: 10.1109/ISTC49272.2021.9594120.
[27]	ÇALIŞKAN E K, GÜLBAZ A, TENGIZLER B, et al. NDC-O-OFDM with channel coding for IM/DD communication systems[C]. 2022 30th Signal Processing and Communications Applications Conference (SIU), Safranbolu, Turkey, 2022: 1–4. doi: 10.1109/SIU55565.2022.9864710.
[28]	KHAN M H and ZHANG Gongxuan. Evaluation of channel coding techniques for massive machine-type communication in 5G cellular network[C]. 2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 2020: 375–379. doi: 10.1109/ICICSP50920.2020.9232037.
[29]	WANG Yue, SHI Qingwen, XU Li, et al. A study on the construction of traditional Chinese medicine multimodal corpus under the view of self-media[C]. 2022 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, Canada, 2022: 373–375. doi: 10.1109/CIPAE55637.2022.00084.
[30]	LI Guangyan, FANG Tian, LIANG Li, et al. Current situation of multimodal corpora and the method of Tibetan corpus construction[C]. 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 2018: 449–453. doi: 10.1109/IICSPI.2018.8690378.
[31]	YEH C H, LIN Y L, LI C C, et al. Compressed-and-forward: Compressive sensing for cooperative communication[C]. 2012 International Symposium on Intelligent Signal Processing and Communications Systems, Tamsui, China, 2012: 319–322. doi: 10.1109/ISPACS.2012.6473503.
[32]	HANECHE H, BOUDRAA B, and OUAHABI A. Compressed sensing investigation in an end-to-end rayleigh communication system: Speech compression[C]. 2018 International Conference on Smart Communications in Network Technologies (SaCoNeT), El Oued, Algeria, 2018: 73–77. doi: 10.1109/SaCoNeT.2018.8585702.
[33]	CHEN Mingkai, LIU Minghao, WANG Wenjun, et al. Cross-modal semantic communications in 6G[C]. 2023 IEEE/CIC International Conference on Communications in China (ICCC), Dalian, China, 2023: 1–6. doi: 10.1109/ICCC57788.2023.10233481.
[34]	YANG Lianxin, WU Dan, and ZHOU Liang. Heterogeneous stream scheduling for cross-modal transmission[J]. IEEE Transactions on Communications, 2021, 69(9): 6037–6049. doi: 10.1109/TCOMM.2021.3086522.
[35]	LU Kangkang, LIANG Meiyu, XUE Zhe, et al. Adversarial guided gradient estimation hashing for cross-modal retrieval[C]. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), Chengdu, China, 2022: 109–113. doi: 10.1109/CCIS57298.2022.10016424.
[36]	ZHAO Wenying, XU Qian, WANG Haoyu, et al. Cross-modal hashing for material surface properties fusion[C]. 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 2023: 194–198. doi: 10.1109/IWCMC58020.2023.10183090.
[37]	HUANG Yingying, WANG Quan, ZHANG Yipeng, et al. A unified perspective of multi-level cross-modal similarity for cross-modal retrieval[C]. 2022 5th International Conference on Information Communication and Signal Processing (ICICSP), Shenzhen, China, 2022: 466–471. doi: 10.1109/ICICSP55539.2022.10050678.
[38]	WEI Xin, SHI Yingying, and ZHOU Liang. Haptic signal reconstruction for cross-modal communications[J]. IEEE Transactions on Multimedia, 2022, 24: 4514–4525. doi: 10.1109/TMM.2021.3119860.
[39]	王惠琴, 高大庆, 何永强, 等. GPR图像的数据集构建及其DRDU-Net去噪算法[J/OL]. 湖南大学学报: 自然科学版,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023. WANG Huiqin, GAO Daqing, HE Yongqiang, et al. DRDU-net based denoising algorithm for GPR image dataset[J/OL]. Journal of Hunan University: Natural Sciences,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023.
[40]	AKIYAMA H, TANAKA M, and OKUTOMI M. Pseudo four-channel image denoising for noisy CFA raw data[C]. 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, Canada, 2015: 4778–4782. doi: 10.1109/ICIP.2015.7351714.
[41]	LI Jun, YU Menghong, and YUAN Wei. Application of wavelet new threshold denoising in ship data preprocessing and modeling[C]. 2020 Chinese Automation Congress (CAC), Shanghai, China, 2020: 1263–1267. doi: 10.1109/CAC51589.2020.9326954.
[42]	WANG Feng, YANG Bo, WANG Yuqing, et al. Learning from noisy data: An unsupervised random denoising method for seismic data using model-based deep learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5913314. doi: 10.1109/TGRS.2022.3165037.
[43]	TIAN Chongpeng, HONG Mei, LI Dongying, et al. Deep recurrent neural network for ground-penetrating radar signal denoising[C]. 2022 4th International Conference on Intelligent Information Processing (IIP), Guangzhou, China, 2022: 85−88. doi: 10.1109/IIP57348.2022.00024.
[44]	GONG Jianhu. Denoising control method of abnormal signals in communication networks based on big data analysis[C]. 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Beihai, China, 2021: 329–334. doi: 10.1109/ICMTMA52658.2021.00077.
[45]	LIU Yuchi, YAO Yue, WANG Zhengjie, et al. Generalized alignment for multimodal physiological signal learning[C]. 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019: 1–10. doi: 10.1109/IJCNN.2019.8852216.
[46]	CHEN Liyi, LI Zhi and WANG Yijun, et al. MMEA: Entity alignment for multi-modal knowledge graph[C]. In Knowledge Science, Engineering and Management: 13th International Conference, KSEM 2020, Hangzhou, China, 2020: 134–147. doi: https://doi.org/10.1007/978-3-030-55130-8_12.
[47]	王欢, 宋丽娟, 杜方. 基于多模态知识图谱的中文跨模态实体对齐方法[J]. 计算机工程, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938. WANG Huan, SONG Lijuan, and DU Fang. Chinese cross-modal entity alignment method based on multi-modal knowledge graph[J]. Computer Engineering, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938.
[48]	罗俊豪, 朱焱. 用于未对齐多模态语言序列情感分析的多交互感知网络[J]. 计算机应用, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815. LUO Junhao and ZHU Yan. Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis[J]. Journal of Computer Applications, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815.
[49]	LIU Ye, QIAO Lingfeng, LU Changchong, et al. OSAN: A one-stage alignment network to unify multimodal alignment and unsupervised domain adaptation[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 3551–3560. doi: 10.1109/CVPR52729.2023.00346.
[50]	GAO Lei and GUAN Ling. A discriminative vectorial framework for multi-modal feature representation[J]. IEEE Transactions on Multimedia, 2022, 24: 1503–1514. doi: 10.1109/TMM.2021.3066118.
[51]	XUE Zhixiang, YU Xuchu, ZHANG Pengqiang, et al. Self-supervised feature learning and few-shot land cover classification for cross-modal remote sensing images[C]. IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022: 3600–3603. doi: 10.1109/IGARSS46834.2022.9884265.
[52]	AL-HMOUZ R, DAQROUQ K, MORFEQ A, et al. Multimodal biometrics using multiple feature representations to speaker identification system[C]. 2015 International Conference on Information and Communication Technology Research (ICTRC), Abu Dhabi, United Arab Emirates, 2015: 314–317. doi: 10.1109/ICTRC.2015.7156485.
[53]	GAO Lei and GUAN Ling. Interpretable learning-based multi-modal hashing analysis for multi-view feature representation learning[C]. 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), CA, USA, 2022: 47–52. doi: 10.1109/MIPR54900.2022.00016.
[54]	WANG Shiping and GUO Wenzhong. Sparse multigraph embedding for multimodal feature representation[J]. IEEE Transactions on Multimedia, 2017, 19(7): 1454–1466. doi: 10.1109/TMM.2017.2663324.
[55]	SHEKHAR S, PATEL V M, NASRABADI N M, et al. Joint sparse representation for robust multimodal biometrics recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 113–126. doi: 10.1109/TPAMI.2013.109.
[56]	HOU Xiaofeng, XU Cheng, LIU Jiacheng, et al. Characterizing and understanding end-to-end multi-modal neural networks on GPUs[J]. IEEE Computer Architecture Letters, 2022, 21(2): 125–128. doi: 10.1109/LCA.2022.3215718.
[57]	LI Shuzhen, ZHANG Tong, CHEN Bianna, et al. MIA-Net: Multi-modal interactive attention network for multi-modal affective analysis[J]. IEEE Transactions on Affective Computing, 2023, 14(4): 2796–2809. doi: 10.1109/TAFFC.2023.3259010.
[58]	LIU Bin. Robust dynamic multi-modal data fusion: A model uncertainty perspective[J]. IEEE Signal Processing Letters, 2021, 28: 2107–2111. doi: 10.1109/LSP.2021.3117731.
[59]	TIAN Pinzhuo, LI Wenbin, and GAO Yang. Consistent meta-regularization for better meta-knowledge in few-shot learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 7277–7288. doi: 10.1109/TNNLS.2021.3084733.
[60]	WANG Ruiqi, ZHANG Xuyao, and LIU Chenglin. Meta-prototypical learning for domain-agnostic few-shot recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(11): 6990–6996. doi: 10.1109/TNNLS.2021.3083650.
[61]	YAN Minghao. Adaptive learning knowledge networks for few-shot learning[J]. IEEE Access, 2019, 7: 119041–119051. doi: 10.1109/ACCESS.2019.2934694.
[62]	WANG Kai. An overview of deep learning based small sample medical imaging classification[C]. 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), Stanford, USA, 2021: 278–281. doi: 10.1109/CONF-SPML54095.2021.00060.
[63]	LIANG Yong, CHEN Zetao, LIN Daoqian, et al. Three-dimension attention mechanism and self-supervised pretext task for augmenting few-shot learning[J]. IEEE Access, 2023, 11: 59428–59437. doi: 10.1109/ACCESS.2023.3285721.