6G无线多模态通信技术

任超; 丁思颖; 张晓奇; 张海君

doi:10.11999/JEIT231201

6G无线多模态通信技术

doi: 10.11999/JEIT231201

北京科技大学北京 10083

基金项目: 国家自然科学基金(62201034, U22B2003, 62341103)，北京市自然科学基金(L212004, L212004-03)

详细信息

作者简介:
任超：男，讲师，研究方向为协作无线通信、空天地一体化通信、边缘计算和通感算一体化通信技术

丁思颖：女，本科生，研究方向为智能通信技术

张晓奇：女，博士生，研究方向为6G移动通信和人工智能技术

张海君：男，教授，研究方向为6G移动通信、B5G行业应用、NTN网络、数字孪生和人工智能

通讯作者:
张海君　haijunzhang@ieee.org

中图分类号: TN92
计量
- 文章访问数: 219
- HTML全文浏览量: 53
- PDF下载量: 80
- 被引次数: 0
出版历程
- 收稿日期: 2023-11-01
- 修回日期: 2024-03-01
- 网络出版日期: 2024-03-11

Wireless Multimodal Communications for 6G

Beijing University of Science and Technology, Beijing 100083, China

Funds: The National Natural Science Foundation of China (62201034, U22B2003, 62341103), Beijing Municipal Natural Science Foundation (L212004-03, L212004)

摘要

摘要: 该文综述了多模态通信作为一种能够同时交互多种模态形式的信息转移方式在不同应用场景下的重要性及其未来在6G无线通信技术中的发展前景。首先，将多模态通信分为3类，并探讨了其在这些领域中的关键作用。随后，针对6G无线通信系统可能面临的通信、感知、计算和存储资源限制以及跨域资源管理问题进行了深入剖析，指出未来的6G无线多模态通信将实现通感算存的深度融合和通信能力的提升。在多模态通信实现过程中，必须考虑多个环节，包括多发送端处理、传输技术和接收端处理等，以解决多模态语料库构建、多模态信息压缩、传输、干扰处理、降噪、对齐、融合和扩充等方面的挑战，以及资源管理问题。最后，强调了6G网络的跨域多模态信息转移、互补和协同的重要性，这将更好地整合和应用海量异构信息，以满足未来高速、低延迟、智能互联的通信需求。
- 多模态通信 /
- 多媒体信息 /
- 无线通信技术
Abstract: An overview of multimodal communication as an important information transfer mode that can simultaneously interact with multiple modal forms in different application scenarios is proposed in this paper. The future development prospects of multimodal communication in 6G wireless communication technology is also discussed. Firstly, multimodal communication is classified into three categories, and its key roles in these fields are explored. Furthermore, a deep analysis is conducted on the communication, sensation, computation, and storage resource limitations, as well as cross-domain resource management issues that 6G wireless communication systems may face. It points out that future 6G wireless multimodal communication will achieve deep integration of communication perception, computation, and storage, as well as enhance communication capabilities. In the process of implementing multimodal communication, various aspects must be considered, including multi-transmitter processing, transmission technology, and receiver processing, in order to address challenges in multimodal corpus construction, multimodal information compression, transmission, interference handling, noise reduction, alignment, fusion, and expansion, as well as resource management issues. Finally, the importance of cross-domain multimodal information transfer, complementarity, and collaboration in the 6G network is emphasized. This will better integrate and apply a massive amount of heterogeneous information to meet the future communication demands of high-speed, low-latency, and intelligent interconnection.
- Multimodal communication /
- Multimedia information /
- Wireless communications

HTML全文

图 1 多模态通信分类

下载: 全尺寸图片幻灯片

图 2 多模态通信系统的使能技术

下载: 全尺寸图片幻灯片

图 3 跨模态通信技术

下载: 全尺寸图片幻灯片

图 4 多模态通信接收端技术

下载: 全尺寸图片幻灯片

参考文献(63)

[1]	CHEN Guanghui and ZENG Xiaoping. Multi-modal emotion recognition by fusing correlation features of speech-visual[J]. IEEE Signal Processing Letters, 2021, 28: 533–537. doi: 10.1109/LSP.2021.3055755.
[2]	LIU Shuang, DUAN Linlin, ZHANG Zhong, et al. Multimodal ground-based remote sensing cloud classification via learning heterogeneous deep features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(11): 7790–7800. doi: 10.1109/TGRS.2020.2984265.
[3]	CHEN Haifeng, JIANG Dongmei, and SAHLI H. Transformer encoder with multi-modal multi-head attention for continuous affect recognition[J]. IEEE Transactions on Multimedia, 2021, 23: 4171–4183. doi: 10.1109/TMM.2020.3037496.
[4]	DENG Li, DU Xi and SHEN Ji zhong. Web page classification based on heterogeneous features and a combination of multiple classifiers[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(217): 995–1004. doi: 10.1631/FITEE.1900240.
[5]	FINOMORE V JR, POPIK D JR, CASTLE C JR, et al. Effects of a network-centric multi-modal communication tool on a communication monitoring task[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2010, 54(25): 2125–2129. doi: 10.1177/154193121005402501.
[6]	PORWOL L and OJO A. VR-participation: The feasibility of the virtual reality-driven multi-modal communication technology facilitating e-Participation[C]. The 11th International Conference on Theory and Practice of Electronic Governance, Galway, Ireland, 2018: 269–278. doi: 10.1145/3209415.3209515.
[7]	徐建博, 魏昕, 周亮. 面向跨模态通信的信息恢复技术[J]. 电子学报, 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945. XU Jianbo, WEI Xin, and ZHOU Liang. Information recovery technology for cross-modal communications[J]. Acta Electronica Sinica, 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945.
[8]	XIE Jiayu, JIN Xin, and CAO Hongkun. SMRD: A local feature descriptor for multi-modal image registration[C]. 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 2021: 1–5. doi: 10.1109/VCIP53242.2021.9675401.
[9]	GERMINIAN J F and TRICYA ESTERINA WIDAGDO ST. Utilizing postGIS extension to process spatial data stored in Neo4j database[C]. IEEE International Conference on Data and Software Engineering (ICoDSE), Toba, Indonesia, 2023: 250–255. doi: 10.1109/ICoDSE59534.2023.10291400.
[10]	王佩瑾, 闫志远, 容雪娥, 等. 数据受限条件下的多模态处理技术综述[J]. 中国图象图形学报, 2022, 27(10): 2803–2834. WANG Peijin, YAN Zhiyuan, RONG Xuee, et al. Review of multimodal data processing techniques with limited data[J]. Journal of Image and Graphics, 2022, 27(10): 2803–2834.
[11]	BHANDARI D and PAUL S. Perception net: A multimodal deep neural network for machine perception[C]. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 2018: 1419–1426. doi: 10.1109/SSCI.2018.8628711.
[12]	LACKEY S, BARBER D, REINERMAN L, et al. Defining next-generation multi-modal communication in human robot interaction[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2011, 55(1): 461–464. doi: 10.1177/1071181311551095.
[13]	AI Tianfang. Application of human-computer interaction technology in mobile interface design for digital media[C]. 2022 International Conference on Electronics and Devices, Computational Science (ICEDCS), Marseille, France, 2022: 152–155. doi: 10.1109/ICEDCS57360.2022.00040.
[14]	SHU Beibei, SZIEBIG G, and PIETERS R. Architecture for safe human-robot collaboration: Multi-modal communication in virtual reality for efficient task execution[C]. 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, Canada, 2019: 2297–2302. doi: 10.1109/ISIE.2019.8781372.
[15]	TODA Y and KUBOTA N. Computational intelligence for human-friendly robot partners based on multi-modal communication[C]. The 1st IEEE Global Conference on Consumer Electronics 2012, Tokyo, Japan, 2012: 309–313. doi: 10.1109/GCCE.2012.6379611.
[16]	CHU Ziyan, WANG Jiacheng, JIANG Xinyue, et al. mind-vr: a utility approach of human-computer interaction in virtual space based on autonomous consciousness[C]. 2022 International Conference on Virtual Reality, Human-Computer Interaction and Artificial Intelligence (VRHCIAI), Changsha, China, 2022: 134–138. doi: 10.1109/VRHCIAI57205.2022.00030.
[17]	WANG Dayan, QU Jue, WANG Wei, et al. The verification system for interface intelligent perception of human-computer interaction[C]. 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI), Sanya, China, 2020: 43–48. doi: 10.1109/ICHCI51889.2020.00017.
[18]	LIU Weiwei, PANG Yalong and LUAN Shenshen, et al. Building a Spaceborne integrated high-performance processing and computing platform based on SpaceVPX. 2022 International Conference on Computing, Communication, Perception and Quantum Technology (CCPQT), Xiamen, China, 2022: 287-293. doi: 10.1109/CCPQT56151.2022.00056.
[19]	REN Chao and LIU Luchuan. Toward full passive internet of things: Symbiotic localization and ambient backscatter communication[J]. IEEE Internet of Things Journal, 2023, 10(22): 19495–19506. doi: 10.1109/JIOT.2023.3262779.
[20]	REN Chao, HE Zongrui, LI Yingqi, et al. Priority aggregation network with integrated computation and sensation for ultra dense artificial intelligence of things[J]. IEEE Wireless Communications Letters, 2024, 13(2): 270–273. doi: 10.1109/LWC.2023.3323421.
[21]	REN Chao, LIU Luchuan, and ZHANG Haijun. Multimodal interference compatible passive UAV network based on location-aware flexibility[J]. IEEE Wireless Communications Letters, 2023, 12(4): 640–643. doi: 10.1109/LWC.2023.3237637.
[22]	REN Chao, GONG Chao, and LIU Luchuan. Task-oriented multimodal communication based on cloud-edge-UAV collaboration[J]. IEEE Internet of Things Journal, 2024, 11(1): 125–136. doi: 10.1109/JIOT.2023.3295650.
[23]	REN Chao, GONG Chao, CAO Difei, et al. Enhancing reliability in multimodal UAV communication based on opportunistic task space[J]. IEEE Wireless Communications Letters, 2024, 13(2): 284–287. doi: 10.1109/LWC.2023.3326130.
[24]	KASHEVNIK A, LASHKOV I, AXYONOV A, et al. Multimodal corpus design for audio-visual speech recognition in vehicle cabin[J]. IEEE Access, 2021, 9: 34986–35003. doi: 10.1109/ACCESS.2021.3062752.
[25]	KARPOV A, RONZHIN A, and KIPYATKOVA I. Designing a multimodal corpus of audio-visual speech using a high-speed camera[C]. 2012 IEEE 11th International Conference on Signal Processing, Beijing, China, 2012: 519–522. doi: 10.1109/ICoSP.2012.6491539.
[26]	MONTORSI G and BENEDETTO S. Design of spatially coupled turbo product codes for optical communications[C]. 2021 11th International Symposium on Topics in Coding (ISTC), Montreal, Canada, 2021: 1–5. doi: 10.1109/ISTC49272.2021.9594120.
[27]	ÇALIŞKAN E K, GÜLBAZ A, TENGIZLER B, et al. NDC-O-OFDM with channel coding for IM/DD communication systems[C]. 2022 30th Signal Processing and Communications Applications Conference (SIU), Safranbolu, Turkey, 2022: 1–4. doi: 10.1109/SIU55565.2022.9864710.
[28]	KHAN M H and ZHANG Gongxuan. Evaluation of channel coding techniques for massive machine-type communication in 5G cellular network[C]. 2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 2020: 375–379. doi: 10.1109/ICICSP50920.2020.9232037.
[29]	WANG Yue, SHI Qingwen, XU Li, et al. A study on the construction of traditional Chinese medicine multimodal corpus under the view of self-media[C]. 2022 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, Canada, 2022: 373–375. doi: 10.1109/CIPAE55637.2022.00084.
[30]	LI Guangyan, FANG Tian, LIANG Li, et al. Current situation of multimodal corpora and the method of Tibetan corpus construction[C]. 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 2018: 449–453. doi: 10.1109/IICSPI.2018.8690378.
[31]	YEH C H, LIN Y L, LI C C, et al. Compressed-and-forward: Compressive sensing for cooperative communication[C]. 2012 International Symposium on Intelligent Signal Processing and Communications Systems, Tamsui, China, 2012: 319–322. doi: 10.1109/ISPACS.2012.6473503.
[32]	HANECHE H, BOUDRAA B, and OUAHABI A. Compressed sensing investigation in an end-to-end rayleigh communication system: Speech compression[C]. 2018 International Conference on Smart Communications in Network Technologies (SaCoNeT), El Oued, Algeria, 2018: 73–77. doi: 10.1109/SaCoNeT.2018.8585702.
[33]	CHEN Mingkai, LIU Minghao, WANG Wenjun, et al. Cross-modal semantic communications in 6G[C]. 2023 IEEE/CIC International Conference on Communications in China (ICCC), Dalian, China, 2023: 1–6. doi: 10.1109/ICCC57788.2023.10233481.
[34]	YANG Lianxin, WU Dan, and ZHOU Liang. Heterogeneous stream scheduling for cross-modal transmission[J]. IEEE Transactions on Communications, 2021, 69(9): 6037–6049. doi: 10.1109/TCOMM.2021.3086522.
[35]	LU Kangkang, LIANG Meiyu, XUE Zhe, et al. Adversarial guided gradient estimation hashing for cross-modal retrieval[C]. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), Chengdu, China, 2022: 109–113. doi: 10.1109/CCIS57298.2022.10016424.
[36]	ZHAO Wenying, XU Qian, WANG Haoyu, et al. Cross-modal hashing for material surface properties fusion[C]. 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 2023: 194–198. doi: 10.1109/IWCMC58020.2023.10183090.
[37]	HUANG Yingying, WANG Quan, ZHANG Yipeng, et al. A unified perspective of multi-level cross-modal similarity for cross-modal retrieval[C]. 2022 5th International Conference on Information Communication and Signal Processing (ICICSP), Shenzhen, China, 2022: 466–471. doi: 10.1109/ICICSP55539.2022.10050678.
[38]	WEI Xin, SHI Yingying, and ZHOU Liang. Haptic signal reconstruction for cross-modal communications[J]. IEEE Transactions on Multimedia, 2022, 24: 4514–4525. doi: 10.1109/TMM.2021.3119860.
[39]	王惠琴, 高大庆, 何永强, 等. GPR图像的数据集构建及其DRDU-Net去噪算法[J/OL]. 湖南大学学报: 自然科学版,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023. WANG Huiqin, GAO Daqing, HE Yongqiang, et al. DRDU-net based denoising algorithm for GPR image dataset[J/OL]. Journal of Hunan University: Natural Sciences,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023.
[40]	AKIYAMA H, TANAKA M, and OKUTOMI M. Pseudo four-channel image denoising for noisy CFA raw data[C]. 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, Canada, 2015: 4778–4782. doi: 10.1109/ICIP.2015.7351714.
[41]	LI Jun, YU Menghong, and YUAN Wei. Application of wavelet new threshold denoising in ship data preprocessing and modeling[C]. 2020 Chinese Automation Congress (CAC), Shanghai, China, 2020: 1263–1267. doi: 10.1109/CAC51589.2020.9326954.
[42]	WANG Feng, YANG Bo, WANG Yuqing, et al. Learning from noisy data: An unsupervised random denoising method for seismic data using model-based deep learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5913314. doi: 10.1109/TGRS.2022.3165037.
[43]	TIAN Chongpeng, HONG Mei, LI Dongying, et al. Deep recurrent neural network for ground-penetrating radar signal denoising[C]. 2022 4th International Conference on Intelligent Information Processing (IIP), Guangzhou, China, 2022: 85−88. doi: 10.1109/IIP57348.2022.00024.
[44]	GONG Jianhu. Denoising control method of abnormal signals in communication networks based on big data analysis[C]. 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Beihai, China, 2021: 329–334. doi: 10.1109/ICMTMA52658.2021.00077.
[45]	LIU Yuchi, YAO Yue, WANG Zhengjie, et al. Generalized alignment for multimodal physiological signal learning[C]. 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019: 1–10. doi: 10.1109/IJCNN.2019.8852216.
[46]	CHEN Liyi, LI Zhi and WANG Yijun, et al. MMEA: Entity alignment for multi-modal knowledge graph[C]. In Knowledge Science, Engineering and Management: 13th International Conference, KSEM 2020, Hangzhou, China, 2020: 134–147. doi: https://doi.org/10.1007/978-3-030-55130-8_12.
[47]	王欢, 宋丽娟, 杜方. 基于多模态知识图谱的中文跨模态实体对齐方法[J]. 计算机工程, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938. WANG Huan, SONG Lijuan, and DU Fang. Chinese cross-modal entity alignment method based on multi-modal knowledge graph[J]. Computer Engineering, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938.
[48]	罗俊豪, 朱焱. 用于未对齐多模态语言序列情感分析的多交互感知网络[J]. 计算机应用, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815. LUO Junhao and ZHU Yan. Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis[J]. Journal of Computer Applications, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815.
[49]	LIU Ye, QIAO Lingfeng, LU Changchong, et al. OSAN: A one-stage alignment network to unify multimodal alignment and unsupervised domain adaptation[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 3551–3560. doi: 10.1109/CVPR52729.2023.00346.
[50]	GAO Lei and GUAN Ling. A discriminative vectorial framework for multi-modal feature representation[J]. IEEE Transactions on Multimedia, 2022, 24: 1503–1514. doi: 10.1109/TMM.2021.3066118.
[51]	XUE Zhixiang, YU Xuchu, ZHANG Pengqiang, et al. Self-supervised feature learning and few-shot land cover classification for cross-modal remote sensing images[C]. IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022: 3600–3603. doi: 10.1109/IGARSS46834.2022.9884265.
[52]	AL-HMOUZ R, DAQROUQ K, MORFEQ A, et al. Multimodal biometrics using multiple feature representations to speaker identification system[C]. 2015 International Conference on Information and Communication Technology Research (ICTRC), Abu Dhabi, United Arab Emirates, 2015: 314–317. doi: 10.1109/ICTRC.2015.7156485.
[53]	GAO Lei and GUAN Ling. Interpretable learning-based multi-modal hashing analysis for multi-view feature representation learning[C]. 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), CA, USA, 2022: 47–52. doi: 10.1109/MIPR54900.2022.00016.
[54]	WANG Shiping and GUO Wenzhong. Sparse multigraph embedding for multimodal feature representation[J]. IEEE Transactions on Multimedia, 2017, 19(7): 1454–1466. doi: 10.1109/TMM.2017.2663324.
[55]	SHEKHAR S, PATEL V M, NASRABADI N M, et al. Joint sparse representation for robust multimodal biometrics recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 113–126. doi: 10.1109/TPAMI.2013.109.
[56]	HOU Xiaofeng, XU Cheng, LIU Jiacheng, et al. Characterizing and understanding end-to-end multi-modal neural networks on GPUs[J]. IEEE Computer Architecture Letters, 2022, 21(2): 125–128. doi: 10.1109/LCA.2022.3215718.
[57]	LI Shuzhen, ZHANG Tong, CHEN Bianna, et al. MIA-Net: Multi-modal interactive attention network for multi-modal affective analysis[J]. IEEE Transactions on Affective Computing, 2023, 14(4): 2796–2809. doi: 10.1109/TAFFC.2023.3259010.
[58]	LIU Bin. Robust dynamic multi-modal data fusion: A model uncertainty perspective[J]. IEEE Signal Processing Letters, 2021, 28: 2107–2111. doi: 10.1109/LSP.2021.3117731.
[59]	TIAN Pinzhuo, LI Wenbin, and GAO Yang. Consistent meta-regularization for better meta-knowledge in few-shot learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 7277–7288. doi: 10.1109/TNNLS.2021.3084733.
[60]	WANG Ruiqi, ZHANG Xuyao, and LIU Chenglin. Meta-prototypical learning for domain-agnostic few-shot recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(11): 6990–6996. doi: 10.1109/TNNLS.2021.3083650.
[61]	YAN Minghao. Adaptive learning knowledge networks for few-shot learning[J]. IEEE Access, 2019, 7: 119041–119051. doi: 10.1109/ACCESS.2019.2934694.
[62]	WANG Kai. An overview of deep learning based small sample medical imaging classification[C]. 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), Stanford, USA, 2021: 278–281. doi: 10.1109/CONF-SPML54095.2021.00060.
[63]	LIANG Yong, CHEN Zetao, LIN Daoqian, et al. Three-dimension attention mechanism and self-supervised pretext task for augmenting few-shot learning[J]. IEEE Access, 2023, 11: 59428–59437. doi: 10.1109/ACCESS.2023.3285721.