Chinese Semantic Communication System Based on Word-level and Sentence-level Semantics

DENG Jiewen; ZHAO Haitao; WEI Jibo; CAO Kuo; ZHANG Yichi; LUO Peng; ZHANG Yuyuan; LIU Yueling

doi:10.11999/JEIT250137

Volume 47 Issue 8

Aug. 2025

DENG Jiewen, ZHAO Haitao, WEI Jibo, CAO Kuo, ZHANG Yichi, LUO Peng, ZHANG Yuyuan, LIU Yueling. Chinese Semantic Communication System Based on Word-level and Sentence-level Semantics[J]. Journal of Electronics & Information Technology, 2025, 47(8): 2549-2562. doi: 10.11999/JEIT250137

doi: 10.11999/JEIT250137 cstr: 32379.14.JEIT250137

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China

Funds: The National Natural Science Foundation of China (62201584, 61931020)

Received Date: 2025-03-10
Rev Recd Date: 2025-07-04

Available Online: 2025-07-26

Publish Date: 2025-08-27

Abstract


Objective: To address the mismatch between limited communication resources and growing service demands, semantic communication has been proposed as a novel paradigm and is expected to offer an effective solution. Unlike traditional approaches that focus on accurate symbol transmission, semantic communication operates at the semantic level, aiming to convey intended meaning by leveraging shared background knowledge at both the transmitter and receiver. Advances in semantic information theory provide a theoretical basis for this paradigm, while the development of artificial intelligence techniques for semantic extraction and understanding supports practical system implementation. Most existing semantic communication systems for textual data are based on English corpora; however, Chinese text differs markedly in word segmentation, lexical annotation, and syntactic structure, and systems tailored for Chinese corpora remain underexplored. Furthermore, current lexical code-based systems primarily focus on word-level semantics and fail to fully capture sentence-level semantics. This study addresses these limitations by mining and processing lexical and contextual semantics specific to Chinese text. A semantic communication system is proposed that uses Chinese corpora to learn and extract both word-level and sentence-level semantic associations. Lexical coding is performed at the transmitter, and joint context decoding is realized at the receiver, thereby improving the effectiveness and reliability of the communication process.

Methods: A Chinese semantic communication system is designed to capture both word-level and sentence-level semantics, leveraging the unique characteristics of Chinese text to enable efficient and reliable transmission of meaning. At the transmitter, a lexical coding method is proposed that encodes words based on their combined lexical semantic features. At the receiver, a two-stage decoding process is implemented. First, the Continuous Bag-of-Words (CBOW) model is used to learn word-level semantics from shared knowledge, estimating the conditional probability of the next word based on preceding words. Second, the Bidirectional Encoder Representations from Transformers (BERT) model is applied to capture sentence-level semantics, using Chinese characters as the fundamental processing unit to compute the probability distribution of words at each position in the sentence. Upon receiving the bit sequence, Huffman decoding is performed with a candidate code list mechanism to generate a set of candidate words. A recursive memoization algorithm then selects the most probable words based on word-level semantics. Finally, sentence-level semantics are applied to correct potential errors in the sentence, producing the recovered text.

Results and Discussions: The proposed semantic communication system improves effectiveness by encoding combined phrases during lexical coding, thereby reducing the number of coding objects. Reliability is enhanced by leveraging contextual associations during feature learning and joint decoding. For effectiveness, the average code length of the Huffman coding dictionary is 10.61, while the lexical coding dictionary for four categories achieves an average of 8.98. This represents an 18.15% increase in average coding rate. Experiments conducted on 100 randomly selected texts across different corpus sizes yield consistent results (Table 3, Fig. 5), validating the effectiveness of lexical coding. For reliability, system performance is first evaluated under varying parameter settings. The optimal values for context window size, lexical category count, and Hamming distance threshold are identified (Figs. 6–10). Comparative analysis across different systems is then conducted. Under an Additive White Gaussian Noise (AWGN) channel, the lexical+word-level+sentence-level semantic system achieves higher BLEU scores than the Huffman-only system when the Signal-to-Noise Ratio (SNR) is ≤6 dB, and matches the performance of DeepSC between -3 dB and 3 dB. At SNR ≥9 dB, its BLEU scores are slightly lower than those of the Huffman-only system but significantly higher than those of DeepSC. Across all SNR ranges, the lexical+word-level+sentence-level system outperforms the lexical+word-level system. The BLEU scores of the Huffman+word-level and Huffman+sentence-level systems are similar and consistently exceed those of the Huffman-only system. Similar trends are observed on Rayleigh and Rician fading channels and with METEOR scores (Figs. 11, 12). These results indicate that combining word-level and sentence-level semantics with a candidate set mechanism for joint context decoding substantially enhances transmission reliability at the receiver.

Conclusions: A Chinese semantic communication system based on word-level and sentence-level semantics is proposed. First, a lexical grouping and coding method based on LAC segmentation is developed by analyzing lexical features in Chinese text, which improves the effectiveness of the communication system. Second, the receiver models context co-occurrence probabilities by extracting word-level and sentence-level semantic features, enabling joint decoding through word selection and sentence-level error correction, thereby enhancing reliability. Simulation results show that the average code length of the Huffman coding dictionary is 10.61, while the lexical coding dictionary for four categories achieves an average of 8.98, resulting in an 18.15% increase in coding rate. On the AWGN channel, the proposed lexical+word-level+sentence-level system outperforms the Huffman-only system at low SNR and the DeepSC system at high SNR. The Huffman+word-level and Huffman+sentence-level systems yield similar reliability scores, both consistently higher than the Huffman-only system. These findings confirm that incorporating both word-level and sentence-level semantics significantly enhances system reliability.
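
The reported 18.15% figure can be checked from the two average code lengths alone. The worked calculation below assumes that "coding rate" is taken as inversely proportional to average code length; the abstract does not state this definition, so the symbols $\bar{L}_{\text{Huffman}}$ and $\bar{L}_{\text{lexical}}$ are introduced here only for illustration.

```latex
\[
\frac{\bar{L}_{\text{Huffman}}}{\bar{L}_{\text{lexical}}} - 1
  = \frac{10.61}{8.98} - 1 \approx 0.1815 = 18.15\%
\]
\[
\frac{\bar{L}_{\text{Huffman}} - \bar{L}_{\text{lexical}}}{\bar{L}_{\text{Huffman}}}
  = \frac{10.61 - 8.98}{10.61} \approx 0.1536 = 15.36\%
\]
```

Under this reading, the 18.15% gain in rate corresponds to a reduction of about 15.36% in average code length; the two percentages describe the same change from different baselines.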
- Semantic communication
- Lexical coding
- Joint context decoding
- Word-level semantics
- Sentence-level semantics
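
The candidate code list and the recursive memoized word selection described in the Methods can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy `CODEBOOK`, the hand-written `BIGRAM` table (standing in for probabilities that the paper learns with CBOW), the `FLOOR` smoothing value, and the simplification that the received bit stream has already been split into codeword-sized chunks. The sentence-level BERT correction stage is not modeled.

```python
import math
from functools import lru_cache

# Hypothetical toy codebook (word -> prefix-free codeword); not the paper's dictionary.
CODEBOOK = {"我": "00", "爱": "01", "你": "10", "他": "110", "们": "111"}

# Hypothetical conditional probabilities P(word | previous word).
# In the paper these would come from a learned word-level model; here they are made up.
BIGRAM = {
    ("<s>", "我"): 0.6, ("<s>", "你"): 0.3, ("<s>", "爱"): 0.1,
    ("我", "爱"): 0.7, ("我", "你"): 0.2,
    ("你", "爱"): 0.5, ("你", "我"): 0.2,
    ("爱", "你"): 0.6, ("爱", "我"): 0.3,
}
FLOOR = 1e-6  # assumed probability for unseen word pairs


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def candidates(chunk: str, threshold: int = 1) -> list[str]:
    """Words whose codeword has the chunk's length and lies within the Hamming threshold."""
    return [
        word
        for word, code in CODEBOOK.items()
        if len(code) == len(chunk) and hamming(code, chunk) <= threshold
    ]


def decode(chunks: list[str], threshold: int = 1) -> list[str]:
    """Pick one candidate per chunk so that the product of bigram probabilities is maximal."""
    cand = [candidates(c, threshold) for c in chunks]

    @lru_cache(maxsize=None)
    def best(i: int, prev: str) -> tuple[float, tuple[str, ...]]:
        # Best (log-probability, words) for positions i.. given the previous word.
        if i == len(cand):
            return 0.0, ()
        options = []
        for word in cand[i]:
            tail_score, tail = best(i + 1, word)
            step = math.log(BIGRAM.get((prev, word), FLOOR))
            options.append((step + tail_score, (word,) + tail))
        # An empty candidate list yields an impossible path.
        return max(options) if options else (float("-inf"), ())

    return list(best(0, "<s>")[1])


# "我爱你" is sent as 00 01 10; the second chunk arrives with one flipped bit.
recovered = decode(["00", "11", "10"])
print("".join(recovered))
```

Memoizing on `(i, prev)` means each position and previous-word pair is scored once, so the search cost grows with the number of candidates per position rather than with the number of full candidate sentences. A larger Hamming threshold widens each candidate list and increases that cost, which is one plausible reason the threshold is tuned as a parameter.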

