Citation: LIN Zhiping, XIAO Liang, CHEN Hongyi, XU Xiaoyu, LI Jieling. Collaborative Inference for Large Language Models Against Jamming Attacks[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250675

Collaborative Inference for Large Language Models Against Jamming Attacks

doi: 10.11999/JEIT250675 cstr: 32379.14.JEIT250675
Funds: The National Natural Science Foundation of China (U21A20444); The National Key Research and Development Program of China (2023YFB3107603)
  • Received Date: 2025-07-17
  • Rev Recd Date: 2025-09-10
  • Available Online: 2025-09-15
Objective  Collaborative inference with Large Language Models (LLMs) enables mobile devices to offload multi-modal data, including images, text, video, and environmental information such as temperature and humidity, to edge servers, improving the performance of inference tasks such as human–computer question answering, logical reasoning, and decision support. Jamming attacks, however, increase transmission latency and packet loss, which reduces task completion rates and slows inference. A reinforcement learning–based collaborative inference scheme is therefore proposed to enhance inference speed, accuracy, and task completion under jamming, with LLMs of different sparsity rates and quantization precisions deployed on edge servers to meet the heterogeneous inference requirements of different tasks.

Methods  The proposed scheme jointly selects the edge servers, the sparsity rates and quantization levels of the LLMs, and the transmit power and channels for data offloading, based on task type, data volume, channel gains, and received jamming power. A policy risk function is formulated to quantify the probability of inference task failure given the offloading latency and packet loss rate, thereby reducing the likelihood of unsafe policy exploration; a simplified sketch of this risk-masked action selection is given below. Each edge server deploys LLMs with varying sparsity rates and quantization precisions, derived from layer-wise unstructured pruning and model parameter quantization, to process token vectors of the offloaded multi-modal data and thus meet diverse requirements for inference accuracy and speed across tasks. The inference system is implemented with mobile devices offloading images and text to edge servers for human–computer question answering and driving decision support. The edge servers employ a vision encoder and a tokenizer to transform the received sensing data into token vectors that serve as inputs to the LLMs (see the pipeline sketch below). Pruning and parameter quantization are applied to the foundation model LLaVA-1.5-7B, generating nine LLM variants with different sparsity rates and quantization precisions to accommodate heterogeneous inference demands.

Results and Discussions  Experiments are conducted with three vehicles offloading images (captured traffic scenes) and texts (user prompts) at a maximum transmit power of 100 mW on 5170–5330 MHz channels, against a smart jammer that applies Q-learning to block one of the 20 MHz channels in this band. The results show consistent performance gains over benchmark schemes: faster responses and more accurate driving advice, enabled by reduced offloading latency and lower packet loss in image transmission, which allow more complete traffic scenes to be constructed. Over 20 repeated runs, inference speed is improved by 20.3%, task completion rate by 14.1%, and accuracy by 12.2%. These improvements are attributed to the safe exploration strategy, which prevents performance degradation while satisfying the diverse inference requirements of different tasks.
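As a concrete illustration of the safe exploration described in Methods, the following minimal Python sketch pairs tabular Q-learning with a policy risk mask over a discretized joint action of edge server, sparsity rate, quantization precision, transmit power, and channel. It is not the authors' implementation: the action grids, the risk threshold, the deadline, and the latency and loss estimators (latency_model, loss_model) are hypothetical stand-ins, and the system state is abstracted into a single discretized index.

    import random
    from collections import defaultdict
    from itertools import product

    # Hypothetical discretized action grids: (edge server, sparsity rate,
    # quantization bits, transmit power in mW, offloading channel).
    SERVERS    = [0, 1, 2]
    SPARSITY   = [0.0, 0.3, 0.5]
    QUANT_BITS = [16, 8, 4]
    POWER_MW   = [25, 50, 100]
    CHANNELS   = list(range(8))     # 20 MHz channels in the 5170-5330 MHz band
    ACTIONS = list(product(SERVERS, SPARSITY, QUANT_BITS, POWER_MW, CHANNELS))

    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, epsilon-greedy
    RISK_THRESHOLD = 0.2                # assumed tolerable failure probability
    DEADLINE_S = 1.0                    # hypothetical task deadline in seconds

    Q = defaultdict(float)              # tabular value estimates Q[(state, action)]

    def policy_risk(latency_est, loss_est):
        """Stand-in policy risk: estimated probability that the inference
        task fails given predicted offloading latency and packet loss."""
        return min(1.0, 0.5 * latency_est / DEADLINE_S + 0.5 * loss_est)

    def choose_action(state, latency_model, loss_model):
        """Epsilon-greedy selection restricted to low-risk actions."""
        safe = [a for a in ACTIONS
                if policy_risk(latency_model(state, a),
                               loss_model(state, a)) <= RISK_THRESHOLD]
        pool = safe or ACTIONS          # fall back if no action is deemed safe
        if random.random() < EPS:
            return random.choice(pool)
        return max(pool, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        """Standard Q-learning update on the joint offloading action."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

Masking high-risk actions before the epsilon-greedy draw is what keeps exploration away from policies likely to miss the task deadline or lose too many packets.

The offloading pipeline in Methods, in which a vision encoder and tokenizer turn sensing data into token vectors for the LLM, can be approximated with the publicly released LLaVA-1.5-7B weights. The sketch below assumes the llava-hf/llava-1.5-7b-hf checkpoint on Hugging Face and a hypothetical traffic_scene.jpg input; it mirrors the described data flow rather than the authors' edge deployment.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    MODEL_ID = "llava-hf/llava-1.5-7b-hf"   # public LLaVA-1.5-7B checkpoint

    # The processor bundles the vision encoder's image preprocessing and the
    # text tokenizer, producing the token vectors the LLM consumes.
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto")

    image = Image.open("traffic_scene.jpg")  # hypothetical offloaded camera frame
    prompt = "USER: <image>\nSuggest a safe driving action. ASSISTANT:"

    inputs = processor(images=image, text=prompt,
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))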
Conclusions  This paper proposes a reinforcement learning–based collaborative inference scheme that jointly selects the edge servers, the sparsity rates and quantization levels of LLMs, and the transmit power and offloading channels to counter jamming attacks. The inference system deploys nine LLM variants with different sparsity rates and quantization precisions for human–computer question answering and driving decision support, thereby meeting heterogeneous requirements for accuracy and speed. Experimental results demonstrate that the proposed scheme provides faster responses and more reliable driving advice, improving inference speed by 20.3%, task completion rate by 14.1%, and accuracy by 12.2% through reduced offloading latency and packet loss compared with benchmark approaches.
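For intuition on how such a family of nine variants can be derived, the sketch below applies layer-wise unstructured magnitude pruning followed by simple round-to-nearest weight quantization to every linear layer. This is a rough stand-in: the paper specifies layer-wise unstructured pruning and parameter quantization of LLaVA-1.5-7B but not these particular criteria, and the 3x3 grid of (sparsity, precision) pairs shown is a hypothetical configuration.

    import copy
    import torch
    import torch.nn as nn

    def prune_layer_unstructured(weight, sparsity):
        """Layer-wise unstructured pruning: zero the smallest-magnitude
        fraction `sparsity` of this layer's weights."""
        k = int(weight.numel() * sparsity)
        if k == 0:
            return weight
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight * (weight.abs() > threshold)

    def quantize_layer(weight, bits):
        """Symmetric round-to-nearest quantization, dequantized back to
        float so the precision loss can be simulated in software."""
        qmax = 2 ** (bits - 1) - 1
        scale = weight.abs().max() / qmax + 1e-12
        return torch.clamp(torch.round(weight / scale), -qmax, qmax) * scale

    def make_variant(base_model, sparsity, bits):
        """Produce one (sparsity, precision) variant from a base model."""
        model = copy.deepcopy(base_model)
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, nn.Linear):
                    w = prune_layer_unstructured(module.weight.data, sparsity)
                    module.weight.data = quantize_layer(w, bits)
        return model

    # Hypothetical 3x3 grid yielding nine variants:
    # variants = [make_variant(llm, s, b)
    #             for s in (0.0, 0.3, 0.5) for b in (16, 8, 4)]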