Citation: LIN Zhiping, XIAO Liang, CHEN Hongyi, XU Xiaoyu, LI Jieling. Collaborative Inference for Large Language Models Against Jamming Attacks[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250675

Collaborative Inference for Large Language Models Against Jamming Attacks

doi: 10.11999/JEIT250675 cstr: 32379.14.JEIT250675
Funds: The National Natural Science Foundation of China (U21A20444); The National Key Research and Development Program of China (2023YFB3107603)
  • Received Date: 2025-07-17
  • Rev Recd Date: 2025-09-10
  • Available Online: 2025-09-15
Objective  Collaborative inference with Large Language Models (LLMs) enables mobile devices to offload multi-modal data, including images, text, video, and environmental information such as temperature and humidity, to edge servers, improving the performance of inference tasks such as human–computer question answering, logical reasoning, and decision support. Jamming attacks, however, increase transmission latency and packet loss, which reduces task completion rates and slows inference. A reinforcement learning–based collaborative inference scheme is therefore proposed to enhance inference speed, accuracy, and task completion under jamming, with LLMs of different sparsity rates and quantization precisions deployed on edge servers to meet the heterogeneous inference requirements of different tasks.

Methods  The proposed scheme jointly selects the edge servers, the sparsity rates and quantization levels of the LLMs, and the transmit power and channels for data offloading, based on task type, data volume, channel gains, and received jamming power. A policy risk function is formulated to quantify the probability of inference task failure given the offloading latency and packet loss rate, thereby reducing the likelihood of unsafe policy exploration; a simplified sketch of this risk-masked action selection is given below. Each edge server deploys LLMs with varying sparsity rates and quantization precisions, derived from layer-wise unstructured pruning and model parameter quantization, to process token vectors of the offloaded multi-modal data and thus meet diverse requirements for inference accuracy and speed across tasks. The inference system is implemented with mobile devices offloading images and text to edge servers for human–computer question answering and driving decision support. The edge servers employ a vision encoder and a tokenizer to transform the received sensing data into token vectors that serve as inputs to the LLMs (see the pipeline sketch below). Pruning and parameter quantization are applied to the foundation model LLaVA-1.5-7B, generating nine LLM variants with different sparsity rates and quantization precisions to accommodate heterogeneous inference demands.

Results and Discussions  Experiments are conducted with three vehicles offloading images (captured traffic scenes) and texts (user prompts) at a maximum transmit power of 100 mW on 5170–5330 MHz channels, against a smart jammer that applies Q-learning to block one of the 20 MHz channels in this band. The results show consistent performance gains over benchmark schemes: faster responses and more accurate driving advice, enabled by reduced offloading latency and lower packet loss in image transmission, which allow more complete traffic scenes to be constructed. Over 20 repeated runs, inference speed is improved by 20.3%, task completion rate by 14.1%, and accuracy by 12.2%. These improvements are attributed to the safe exploration strategy, which prevents performance degradation while satisfying the diverse inference requirements of different tasks.
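As a concrete illustration of the safe exploration described in Methods, the following minimal Python sketch pairs tabular Q-learning with a policy risk mask over a discretized joint action of edge server, sparsity rate, quantization precision, transmit power, and channel. It is not the authors' implementation: the action grids, the risk threshold, the deadline, and the latency and loss estimators (latency_model, loss_model) are hypothetical stand-ins, and the system state is abstracted into a single discretized index.

    import random
    from collections import defaultdict
    from itertools import product

    # Hypothetical discretized action grids: (edge server, sparsity rate,
    # quantization bits, transmit power in mW, offloading channel).
    SERVERS    = [0, 1, 2]
    SPARSITY   = [0.0, 0.3, 0.5]
    QUANT_BITS = [16, 8, 4]
    POWER_MW   = [25, 50, 100]
    CHANNELS   = list(range(8))     # 20 MHz channels in the 5170-5330 MHz band
    ACTIONS = list(product(SERVERS, SPARSITY, QUANT_BITS, POWER_MW, CHANNELS))

    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, epsilon-greedy
    RISK_THRESHOLD = 0.2                # assumed tolerable failure probability
    DEADLINE_S = 1.0                    # hypothetical task deadline in seconds

    Q = defaultdict(float)              # tabular value estimates Q[(state, action)]

    def policy_risk(latency_est, loss_est):
        """Stand-in policy risk: estimated probability that the inference
        task fails given predicted offloading latency and packet loss."""
        return min(1.0, 0.5 * latency_est / DEADLINE_S + 0.5 * loss_est)

    def choose_action(state, latency_model, loss_model):
        """Epsilon-greedy selection restricted to low-risk actions."""
        safe = [a for a in ACTIONS
                if policy_risk(latency_model(state, a),
                               loss_model(state, a)) <= RISK_THRESHOLD]
        pool = safe or ACTIONS          # fall back if no action is deemed safe
        if random.random() < EPS:
            return random.choice(pool)
        return max(pool, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        """Standard Q-learning update on the joint offloading action."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

Masking high-risk actions before the epsilon-greedy draw is what keeps exploration away from policies likely to miss the task deadline or lose too many packets.

The offloading pipeline in Methods, in which a vision encoder and tokenizer turn sensing data into token vectors for the LLM, can be approximated with the publicly released LLaVA-1.5-7B weights. The sketch below assumes the llava-hf/llava-1.5-7b-hf checkpoint on Hugging Face and a hypothetical traffic_scene.jpg input; it mirrors the described data flow rather than the authors' edge deployment.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    MODEL_ID = "llava-hf/llava-1.5-7b-hf"   # public LLaVA-1.5-7B checkpoint

    # The processor bundles the vision encoder's image preprocessing and the
    # text tokenizer, producing the token vectors the LLM consumes.
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto")

    image = Image.open("traffic_scene.jpg")  # hypothetical offloaded camera frame
    prompt = "USER: <image>\nSuggest a safe driving action. ASSISTANT:"

    inputs = processor(images=image, text=prompt,
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))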
Conclusions  This paper proposes a reinforcement learning–based collaborative inference scheme that jointly selects the edge servers, the sparsity rates and quantization levels of LLMs, and the transmit power and offloading channels to counter jamming attacks. The inference system deploys nine LLM variants with different sparsity rates and quantization precisions for human–computer question answering and driving decision support, thereby meeting heterogeneous requirements for accuracy and speed. Experimental results demonstrate that the proposed scheme provides faster responses and more reliable driving advice, improving inference speed by 20.3%, task completion rate by 14.1%, and accuracy by 12.2% through reduced offloading latency and packet loss compared with benchmark approaches.
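For intuition on how such a family of nine variants can be derived, the sketch below applies layer-wise unstructured magnitude pruning followed by simple round-to-nearest weight quantization to every linear layer. This is a rough stand-in: the paper specifies layer-wise unstructured pruning and parameter quantization of LLaVA-1.5-7B but not these particular criteria, and the 3x3 grid of (sparsity, precision) pairs shown is a hypothetical configuration.

    import copy
    import torch
    import torch.nn as nn

    def prune_layer_unstructured(weight, sparsity):
        """Layer-wise unstructured pruning: zero the smallest-magnitude
        fraction `sparsity` of this layer's weights."""
        k = int(weight.numel() * sparsity)
        if k == 0:
            return weight
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight * (weight.abs() > threshold)

    def quantize_layer(weight, bits):
        """Symmetric round-to-nearest quantization, dequantized back to
        float so the precision loss can be simulated in software."""
        qmax = 2 ** (bits - 1) - 1
        scale = weight.abs().max() / qmax + 1e-12
        return torch.clamp(torch.round(weight / scale), -qmax, qmax) * scale

    def make_variant(base_model, sparsity, bits):
        """Produce one (sparsity, precision) variant from a base model."""
        model = copy.deepcopy(base_model)
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, nn.Linear):
                    w = prune_layer_unstructured(module.weight.data, sparsity)
                    module.weight.data = quantize_layer(w, bits)
        return model

    # Hypothetical 3x3 grid yielding nine variants:
    # variants = [make_variant(llm, s, b)
    #             for s in (0.0, 0.3, 0.5) for b in (16, 8, 4)]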