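An Intelligent Driving Strategy Optimization Algorithm Assisted by Direct Acyclic Graph Blockchain and Deep Reinforcement Learning

HUANG Xiaoge, LI Chunlei, LI Wenjing, LIANG Chengchao, CHEN Qianbin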

Citation: HUANG Xiaoge, LI Chunlei, LI Wenjing, LIANG Chengchao, CHEN Qianbin. An Intelligent Driving Strategy Optimization Algorithm Assisted by Direct Acyclic Graph Blockchain and Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240407

doi: 10.11999/JEIT240407
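Funds: The National Natural Science Foundation of China (62371082, 62001076), Guangxi Science and Technology Project (AB24010317), The Natural Science Foundation of Chongqing (CSTB2023NSCQ-MSX0726, cstc2020jcyj-msxmX0878)
Details
    About the authors:

    HUANG Xiaoge: female, Ph.D.; research interests include mobile communication technology, network optimization, blockchain, and Internet of Things technologies

    LI Chunlei: male, master's student; research interests include mobile communication technology, distributed learning, blockchain, and intelligent driving technologies

    LI Wenjing: female, master's student; research interests include mobile communication technology, distributed learning, blockchain, and Internet of Vehicles technologies

    LIANG Chengchao: male, Ph.D., professor; research interests include wireless communication, space-air-ground integrated networks, and (satellite) Internet architectures and protocols

    CHEN Qianbin: male, Ph.D., professor; research interests include next-generation mobile communication networks, future networks, and LTE-Advanced heterogeneous small-cell networks

    Corresponding author:

    HUANG Xiaoge huangxg@cqupt.edu.cn

  • CLC number: TN92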

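  • Abstract: Deep Reinforcement Learning (DRL) is increasingly used in intelligent driving decision-making; through continuous interaction with the environment, it can effectively improve the decision-making capability of intelligent driving systems. In practice, however, DRL suffers from low learning efficiency and poor data-sharing security. To address these problems, this paper proposes a Directed Acyclic Graph (DAG) blockchain-assisted DRL Intelligent Driving Strategy Optimization (D-IDSO) algorithm. First, a two-layer secure data-sharing architecture based on DAG blockchain is constructed to ensure the efficiency and security of model data sharing. Second, a DRL-based intelligent driving decision model is designed, with a multi-objective reward function that jointly considers safety, comfort, and efficiency to optimize driving decisions. In addition, an Improved Prioritized Experience Replay Twin Delayed Deep Deterministic policy gradient (IPER-TD3) method is proposed to improve training efficiency. Finally, Connected and Automated Vehicles (CAVs) are trained in braking and lane-change scenarios on the CARLA simulation platform. Experimental results show that the proposed algorithm significantly improves model training efficiency in intelligent driving scenarios and, while ensuring secure sharing of model data, effectively improves the safety, comfort, and efficiency of intelligent driving.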
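The abstract describes a multi-objective reward that jointly considers safety, comfort, and efficiency, but this page does not give its terms or coefficients. The following is only a minimal sketch of how such a reward might be combined into one scalar. Every name in it (`RewardWeights`, `multi_objective_reward`, `safe_gap_m`), every weight, and the choice of gap, jerk, and speed deviation as the three terms are assumptions made for illustration, not the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the paper's actual coefficients are not given here.
    safety: float = 1.0
    comfort: float = 0.2
    efficiency: float = 0.5


def multi_objective_reward(gap_m, jerk_mps3, speed_mps, target_speed_mps,
                           safe_gap_m=10.0, weights=RewardWeights()):
    """Combine assumed safety, comfort, and efficiency terms into one scalar."""
    # Safety (assumed form): penalize only when the gap to the lead
    # vehicle falls below the safe gap.
    safety = -max(0.0, safe_gap_m - gap_m) / safe_gap_m
    # Comfort (assumed form): penalize jerk, the rate of change of acceleration.
    comfort = -abs(jerk_mps3)
    # Efficiency (assumed form): penalize normalized deviation from target speed.
    efficiency = -abs(speed_mps - target_speed_mps) / max(target_speed_mps, 1e-6)
    return (weights.safety * safety
            + weights.comfort * comfort
            + weights.efficiency * efficiency)
```

With these assumed terms, a vehicle that keeps a safe gap, zero jerk, and the target speed receives a reward of zero, and any violation pushes the reward negative, so the three objectives trade off through the weights.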
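  • Figure 1  Two-layer secure data-sharing Internet of Vehicles architecture based on DAG blockchain

    Figure 2  Two typical driving scenarios

    Figure 3  Average training reward of the model under different intelligent driving strategies

    Figure 4  Braking model test under different intelligent driving strategies

    Figure 5  Lane-change model test under different intelligent driving strategies

    Figure 6  Lane-change trajectories under different intelligent driving strategies

    Figure 7  Cooperative lane change in the CARLA simulation platform

    Figure 8  Average reward and its standard deviation for different experience replay algorithms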

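    Algorithm 1  DAG blockchain-assisted DRL intelligent driving strategy optimization algorithm

     Input: initial Critic network parameters, initial Actor network parameters, number of local iterations $E$, learning rate $\eta$, discount factor $\gamma$, and update rate $\tau$
     Output: optimal CAV intelligent driving decision
     (1) The vehicle service provider publishes the task
     (2) RSU $m$ initializes the network parameters and uploads them to the DAG blockchain
     (3) for CAV $v = 1$ to $V$ do
     (4)   CAV $v$ sends the request vector $\boldsymbol{\sigma}_{v,m}^{\text{dw}}$
     (5)   RSU $m$ sends the response vector $\boldsymbol{\sigma}_{m,v}^{\text{dw}}$ and the initial model
     (6)   // Local DRL training
     (7)   for episode $e = 1$ to $E$ do
     (8)     for step $j = 1$ to $J$ do
     (9)       CAV $v$ interacts with the environment
     (10)      Store the 4-tuple training sample $\{s_t, a_t, r_t, s_{t+1}\}$ in $B_1$
     (11)      if step done then
     (12)        Compute $\bar r$ according to Eq. (20)
     (13)        Store the 5-tuple training sample $\{s_t, a_t, r_t, s_{t+1}, \bar r\}$ in $B_2$
     (14)      end if
     (15)      Update the sample priorities in experience replay buffer $B_1$ according to Eq. (21)
     (16)      Update the sample priorities in experience replay buffer $B_2$ according to Eq. (22)
     (17)      Sample $N_1$ and $N_2$ training samples from $B_1$ and $B_2$, respectively
     (18)      Update the Critic network by gradient descent
     (19)      if the Critic network has been updated 2 times then
     (20)        Update the Actor network by gradient descent
     (21)        Update the target networks by soft update
     (22)      end if
     (23)    end for
     (24)    // Upload the model
     (25)    if model quality $U_t \ge U_{\text{threshold}}$ then
     (26)      CAV $v$ sends the new site, $\mathbf{TX}_{v,m}^{\text{dw}}$, and the request vector $\boldsymbol{\sigma}_{v,m'}^{\text{up}}$
     (27)      RSU $m'$ packages the transaction vector and adds the new site to the DAG
     (28)    end if
     (29)  end for
     (30) end for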
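Steps (10) to (22) of the algorithm above combine two prioritized replay buffers with the delayed Actor update and soft target update of TD3. The sketch below illustrates only that control flow. Eqs. (20) to (22) are not reproduced on this page, so sampling in proportion to a stored priority stands in for the paper's priority rules, and the gradient steps are replaced by counters. `PriorityBuffer`, `soft_update`, `run_steps`, and the toy parameter vectors are hypothetical names introduced here.

```python
import random


class PriorityBuffer:
    """Fixed-capacity buffer sampled in proportion to per-sample priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []
        self.priorities = []

    def add(self, sample, priority=1.0):
        if len(self.samples) >= self.capacity:
            # Drop the oldest entry once the buffer is full.
            self.samples.pop(0)
            self.priorities.pop(0)
        self.samples.append(sample)
        self.priorities.append(priority)

    def sample(self, n, rng):
        if not self.samples:
            return []
        # Placeholder rule (not Eqs. 21-22): probability proportional to
        # priority. Priorities must be non-negative with a positive sum.
        return rng.choices(self.samples, weights=self.priorities, k=n)


def soft_update(target, online, tau):
    """Move target parameters a fraction tau toward the online parameters."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def run_steps(num_steps, b1, b2, n1, n2, tau=0.005, policy_delay=2, seed=0):
    """Count Critic/Actor updates for a TD3-style delayed-update schedule."""
    rng = random.Random(seed)
    online, target = [1.0, -1.0], [0.0, 0.0]  # toy parameter vectors
    critic_updates = actor_updates = 0
    for _ in range(num_steps):
        # Step (17): draw n1 samples from B1 and n2 samples from B2.
        batch = b1.sample(n1, rng) + b2.sample(n2, rng)
        if not batch:
            continue
        critic_updates += 1  # step (18): a Critic gradient step would go here
        if critic_updates % policy_delay == 0:
            actor_updates += 1  # step (20): an Actor gradient step would go here
            target = soft_update(target, online, tau)  # step (21)
    return critic_updates, actor_updates, target
```

In this sketch the Actor and target networks move once for every two Critic updates, which is the delay that step (19) describes; a real implementation would replace the counters with optimizer steps on the networks.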
Publication history
  • Received:  2024-05-25
  • Revised:  2024-11-11
  • Published online:  2024-11-19
