Energy Harvesting Assisted Intelligent Computation Offloading Method for the IoT in Mining
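Abstract: Smart-mine scenarios combine mining IoT devices that have limited computing, energy, and memory resources with a large demand for latency-sensitive computation tasks. For such scenarios, this paper proposes an Energy Harvesting (EH)-assisted intelligent computation offloading method for the mining IoT. Mobile Edge Computing (MEC) assists the mining IoT devices with task computation, while EH supplies power to the energy-constrained devices. A Q-learning-based intelligent offloading mechanism dynamically explores the optimal computation offloading policy when the mine system model cannot be obtained precisely. In addition, to handle the curse of dimensionality in complex mine environments and to reduce the discretization error introduced by policy discretization, the paper proposes a computation offloading mechanism based on Deep Deterministic Policy Gradient (DDPG) that further improves underground task offloading performance. Theoretical analysis and simulation results show that the proposed mechanisms reduce the system's energy consumption, computation delay, and task failure rate, which helps ensure safe and efficient production in the mining IoT.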
- Mining IoT
- Mobile Edge Computing (MEC)
- Energy Harvesting (EH)
- Q-learning
- Deep Deterministic Policy Gradient (DDPG)
Abstract: This paper proposes an Energy Harvesting (EH)-assisted intelligent computation offloading method for the mining IoT. It targets smart mining scenarios in which IoT devices have limited computing, energy, and memory resources and must handle a large number of latency-sensitive computation tasks. Mobile Edge Computing (MEC) is used to assist task computation on mine IoT devices, and EH is used to power the energy-constrained devices. A Q-learning-based intelligent computation offloading mechanism dynamically explores the optimal offloading policy without requiring a precise model of the mine system. In addition, a computation offloading mechanism based on Deep Deterministic Policy Gradient (DDPG) is proposed to address the curse of dimensionality in complex mining scenarios, reduce the discretization error caused by policy discretization, and further improve computation offloading performance. Theoretical analysis and simulation results show that the proposed mechanisms reduce energy consumption, computation delay, and task failure rate, which helps ensure safe and efficient production in the mining IoT.
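Algorithm 1 Q-learning-based computation offloading mechanism
Initialize the system parameters $\alpha$, $\gamma$, $B_1^0, \cdots ,B_M^0$, ${g^{(1)}}$, ${b^0}$; set the number of learning iterations
1: For $k = 1,2,3, \cdots$ do
2: Predict the harvested energy ${g^{(k)}}$; observe ${b^{(k)}}$ and $B_1^{(k - 1)}, \cdots ,B_M^{(k - 1)}$
3: Form the state ${{\boldsymbol{s}}^{(k)}} = [B_1^{(k - 1)}, \cdots ,B_M^{(k - 1)},{g^{(k)}},{b^{(k)}}]$
4: Select the computation offloading policy ${{\boldsymbol{a}}^{(k)}} = [{i^{(k)}},{x^{(k)}}]$ using $\varepsilon$-greedy, and offload the task amount ${R^{(k)}}{x^{(k)}}$ to edge server ${i^{(k)}}$
5: Interact with the environment to obtain the battery level ${b^{(k + 1)}} = \max \{ {b^{(k)}} - {E^{(k)}} + {g^{(k)}},0\}$, and evaluate ${E^{(k)}}$, ${T^{(k)}}$, and $I({b^{(k + 1)}} = 0)$
6: Obtain the utility ${U^{(k)}}$ from Eq. (7), and update the Q value $Q({{\boldsymbol{s}}^{(k)}},{{\boldsymbol{a}}^{(k)}})$ using Eq. (8)
7: End for

Algorithm 2 DDPG-based computation offloading mechanism
Initialize the learning parameters $\alpha$, $\lambda$, ${\xi _1}$, ${\xi _2}$, $Z$, $\kappa$ and the parameters of the OU noise $\mathcal{N}$; set the number of learning iterations
1: Observe ${g^{(k)}}$, ${b^{(k)}}$, and $B_1^{(k - 1)}, \cdots ,B_M^{(k - 1)}$, and form the state ${{\boldsymbol{s}}^{(k)}} = [B_1^{(k - 1)}, \cdots ,B_M^{(k - 1)},{g^{(k)}},{b^{(k)}}]$
2: For $k = 1,2,3, \cdots$ do
3: Output a deterministic action from the state ${{\boldsymbol{s}}^{(k)}}$ and add the noise $\mathcal{N}$ to produce the offloading policy ${{\boldsymbol{a}}^{(k)}} = \mu ({{\boldsymbol{s}}^{(k)}}|{\xi _2}^{(k)}) + \mathcal{N}$
4: Offload the task amount ${R^{(k)}}{x^{(k)}}$ to edge server ${i^{(k)}}$, and evaluate ${{\boldsymbol{s}}^{(k + 1)}}$ and ${U^{(k)}}$
5: Store $({{\boldsymbol{s}}^{(k)}},{{\boldsymbol{a}}^{(k)}},{U^{(k)}},{{\boldsymbol{s}}^{(k + 1)}})$ in the experience replay buffer
6: Sample $Z$ experiences $({s_h},{a_h},{U_h},{s_{h + 1}}),h \in [1,Z]$ from the replay buffer to update the network parameters
7: Update the Critic network parameters ${\xi _1}$ using Eqs. (10) and (11), and update the Actor network parameters ${\xi _2}$ using Eq. (12)
8: Soft-update the Target network parameters ${\xi _1}^\prime ,{\xi _2}^\prime$ using Eq. (13)
9: ${{\boldsymbol{s}}^{(k)}} = {{\boldsymbol{s}}^{(k + 1)}}$
10: End for

Table 1 Hyperparameter settings of the DDRLOM mechanism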
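| Parameter | Value |
| --- | --- |
| Actor/Critic learning rate | 0.0001/0.002 |
| Discount factor | 0.9 |
| OU noise mean/standard deviation/mean-reversion speed | 5/0.015/40 |
| Batch size | 64 |
| Activation function | Leaky ReLU |
| Number of hidden layers in the convolutional network/hidden units per layer | 2/32, 16 |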
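The loop of Algorithm 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: Eqs. (7) and (8) are not reproduced in this section, so the utility below is a hypothetical negative weighted sum of energy, delay, and a failure penalty, and the update is the standard tabular Q-learning rule. The toy environment (two edge servers, three offloading ratios, the energy and delay formulas, and the battery cap of 5.0) is invented for the example and is not the paper's system model.

```python
import random
from collections import defaultdict

# Hypothetical action set a = [i, x]: server index i and offloading ratio x.
ACTIONS = [(i, x) for i in range(2) for x in (0.0, 0.5, 1.0)]

def battery_update(b, energy_used, harvested):
    """Step 5: b(k+1) = max{b(k) - E(k) + g(k), 0}."""
    return max(b - energy_used + harvested, 0.0)

def epsilon_greedy(q, s, epsilon, rng):
    """Step 4: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])

def q_update(q, s, a, utility, s_next, alpha, gamma):
    """Step 6: standard tabular Q-learning update (assumed form of Eq. (8))."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (utility + gamma * best_next)
    return q[(s, a)]

def run(steps=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    rates_prev, b = (1, 1), 3.0          # B^(k-1) and battery level b^(k)
    g = rng.choice((0.0, 1.0))           # predicted harvested energy g^(k)
    for _ in range(steps):
        s = (rates_prev, g, b)           # step 3: s = [B^(k-1), g, b]
        a = epsilon_greedy(q, s, epsilon, rng)
        i, x = a
        # Toy environment (assumption): the actual rates of this slot are
        # revealed only after the action has been chosen.
        rates = (rng.choice((1, 2)), rng.choice((1, 2)))
        energy = 2.0 * (1 - x) + x / rates[i]   # hypothetical E^(k)
        delay = 1.0 * (1 - x) + x / rates[i]    # hypothetical T^(k)
        b_next = min(battery_update(b, energy, g), 5.0)
        failed = 1.0 if b_next == 0.0 else 0.0  # indicator I(b^(k+1) = 0)
        utility = -(energy + delay + 5.0 * failed)  # stand-in for Eq. (7)
        g_next = rng.choice((0.0, 1.0))
        s_next = (rates, g_next, b_next)
        q_update(q, s, a, utility, s_next, alpha, gamma)
        rates_prev, g, b = rates, g_next, b_next
    return q
```

Calling `run()` returns the learned Q table keyed by `(state, action)`. Because the state must be enumerable, the battery level and rates are kept on a coarse grid here, which is the kind of discretization that motivates the DDPG variant in Algorithm 2.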

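Three building blocks of Algorithm 2 are independent of the network architecture and can be sketched without a deep learning library: the OU exploration noise (step 3), the experience replay buffer (steps 5 and 6), and the soft update of the Target networks (step 8). The sketch below makes several assumptions. Eq. (13) is not shown in this section, so the soft update uses the common form $\xi' \leftarrow \kappa \xi + (1 - \kappa)\xi'$ with $\kappa$ taken to be the soft-update coefficient. The class and function names, the discretized OU step with time step `dt`, the default noise parameters, and the clipping of the offloading ratio to [0, 1] are all illustrative. Only the continuous ratio $x$ of the action is handled; the server index $i$ is left out.

```python
import math
import random
from collections import deque

class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (hypothetical helper)."""
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = 0.0

    def sample(self):
        # Mean-reverting drift plus a Gaussian diffusion term.
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x

def noisy_action(mu_output, noise, low=0.0, high=1.0):
    """Step 3: a = mu(s | xi_2) + N, clipped to a valid offloading ratio."""
    return min(max(mu_output + noise.sample(), low), high)

class ReplayBuffer:
    """Steps 5 and 6: store transitions and sample Z of them uniformly."""
    def __init__(self, capacity, seed=None):
        self.buf = deque(maxlen=capacity)   # oldest experiences are dropped
        self.rng = random.Random(seed)

    def store(self, s, a, u, s_next):
        self.buf.append((s, a, u, s_next))

    def sample(self, z):
        return self.rng.sample(list(self.buf), min(z, len(self.buf)))

def soft_update(target, online, kappa):
    """Step 8 (assumed form of Eq. (13)): xi' <- kappa*xi + (1-kappa)*xi'."""
    return [kappa * w + (1.0 - kappa) * wt for wt, w in zip(target, online)]
```

In a full implementation the flat parameter lists passed to `soft_update` would be the weights of the Actor and Critic networks, and the sampled minibatch would feed the Critic and Actor updates of step 7, which are omitted here because they depend on Eqs. (10) to (12).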