Gas Station Inspection Task Allocation Algorithm in Digital Twin-assisted Reinforcement Learning

LIAN Yuanfeng; TIAN Tian; CHEN Xiaohe; DONG Shaohua

doi:10.11999/JEIT241027

Volume 47 Issue 7

Jul. 2025

LIAN Yuanfeng, TIAN Tian, CHEN Xiaohe, DONG Shaohua. Gas Station Inspection Task Allocation Algorithm in Digital Twin-assisted Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2025, 47(7): 2285-2297. doi: 10.11999/JEIT241027

doi: 10.11999/JEIT241027 cstr: 32379.14.JEIT241027

LIAN Yuanfeng^{1, 2},
TIAN Tian^{1, 2},
CHEN Xiaohe^{1, 2},
DONG Shaohua^{3, 4}

1. College of Artificial Intelligence, China University of Petroleum, Beijing 102249, China
2. Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum, Beijing 102249, China
3. College of Safety and Ocean Engineering, China University of Petroleum, Beijing 102249, China
4. Key Laboratory of Oil and Gas Safety and Emergency Technology, Ministry of Emergency Management, Beijing 102249, China

Funds: Beijing Natural Science Foundation (L233002), CNPC Innovation Fund (2022DQ02-0609)

Received Date: 2024-11-18
Revised Date: 2025-03-31

Available Online: 2025-04-21

Publish Date: 2025-07-22

Abstract

Objective: With the increasing quantity of equipment in gas stations and the growing demand for safety, Multi-Robot Task Allocation (MRTA) has become essential for improving inspection efficiency. Although existing MRTA algorithms offer basic allocation strategies, they have limited capacity to respond to emergent tasks and to manage energy consumption effectively. To address these limitations, this study integrates digital twin technology with a reinforcement learning framework. By incorporating Lyapunov optimization and decoupling the optimization objectives, the proposed method improves inspection efficiency while maintaining a balance between robot energy use and task delay. This approach enhances task allocation in complex gas station scenarios and provides theoretical support for intelligent unmanned management systems in such environments.

Methods: The DTPPO algorithm constructs a multi-objective joint optimization model for inspection task allocation, with energy consumption and task delay as the primary criteria. The model considers the execution performance of multiple robots and the characteristics of heterogeneous tasks. Lyapunov optimization theory is then applied to decouple the time-energy coupling constraints of the inspection objectives. Using the Lyapunov drift-plus-penalty framework, the algorithm balances task delay and energy consumption, which simplifies the original joint optimization problem. The decoupled objectives are solved using a strategy that combines digital twin technology with the Proximal Policy Optimization (PPO) algorithm, resulting in a task allocation policy for multi-robot inspection in gas station environments.

Results and Discussions: The DTPPO algorithm decouples long-term energy consumption and time constraints using Lyapunov optimization, incorporating their variations into the reward function of the reinforcement learning model. Simulation results show that the Pathfinding inspection path (Fig. 4) generated by the DTPPO algorithm improves the task completion rate by 1.94% compared to benchmark experiments. In complex gas station environments (Fig. 5), the algorithm achieves a 1.92% improvement. When the task quantity parameter is set between 0.1 and 0.5 (Fig. 8), the algorithm maintains a high task completion rate even under heavy load. With 2 to 6 robots (Fig. 9), the algorithm demonstrates strong adaptability and effectiveness in resource-constrained scenarios.

Conclusions: This study addresses the coupling between energy consumption and time by decoupling the objective function constraints through Lyapunov optimization. By incorporating the variation of Lyapunov drift-plus-penalty terms into the reward function of reinforcement learning, a digital twin-assisted reinforcement learning algorithm, named DTPPO, is proposed. The method is evaluated in multiple simulated environments, and the results show the following: (1) The proposed approach achieves a 1.92% improvement in task completion rate compared to the DDQN algorithm; (2) Lyapunov optimization improves performance by 5.89% over algorithms that rely solely on reinforcement learning; (3) The algorithm demonstrates good adaptability and effectiveness under varying task quantities and robot numbers. However, this study focuses solely on Lyapunov theory, and future research should explore the integration of Lyapunov optimization with other algorithms to further enhance MRTA methods.
Keywords:
- Gas station
- Digital twin (DT)
- Task allocation
- Lyapunov
- Proximal Policy Optimization (PPO)
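
The abstract states that the variation of the Lyapunov drift-plus-penalty terms is folded into the reinforcement learning reward, but it does not give the formulation. The following is a minimal sketch of the generic drift-plus-penalty idea only, not the DTPPO reward itself. It assumes a single virtual energy queue, a quadratic Lyapunov function, and a trade-off weight `v`; the names `EnergyQueue`, `budget_per_step`, and `drift_plus_penalty_reward` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class EnergyQueue:
    """Virtual queue measuring how far energy use runs ahead of a long-term budget.

    Assumption: one scalar queue per robot; the paper's actual queues are not
    described in the abstract.
    """
    budget_per_step: float  # hypothetical average energy budget per decision step
    backlog: float = 0.0    # current queue length Q(t)

    def lyapunov(self) -> float:
        # Quadratic Lyapunov function L(Q) = Q^2 / 2.
        return 0.5 * self.backlog ** 2

    def update(self, energy_used: float) -> float:
        """Advance the queue one step and return the drift L(Q(t+1)) - L(Q(t))."""
        before = self.lyapunov()
        # Q(t+1) = max(Q(t) + e(t) - budget, 0)
        self.backlog = max(self.backlog + energy_used - self.budget_per_step, 0.0)
        return self.lyapunov() - before


def drift_plus_penalty_reward(queue: EnergyQueue, energy_used: float,
                              task_delay: float, v: float = 1.0) -> float:
    """Per-step reward: the negative of (Lyapunov drift + v * delay penalty)."""
    drift = queue.update(energy_used)
    return -(drift + v * task_delay)


if __name__ == "__main__":
    queue = EnergyQueue(budget_per_step=1.0)
    # Overspending energy grows the queue, so the reward is strongly negative.
    print(drift_plus_penalty_reward(queue, energy_used=3.0, task_delay=2.0, v=0.5))  # -3.0
    # Spending below budget drains the queue, which can offset the delay penalty.
    print(drift_plus_penalty_reward(queue, energy_used=0.5, task_delay=1.0, v=0.5))  # 0.375
```

In this sketch a larger `v` weights task delay more heavily relative to keeping energy use within budget. A policy-gradient learner such as PPO would consume this scalar as its per-step reward.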
