Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning

LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen

Citation: LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen. Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2018, 40(1): 244-248. doi: 10.11999/JEIT170347

doi: 10.11999/JEIT170347

Funds: 

The Key Advanced Research Fund of China Electronics Technology Group Corporation (6141B08010101), the China Postdoctoral Science Foundation (2015T81081, 2016M602974), and the Jiangsu Provincial Natural Science Foundation for Youths (BK20140075)

  • Abstract: Reinforcement learning acquires a decision policy for a task through interaction with the environment, and thus has the ability to learn by itself and online. However, its trial-and-error mechanism often leads to low learning efficiency and slow convergence. Knowledge encodes human experience and the regular patterns of the world, so using knowledge to guide the learning of the agent is an effective way to address these problems. This paper introduces qualitative rule knowledge into reinforcement learning: the qualitative rules are represented with a cloud reasoning model and used as an exploration strategy that guides the agent's action selection, reducing the blindness of exploration in the state-action space. OpenAI Gym is chosen as the test environment, and experiments on a customized CartPole-v2 task verify the effectiveness of the proposed cloud reasoning model-based exploration strategy, which improves the learning efficiency of reinforcement learning and speeds up convergence.
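The exploration mechanism described in the abstract, in which each qualitative rule is represented as a normal cloud concept (Ex, En, He) and the most strongly activated rule suggests the exploratory action, can be illustrated with the short Python sketch below. This is a minimal sketch under assumed settings, not the authors' implementation: the two rules on the pole angle, their (Ex, En, He) values, the activation threshold, and the epsilon-greedy wrapper are hypothetical choices made only for demonstration on a CartPole-style state vector.

import math
import random

# Hypothetical qualitative rules on the pole angle (radians). Each concept is a
# normal cloud (Ex, En, He): Ex = expectation, En = entropy, He = hyper-entropy.
# The numeric values are illustrative assumptions, not taken from the paper.
RULES = [
    {"name": "pole leans right", "cloud": (0.05, 0.03, 0.005), "action": 1},   # push right
    {"name": "pole leans left",  "cloud": (-0.05, 0.03, 0.005), "action": 0},  # push left
]

def cloud_certainty(x, ex, en, he):
    """X-condition cloud generator: certainty degree that input x belongs to the
    qualitative concept (Ex, En, He)."""
    en_prime = random.gauss(en, he)        # entropy perturbed by the hyper-entropy
    en_prime = max(abs(en_prime), 1e-8)    # guard against a zero denominator
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))

def cloud_exploration_action(state, n_actions, threshold=0.1):
    """Exploratory action guided by the cloud rules: evaluate every rule on the
    pole angle and take the action of the most strongly activated rule; fall back
    to a uniformly random action when no rule is sufficiently activated."""
    angle = state[2]                       # CartPole observation: [x, x_dot, theta, theta_dot]
    best_mu, best_action = 0.0, None
    for rule in RULES:
        mu = cloud_certainty(angle, *rule["cloud"])
        if mu > best_mu:
            best_mu, best_action = mu, rule["action"]
    if best_action is None or best_mu < threshold:
        return random.randrange(n_actions)
    return best_action

def select_action(q_values, state, epsilon, n_actions):
    """Epsilon-greedy selection in which the exploratory branch consults the
    cloud reasoning rules instead of acting purely at random."""
    if random.random() < epsilon:
        return cloud_exploration_action(state, n_actions)
    return max(range(n_actions), key=lambda a: q_values[a])

if __name__ == "__main__":
    # Dummy CartPole-like state with the pole tilted slightly to the right.
    state = [0.0, 0.0, 0.06, 0.0]
    q_values = [0.0, 0.0]                  # placeholder for a Q-network's output
    action = select_action(q_values, state, epsilon=0.3, n_actions=2)
    print("selected action:", action)

In a full deep reinforcement learning agent, select_action would stand in for the usual random branch of epsilon-greedy exploration, so that early exploration follows the qualitative rules rather than blind trial and error.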
Publication History
  • Received: 2017-04-18
  • Revised: 2017-09-30
  • Published: 2018-01-19
