Volume 40 Issue 1
Jan.  2018
LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen. Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2018, 40(1): 244-248. doi: 10.11999/JEIT170347

Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning

doi: 10.11999/JEIT170347
Funds:

The Advanced Research of China Electronics Technology Group Corporation (6141B08010101), China Postdoctoral Science Foundation (2015T81081, 2016M602974), The Jiangsu Natural Science Foundation for Youths (BK20140075)

  • Received Date: 2017-04-18
  • Rev Recd Date: 2017-09-30
  • Publish Date: 2018-01-19
  • Reinforcement learning, which is self-improving and learns online, obtains a task policy through interaction with the environment. However, its trial-and-error mechanism usually requires a large number of training episodes. Knowledge includes human experience and cognition of the environment. This paper introduces such knowledge into reinforcement learning in the form of qualitative rules and represents these rules with a cloud reasoning model, which serves as a heuristic exploration strategy to guide action selection. Empirical evaluation in the OpenAI Gym environment CartPole-v0 shows that the exploration strategy based on the cloud reasoning model significantly improves the performance of the learning process.
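The abstract describes qualitative rules, represented by a cloud reasoning model, that guide action selection during exploration. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the standard normal cloud concept (Ex, En, He) and the X-condition cloud generator. The two rules over the CartPole pole angle, their numeric parameters, and the names `x_condition_membership`, `RULES`, `heuristic_action`, and `select_action` are hypothetical and chosen only for illustration. The rule-based heuristic replaces the uniform random branch of an epsilon-greedy policy.

```python
import math
import random


def x_condition_membership(x, ex, en, he, rng):
    """X-condition normal cloud generator.

    Returns the certainty degree of input x for the qualitative concept
    (Ex, En, He): expectation, entropy and hyper-entropy.
    """
    en_prime = rng.gauss(en, he)  # randomized entropy, En' ~ N(En, He^2)
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


# Hypothetical qualitative rules (assumption, not taken from the paper):
# each entry is (cloud over the pole angle in radians, action to take).
RULES = [
    ((-0.10, 0.05, 0.005), 0),  # "pole leans left"  -> push cart left
    ((0.10, 0.05, 0.005), 1),   # "pole leans right" -> push cart right
]


def heuristic_action(pole_angle, rng):
    """Fire every rule and return the action of the most activated one."""
    best_action, best_degree = None, -1.0
    for (ex, en, he), action in RULES:
        degree = x_condition_membership(pole_angle, ex, en, he, rng)
        if degree > best_degree:
            best_action, best_degree = action, degree
    return best_action


def select_action(q_values, pole_angle, epsilon, rng):
    """Epsilon-greedy selection whose exploration branch uses the rules."""
    if rng.random() < epsilon:
        return heuristic_action(pole_angle, rng)  # knowledge-guided exploration
    # Greedy branch: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # The Q-values favor action 0, but the pole leans right, so the
    # exploration branch (epsilon = 1.0) proposes action 1 instead.
    print(select_action([0.9, 0.1], pole_angle=0.10, epsilon=1.0, rng=rng))
```

In a full agent, `q_values` would come from the deep Q-network and `epsilon` would typically be annealed, so the rules dominate early training and the learned policy takes over later. That schedule is likewise an assumption of this sketch.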
