| Citation: | LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen. Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2018, 40(1): 244-248. doi: 10.11999/JEIT170347 | 
 
	                | MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[OL]. https://arxiv.org /abs/1312.5602v1, 2013.12. | 
| SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. MA: MIT Press, 1998: 3-24. doi: 10.1109/ TNN.1998.712192. | 
| MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human- level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. doi:  10.1038/nature14236. | 
| OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[C]. Proceedings of the 29th Neural Information Processing Systems, Barcelona, 2016: 4026-4034. | 
| BELLEMARE M, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]. Proceedings of the 29th Neural Information Processing Systems, Barcelona, 2016: 1471-1479. | 
| HOUTHOOFT R, CHEN X, DUAN Y, et al. VIME: Variational information maximizing exploration[C]. Proceedings of the 29th Neural Information Processing Systems, Barcelona, 2016: 1109-1117. | 
| DAVENPORT T H, PRUSAK L, and PRUSAK L. Working Knowledge: How Organizations Manage What They Know [M]. Boston: Harvard Business School Press, 1997: 1-24. doi:  10.1145/347634.348775. | 
| SANTOS M and BOTELLA G. Dyna-H: A heuristic planning reinforcement learning algorithm applied to role-playing game strategy decision systems[J]. Knowledge-Based Systems, 2012, 32(8): 28-36. | 
| BIANCHI R A C, ROS R, and MANTARAS R L D. Improving reinforcement learning by using case based heuristics[C]. Proceedings of the International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development, Burlin, 2009: 75-89. | 
| KUHLMANN G, STONE P, MOONEY R, et al. Guiding a reinforcement learner with natural language advice: Initial results in RoboCup soccer[C]. Proceedings of the 19th National Conference on Artificial Intelligence Workshop on Supervisory Control of Learning and Adaptive Systems, California, 2004: 30-35. | 
| LI Deyi, CHEUNG D, SHI Xuemei, et al. Uncertainty reasoning based on cloud models in controllers[J]. Computers  Mathematics with Applications, 1998, 35(3): 99-123. | 
| SINGH S P. Learning to solve Markovian decision processes [D]. [Ph.D. dissertation], University of Massachusetts, Amherst, 1994: 66-72. | 
| HASSELT H V, GUEZ A, and SILVER D. Deep reinforcement learning with double Q-learning[C]. Proceedings of the 30th AAAI Conference on Articial Intelligence, Phoenix, 2016: 2094-2100. | 
