
Author Search Result

[Author] Lei CAO (3 hits)

  • Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  Yong-liang ZHANG  Jun LAI  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/05/22
    Vol: E101-D No: 9
    Page(s): 2315-2322

    The commonly used Deep Q Networks algorithm is known to overestimate action values under certain conditions. It has also been shown that overestimation harms performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be considered an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm takes advantage of the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used for Q-learning to reduce overestimation. In particular, we introduce Sarsa learning into Deep Q Networks to further remove overestimation. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and LunarLander control tasks from OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimation, a more stable learning process, and improved performance.
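To make the three ingredients named in this abstract concrete, here is a minimal sketch of the standard bootstrap targets for Q-learning, the double estimator, and Sarsa. It is not the paper's DSQN update, since the abstract does not say how the targets are combined. The function names and the plain-list representation of next-state action values are assumptions made for illustration.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def q_learning_target(reward: float, gamma: float,
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Q-learning target: bootstrap from the maximum next-state value.

    Taking the max over noisy estimates is the source of overestimation.
    """
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_q_target(reward: float, gamma: float,
                    q_online_next: Sequence[float],
                    q_target_next: Sequence[float],
                    done: bool = False) -> float:
    """Double estimator: one estimate selects the action, the other evaluates it."""
    if done:
        return reward
    return reward + gamma * q_target_next[argmax(q_online_next)]


def sarsa_target(reward: float, gamma: float,
                 q_target_next: Sequence[float],
                 next_action: int,
                 done: bool = False) -> float:
    """Sarsa (on-policy) target: bootstrap from the action actually taken next."""
    if done:
        return reward
    return reward + gamma * q_target_next[next_action]
```

With the same next-state values, the Q-learning target is never smaller than the other two, which is one way to see why replacing the max can reduce overestimation.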

  • Reward-Based Exploration: Adaptive Control for Deep Reinforcement Learning

    Zhi-xiong XU  Lei CAO  Xi-liang CHEN  Chen-xi LI  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2018/06/18
    Vol: E101-D No: 9
    Page(s): 2409-2412

    To address the conflict between exploration and exploitation in deep reinforcement learning, this paper proposes a “reward-based exploration strategy combined with Softmax action selection” (RBE-Softmax), a dynamic exploration strategy that guides the agent's learning. The advantage of the proposed method is that characteristics of the agent's learning process are used to adapt the exploration parameters online, so that the agent can select potentially optimal actions more effectively. The proposed method is evaluated on discrete and continuous control tasks in OpenAI Gym, and the empirical evaluation results show that RBE-Softmax leads to statistically significant improvements in the performance of deep reinforcement learning algorithms.
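For readers unfamiliar with Softmax action selection, the following is a minimal sketch of the generic technique with a temperature parameter. The abstract does not give the rule by which RBE-Softmax adapts its exploration parameters from reward, so that rule is not reproduced here: `temperature` is simply passed in by the caller, and all function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_probs(q_values: Sequence[float], temperature: float) -> List[float]:
    """Softmax (Boltzmann) action probabilities.

    A high temperature gives near-uniform exploration; a low temperature
    concentrates probability on the highest-valued action.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values: Sequence[float], temperature: float,
                  rng: Optional[random.Random] = None) -> int:
    """Sample an action index according to the softmax probabilities."""
    rng = rng or random.Random()
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

An adaptive scheme in the spirit of the abstract would change `temperature` between calls as learning progresses; how it does so is left open here.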

  • A Study of Qualitative Knowledge-Based Exploration for Continuous Deep Reinforcement Learning

    Chenxi LI  Lei CAO  Xiaoming LIU  Xiliang CHEN  Zhixiong XU  Yongliang ZHANG  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2017/07/26
    Vol: E100-D No: 11
    Page(s): 2721-2724

    As an important method for solving sequential decision-making problems, reinforcement learning learns the policy for a task through interaction with the environment. However, it has difficulty scaling to large problems. One reason is the exploration-exploitation dilemma, which may lead to inefficient learning. We present an approach that addresses this shortcoming by introducing qualitative knowledge into reinforcement learning, using cloud control systems to represent ‘if-then’ rules. We use it as a heuristic exploration strategy to guide action selection in deep reinforcement learning. Empirical evaluation results show that our approach significantly improves the learning process.
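The idea of letting ‘if-then’ rules guide exploration can be illustrated with a much simpler stand-in than the paper's cloud control systems. The sketch below uses a crisp rule and discrete actions, whereas the paper targets continuous control. The rule, the state layout (pole angle at index 2), the threshold, and all names are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence

Rule = Callable[[Sequence[float]], Optional[int]]


def lean_rule(state: Sequence[float]) -> Optional[int]:
    """Hypothetical rule: if the pole leans right, push right (1);
    if it leans left, push left (0); otherwise give no advice."""
    angle = state[2]  # assumed position of the pole angle in the state
    if angle > 0.05:
        return 1
    if angle < -0.05:
        return 0
    return None


def rule_guided_action(state: Sequence[float], q_values: Sequence[float],
                       rule: Rule, follow_prob: float,
                       rng: Optional[random.Random] = None) -> int:
    """Follow the rule's suggestion with probability follow_prob,
    otherwise act greedily on the learned action values."""
    rng = rng or random.Random()
    suggestion = rule(state)
    if suggestion is not None and rng.random() < follow_prob:
        return suggestion
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Lowering `follow_prob` over time would hand control from the prior knowledge to the learned values; that schedule is a design choice of this sketch, not something stated in the abstract.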