
Author Search Result

[Author] Ruoying SUN (3 hits)

  • Multiagent Cooperating Learning Methods by Indirect Media Communication

    Ruoying SUN  Shoji TATSUMI  Gang ZHAO  

     
    PAPER-Neural Networks and Bioengineering

    Vol: E86-A No:11  Page(s): 2868-2878

    Reinforcement Learning (RL) is an efficient learning method for problems in which the learning agents have no a priori knowledge of the environment. The Ant Colony System (ACS) provides an indirect communication method among cooperating agents and is an efficient approach to combinatorial optimization problems. Building on the indirect-communication cooperation of ACS and the reinforcement-value update policy of RL, this paper proposes the Q-ACS multiagent cooperative learning method, which can be applied to both Markov Decision Processes (MDPs) and combinatorial optimization problems. The advantage of Q-ACS is that the learning agents share episodes, which benefits the exploitation of accumulated knowledge and makes efficient use of the learned reinforcement values. Further, taking visit counts into account, this paper proposes the T-ACS multiagent learning method, whose merit is that the agents share better policies, which benefits exploration during learning. Regarding Q-ACS and T-ACS as homogeneous multiagent learning methods, and in light of indirect media communication among heterogeneous multiagents, the paper then presents a heterogeneous multiagent RL method, D-ACS, which combines the learning policies of Q-ACS and T-ACS and applies different update policies for the reinforcement values. In all of these methods the agents cooperate in a simple way: they exchange information in the form of reinforcement values updated in a model common to all agents. By exploring the unknown environment actively and exploiting learned knowledge effectively, the proposed methods can solve both MDPs and combinatorial optimization problems. Experimental results on the hunter game and the traveling salesman problem demonstrate that our methods perform competitively with representative methods in each domain.
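
    The abstract describes the scheme only at a high level: several agents act on a common model and communicate indirectly through one shared table of reinforcement values, which receives a Q-learning-style local update during episodes and an ACS-style global update over the best episode found so far. The Python sketch below illustrates that general idea under stated assumptions; the environment interface (env with reset/step/done), the parameter values, and the exact update forms are hypothetical and are not taken from the paper.

        import random

        ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # hypothetical parameter values

        def run_episode(env, q_table, actions):
            """One agent's episode; every agent reads and writes the same q_table."""
            state, episode = env.reset(), []
            while not env.done():
                # epsilon-greedy choice over the shared reinforcement values
                if random.random() < EPSILON:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: q_table.get((state, a), 0.0))
                next_state, reward = env.step(action)
                # local update (Q-learning form), immediately visible to the other agents
                best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
                old = q_table.get((state, action), 0.0)
                q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
                episode.append((state, action, reward))
                state = next_state
            return episode

        def global_update(q_table, best_episode):
            """ACS-style indirect communication: reinforce the best episode found so far."""
            total_reward = sum(r for _, _, r in best_episode)
            for state, action, _ in best_episode:
                old = q_table.get((state, action), 0.0)
                q_table[(state, action)] = (1 - ALPHA) * old + ALPHA * total_reward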

  • Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment

    Gang ZHAO  Shoji TATSUMI  Ruoying SUN  

     
    PAPER-Algorithms and Data Structures

    Vol: E83-A No:9  Page(s): 1786-1795

    Reinforcement Learning (RL) is an efficient method for solving Markov Decision Processes (MDPs) without a priori knowledge of the environment, and RL methods can be classified as exploitation-oriented or exploration-oriented. Q-learning, a representative exploration-oriented RL method, is guaranteed to obtain an optimal policy; however, it needs numerous trials to do so because it has no action-selecting mechanism. To accelerate the learning rate of Q-learning and to realize both exploitation and exploration during learning, the Q-ee learning system has been proposed, which uses a pre-action-selector, an action-selector, and back propagation of Q values to improve the performance of Q-learning. However, Q-ee learning is suitable only for deterministic MDPs, and its convergence to an optimal policy has not been proved. In this paper, after discussing different exploration methods, we replace the pre-action-selector of Q-ee learning with a method that implements active exploration of the environment, Active Exploration Planning (AEP); we call the resulting system Q-ae learning. With this replacement, Q-ae learning not only retains the advantages of Q-ee learning but is also adapted to stochastic environments. Moreover, for deterministic MDPs, this paper presents and proves the condition under which an agent obtains the optimal policy by Q-ae learning. Further, discussions and experiments show that, by adjusting the relation between the learning factor and the discount rate, the exploration of the environment can be controlled in a stochastic environment. Experimental results on the exploration rate and the correctness rate of learned policies also illustrate the efficiency of Q-ae learning in stochastic environments.
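
    As a rough companion to the abstract, the sketch below shows the tabular Q-learning backbone such a system builds on, with a count-based bonus that steers the agent toward rarely visited state-action pairs as a stand-in for active exploration. The actual Active Exploration Planning procedure, the pre-/action-selectors, and the back propagation of Q values are not specified in the abstract, so everything here is an illustrative assumption rather than the paper's algorithm.

        from collections import defaultdict

        ALPHA, GAMMA, KAPPA = 0.1, 0.9, 1.0   # learning factor, discount rate, bonus weight (hypothetical)

        Q = defaultdict(float)       # table of estimate values
        visits = defaultdict(int)    # how often each (state, action) pair has been tried

        def select_action(state, actions):
            """Prefer actions with high estimates or few visits (illustrative exploration bonus)."""
            return max(actions, key=lambda a: Q[(state, a)] + KAPPA / (1 + visits[(state, a)]))

        def q_update(state, action, reward, next_state, actions):
            """Standard one-step Q-learning update on the table of estimates."""
            visits[(state, action)] += 1
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])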

  • RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate

    Gang ZHAO  Shoji TATSUMI  Ruoying SUN  

     
    PAPER-Artificial Intelligence and Knowledge

    Vol: E82-A No:10  Page(s): 2266-2273

    Reinforcement learning is an efficient method for solving Markov Decision Processes in which an agent improves its performance by using scalar reward values, giving it a high capability for reactive and adaptive behavior. Q-learning is a representative reinforcement learning method that is guaranteed to obtain an optimal policy but needs numerous trials to achieve it. The k-Certainty Exploration Learning System realizes active exploration of an environment, but its learning process is separated into two phases and estimate values are not derived while the environment is being identified. The Dyna-Q architecture makes fuller use of a limited amount of experience and achieves a better policy with fewer environment interactions by learning and planning under constrained time while identifying the environment; however, its exploration is not active. This paper proposes the RTP-Q reinforcement learning system, which turns an efficient method for exploring an environment into time-constrained exploration planning and integrates it into a combined system of learning, planning, and reacting, aiming for the best of both methods. By improving the exploration of the environment and refining the model of the environment, the RTP-Q learning system accelerates the learning rate for obtaining an optimal policy. Experimental results on navigation tasks demonstrate that the RTP-Q learning system is efficient.
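
    Since the abstract contrasts RTP-Q with the Dyna-Q architecture, the sketch below shows the generic Dyna-style loop it builds on: react with the current estimates, learn from the real transition, then plan by replaying remembered transitions until a time budget runs out. The specific time-constrained exploration planning of RTP-Q is not detailed in the abstract, so the fixed time budget, the random replay, and the environment interface (env.step) are illustrative assumptions.

        import random
        import time
        from collections import defaultdict

        ALPHA, GAMMA = 0.1, 0.95
        PLAN_BUDGET = 0.005          # hypothetical planning time per step, in seconds

        Q = defaultdict(float)
        model = {}                   # learned model: (state, action) -> (reward, next_state)

        def q_update(s, a, r, s2, actions):
            """One-step Q-learning update, used for both real and simulated experience."""
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

        def step(env, state, actions):
            # react: act greedily on the current estimates
            action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward = env.step(action)
            # learn: direct update from the real experience, and record it in the model
            q_update(state, action, reward, next_state, actions)
            model[(state, action)] = (reward, next_state)
            # plan: replay remembered transitions until the time budget expires
            deadline = time.monotonic() + PLAN_BUDGET
            while model and time.monotonic() < deadline:
                (s, a), (r, s2) = random.choice(list(model.items()))
                q_update(s, a, r, s2, actions)
            return next_state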