
Labeling Q-Learning in POMDP Environments

Haeyeon LEE, Hiroyuki KAMAYA, Kenichi ABE


Summary:

This paper presents a new Reinforcement Learning (RL) method, called "Labeling Q-learning (LQ-learning)," for solving partially observable Markov decision process (POMDP) problems. Recently, hierarchical RL methods have been widely studied. However, they have the drawback that learning time and memory are spent solely on maintaining the hierarchical structure, even when it is not necessary. Our LQ-learning, in contrast, has no hierarchical structure, but adopts a new type of internal memory mechanism. In LQ-learning, the agent perceives the current state as a pair of an observation and its label, so it can distinguish more exactly between states that look the same but are in fact different. That is, at each step t we define a new type of perception of the environment, õt = (ot, θt), where ot is the conventional observation and θt is the label attached to the observation ot. The classical RL algorithm is then used as if the pair (ot, θt) were a Markov state. The labeling is carried out by a Boolean variable, called "CHANGE," and a hash-like or mod function, called the Labeling Function (LF). To demonstrate the efficiency of LQ-learning, we apply it to maze problems in Grid-Worlds, which are widely used in the literature as simulated POMDP environments. Using LQ-learning, the maze problems can be solved without initial knowledge of the environment.
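
The augmented-perception idea in the summary can be illustrated with a short sketch. The Python code below is a minimal, assumed reading of LQ-learning: tabular Q-learning over augmented states (ot, θt), where the label advances modulo a small constant whenever a hypothetical CHANGE condition fires (here, assumed to be a repeated raw observation, i.e. possible perceptual aliasing). The paper's actual CHANGE trigger and Labeling Function may differ; class and parameter names are illustrative only.

    import random
    from collections import defaultdict

    class LQAgent:
        """Sketch of Labeling Q-learning: Q-learning over (observation, label) pairs.
        The labeling rule below is an illustrative assumption, not the paper's exact
        CHANGE/LF mechanism."""

        def __init__(self, n_actions, n_labels=4, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.n_actions = n_actions
            self.n_labels = n_labels          # modulus used by the labeling function
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.Q = defaultdict(float)       # Q[((obs, label), action)]
            self.label = 0                    # current label theta_t

        def labeling_function(self, change):
            # Hash-like / mod Labeling Function (LF): advance the label only
            # when the Boolean CHANGE flag is raised.
            if change:
                self.label = (self.label + 1) % self.n_labels
            return self.label

        def perceive(self, obs, prev_obs):
            # Assumed CHANGE rule: raise the flag when the raw observation repeats,
            # i.e. the agent may be in a perceptually aliased state.
            change = (obs == prev_obs)
            theta = self.labeling_function(change)
            return (obs, theta)               # augmented perception, playing the role of õt

        def act(self, aug_obs):
            # Epsilon-greedy action selection over the augmented state.
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            qs = [self.Q[(aug_obs, a)] for a in range(self.n_actions)]
            return qs.index(max(qs))

        def update(self, aug_obs, action, reward, next_aug_obs):
            # Classical Q-learning update, treating (obs, label) as if it were a Markov state.
            best_next = max(self.Q[(next_aug_obs, a)] for a in range(self.n_actions))
            td_target = reward + self.gamma * best_next
            self.Q[(aug_obs, action)] += self.alpha * (td_target - self.Q[(aug_obs, action)])

In a Grid-World maze loop, the agent would call perceive() on each raw observation, act() on the returned pair, and update() after receiving the reward, exactly as in standard tabular Q-learning but with the labeled state.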

Publication: IEICE TRANSACTIONS on Information, Vol.E85-D, No.9, pp.1425-1432
Publication Date: 2002/09/01
Type of Manuscript: PAPER
Category: Biocybernetics, Neurocomputing
