The step size is a parameter of fundamental importance in learning algorithms, particularly for natural policy gradient (NPG) methods. We derive an upper bound on the step size for incremental NPG estimation and propose an adaptive step size that implements the derived upper bound. The proposed adaptive step size guarantees that an updated parameter does not overshoot the target, which is achieved by weighting the learning samples according to their relative importance. We also provide tight upper and lower bounds for the step size, although they are not suitable for incremental learning. We confirm the usefulness of the proposed step size on classical benchmarks. To the best of our knowledge, this is the first adaptive step size method for NPG estimation.
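As a rough illustration of the kind of update the abstract describes, the sketch below combines an incremental, per-sample update of an NPG-style estimate with a step size clipped at a normalized-LMS-style overshoot bound and a relative-importance weight. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`incremental_npg_update`, `relative_importance`), the bound 1/||psi||^2, and the weighting rule are illustrative assumptions and do not reproduce the bound or weighting scheme derived in the paper.

```python
# Minimal sketch (illustrative, not the paper's method): incremental update of a
# natural-gradient estimate with a step size clipped by a hypothetical overshoot
# bound, plus a simple relative-importance weight for each sample.
import numpy as np


def incremental_npg_update(w, psi, target, base_step=0.1, eps=1e-8):
    """One incremental least-squares-style update of the NPG estimate `w`.

    w      : current estimate of the natural gradient direction
    psi    : compatible feature vector of the observed sample
    target : scalar regression target for this sample
    """
    # Prediction error of the current estimate on this sample.
    error = target - psi @ w

    # Hypothetical overshoot bound: along the direction error * psi, a step
    # larger than 1 / ||psi||^2 moves the prediction past the target for this
    # sample, so the step size is clipped there (normalized-LMS-style bound).
    max_step = 1.0 / (psi @ psi + eps)
    step = min(base_step, max_step)

    return w + step * error * psi


def relative_importance(weight, running_max, eps=1e-8):
    """Illustrative relative-importance weight: scale a sample's raw weight by
    the running maximum seen so far, so that no single sample dominates."""
    running_max = max(running_max, weight)
    return weight / (running_max + eps), running_max
```

The 1/||psi||^2 clipping above is only one simple way to keep a per-sample update from overshooting its own target; the paper's derived upper bound and its use of relative importance weighting should be taken from the full text.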
Ryo IWAKI
Osaka University
Hiroki YOKOYAMA
Tamagawa University
Minoru ASADA
Osaka University
Ryo IWAKI, Hiroki YOKOYAMA, Minoru ASADA, "Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 9, pp. 2346-2355, September 2018, doi: 10.1587/transinf.2017EDP7363.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7363/_p
@ARTICLE{e101-d_9_2346,
author={Ryo IWAKI and Hiroki YOKOYAMA and Minoru ASADA},
journal={IEICE TRANSACTIONS on Information},
title={Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting},
year={2018},
volume={E101-D},
number={9},
pages={2346-2355},
doi={10.1587/transinf.2017EDP7363},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - Incremental Estimation of Natural Policy Gradient with Relative Importance Weighting
T2 - IEICE TRANSACTIONS on Information
SP - 2346
EP - 2355
AU - Ryo IWAKI
AU - Hiroki YOKOYAMA
AU - Minoru ASADA
PY - 2018
DO - 10.1587/transinf.2017EDP7363
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2018
ER -