
IEICE TRANSACTIONS on Fundamentals

Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor

Shiyao DING, Toshimitsu USHIO

Summary:

It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that the agents' policies converge to a Nash equilibrium in mixed policies under the PGLA algorithm in two-player two-action matrix games. By simulation, we confirm this convergence and show that the PGLA algorithm exhibits better convergence behavior than the LR-I lagging anchor algorithm.
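To illustrate the idea behind the lagging-anchor technique, the following is a minimal sketch, not the paper's exact PGLA algorithm. It uses idealized exact gradients (rather than sampled policy-gradient estimates) on matching pennies, a two-player two-action zero-sum matrix game whose unique Nash equilibrium is the mixed policy (0.5, 0.5). Each player's policy is pulled both along its payoff gradient and toward a slowly trailing "anchor"; the anchor in turn drifts toward the policy. The parameter names `eta` (step size) and `nu` (anchor drawing factor) are assumptions for this sketch.

```python
# Lagging-anchor gradient dynamics on matching pennies (illustrative sketch).
# x, y: each player's probability of playing action 0; xa, ya: their anchors.
# Plain gradient ascent cycles around the mixed Nash equilibrium (0.5, 0.5);
# the anchor terms nu*(xa - x), nu*(ya - y) damp the cycling so the
# policies spiral in toward it.

def clip(p):
    """Keep a probability inside (0, 1)."""
    return min(1.0 - 1e-6, max(1e-6, p))

eta, nu = 0.01, 0.5   # assumed step size and anchor drawing factor
x = xa = 0.9          # player 1's policy and its anchor
y = ya = 0.2          # player 2's policy and its anchor

for _ in range(20000):
    # Exact payoff gradients for matching pennies:
    # u1(x, y) = 4xy - 2x - 2y + 1, u2 = -u1
    gx = 4 * y - 2        # d u1 / d x
    gy = 2 - 4 * x        # d u2 / d y
    x_new = clip(x + eta * (gx + nu * (xa - x)))
    y_new = clip(y + eta * (gy + nu * (ya - y)))
    xa += eta * nu * (x - xa)   # anchors lag behind the policies
    ya += eta * nu * (y - ya)
    x, y = x_new, y_new

print(round(x, 2), round(y, 2))  # both approach the mixed equilibrium 0.5
```

Dropping the anchor terms (setting `nu = 0`) makes the same iteration orbit the equilibrium without converging, which is the failure mode of plain policy gradient that the lagging anchor is meant to fix.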

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E102-A No.4 pp.708-711
Publication Date
2019/04/01
Online ISSN
1745-1337
DOI
10.1587/transfun.E102.A.708
Type of Manuscript
LETTER
Category
Mathematical Systems Science

Authors

Shiyao DING
  Osaka University
Toshimitsu USHIO
  Osaka University
