It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when it is applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. Simulations confirm this convergence and show that the PGLA algorithm converges better than the LR-I lagging anchor algorithm.
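The idea behind the lagging-anchor scheme that PGLA builds on is to augment each player's policy-gradient update with a pull toward a slowly trailing "anchor" copy of the policy, which damps the cycling that plain gradient ascent exhibits around mixed equilibria. The sketch below illustrates this on matching pennies. It is a minimal illustration under assumed details, not the paper's exact PGLA update: it uses exact gradients of the known expected payoffs, parametrizes each policy directly by its action probability, and the step sizes `eta` and `anchor_rate` are chosen purely for demonstration.

```python
import numpy as np

# Matching pennies: a two-player two-action zero-sum matrix game whose only
# Nash equilibrium is the mixed policy (0.5, 0.5) for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
B = -A                                    # column player's payoffs (zero-sum)

def grad_p(q):
    # d/dp of V1(p, q) = [p, 1-p] @ A @ [q, 1-q]
    return (A[0] - A[1]) @ np.array([q, 1.0 - q])

def grad_q(p):
    # d/dq of V2(p, q) = [p, 1-p] @ B @ [q, 1-q]
    return np.array([p, 1.0 - p]) @ (B[:, 0] - B[:, 1])

eta, anchor_rate = 0.01, 0.5   # illustrative step size and anchor coupling
p, q = 0.8, 0.3                # Pr(action 0) for each player
p_bar, q_bar = p, q            # lagging anchors start at the initial policies

for _ in range(5000):
    # Policy-gradient ascent plus a pull toward the lagging anchor.
    p_next = p + eta * (grad_p(q) + anchor_rate * (p_bar - p))
    q_next = q + eta * (grad_q(p) + anchor_rate * (q_bar - q))
    # The anchors drag slowly behind the current policies.
    p_bar += eta * anchor_rate * (p - p_bar)
    q_bar += eta * anchor_rate * (q - q_bar)
    # Keep the action probabilities valid.
    p, q = np.clip(p_next, 0.0, 1.0), np.clip(q_next, 0.0, 1.0)

print(p, q)  # both approach 0.5, the mixed-policy Nash equilibrium
```

With `anchor_rate = 0`, the update reduces to plain policy gradient and (p, q) orbits the equilibrium, spiraling outward in discrete time; the anchor coupling adds the damping that draws both policies to the mixed equilibrium.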
Shiyao DING
Osaka University
Toshimitsu USHIO
Osaka University
Shiyao DING and Toshimitsu USHIO, "Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor," IEICE TRANSACTIONS on Fundamentals, vol. E102-A, no. 4, pp. 708-711, April 2019, doi: 10.1587/transfun.E102.A.708.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E102.A.708/_p
@ARTICLE{e102-a_4_708,
author={Shiyao DING and Toshimitsu USHIO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor},
year={2019},
volume={E102-A},
number={4},
pages={708-711},
keywords={},
doi={10.1587/transfun.E102.A.708},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 708
EP - 711
AU - Shiyao DING
AU - Toshimitsu USHIO
PY - 2019
DO - 10.1587/transfun.E102.A.708
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E102-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2019
ER -