It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when it is applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. Simulations confirm this convergence and show that the PGLA algorithm converges better than the LR-I lagging anchor algorithm.
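The idea behind the lagging-anchor scheme that PGLA builds on is to augment each player's policy-gradient update with a pull toward a slowly trailing "anchor" copy of the policy, which damps the cycling that plain gradient ascent exhibits around mixed equilibria. The sketch below illustrates this on matching pennies. It is a minimal illustration under assumed details, not the paper's exact PGLA update: it uses exact gradients of the known expected payoffs, parametrizes each policy directly by its action probability, and the step sizes `eta` and `anchor_rate` are chosen purely for demonstration.

```python
import numpy as np

# Matching pennies: a two-player two-action zero-sum matrix game whose only
# Nash equilibrium is the mixed policy (0.5, 0.5) for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
B = -A                                    # column player's payoffs (zero-sum)

def grad_p(q):
    # d/dp of V1(p, q) = [p, 1-p] @ A @ [q, 1-q]
    return (A[0] - A[1]) @ np.array([q, 1.0 - q])

def grad_q(p):
    # d/dq of V2(p, q) = [p, 1-p] @ B @ [q, 1-q]
    return np.array([p, 1.0 - p]) @ (B[:, 0] - B[:, 1])

eta, anchor_rate = 0.01, 0.5   # illustrative step size and anchor coupling
p, q = 0.8, 0.3                # Pr(action 0) for each player
p_bar, q_bar = p, q            # lagging anchors start at the initial policies

for _ in range(5000):
    # Policy-gradient ascent plus a pull toward the lagging anchor.
    p_next = p + eta * (grad_p(q) + anchor_rate * (p_bar - p))
    q_next = q + eta * (grad_q(p) + anchor_rate * (q_bar - q))
    # The anchors drag slowly behind the current policies.
    p_bar += eta * anchor_rate * (p - p_bar)
    q_bar += eta * anchor_rate * (q - q_bar)
    # Keep the action probabilities valid.
    p, q = np.clip(p_next, 0.0, 1.0), np.clip(q_next, 0.0, 1.0)

print(p, q)  # both approach 0.5, the mixed-policy Nash equilibrium
```

With `anchor_rate = 0`, the update reduces to plain policy gradient and (p, q) orbits the equilibrium, spiraling outward in discrete time; the anchor coupling adds the damping that draws both policies to the mixed equilibrium.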
Shiyao DING
Osaka University
Toshimitsu USHIO
Osaka University
Shiyao DING and Toshimitsu USHIO, "Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor," IEICE TRANSACTIONS on Fundamentals, vol. E102-A, no. 4, pp. 708-711, April 2019, doi: 10.1587/transfun.E102.A.708.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E102.A.708/_p
@ARTICLE{e102-a_4_708,
author={Shiyao DING and Toshimitsu USHIO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor},
year={2019},
volume={E102-A},
number={4},
pages={708-711},
keywords={},
doi={10.1587/transfun.E102.A.708},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 708
EP - 711
AU - Shiyao DING
AU - Toshimitsu USHIO
PY - 2019
DO - 10.1587/transfun.E102.A.708
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E102-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2019
ER -