
It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. By simulation, we confirm this convergence and show that the PGLA algorithm converges better than the *L*_{R-I} lagging anchor algorithm.
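To illustrate the idea behind lagging-anchor methods, the sketch below runs deterministic gradient dynamics with a lagging anchor on matching pennies, a two-player two-action zero-sum matrix game whose unique Nash equilibrium is the mixed policy (0.5, 0.5). This is a minimal sketch under simplifying assumptions (expected-payoff gradients rather than sampled policy gradients, hypothetical step sizes `eta` and `nu`), not the paper's exact PGLA update rule.

```python
import numpy as np

# Matching pennies: player 1 gets +1 when actions match, -1 otherwise;
# the game is zero-sum and its unique Nash equilibrium is mixed (0.5, 0.5).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def lagging_anchor_dynamics(steps=20000, eta=0.02, nu=0.5):
    """Illustrative lagging-anchor gradient dynamics (not the exact PGLA rule).

    x, y: probability of action 0 for players 1 and 2.
    x_bar, y_bar: slowly moving "anchor" copies of the policies; the pull
    toward them damps the cycling that plain gradient play exhibits around
    the mixed equilibrium.
    """
    x = y = 0.8          # start away from the mixed equilibrium
    x_bar = y_bar = 0.8
    for _ in range(steps):
        # Gradient of each player's expected payoff w.r.t. its own probability.
        gx = (y * A[0, 0] + (1 - y) * A[0, 1]) - (y * A[1, 0] + (1 - y) * A[1, 1])
        gy = -((x * A[0, 0] + (1 - x) * A[1, 0]) - (x * A[0, 1] + (1 - x) * A[1, 1]))
        # Gradient ascent plus a pull toward the lagging anchor;
        # clip to keep the probabilities valid.
        x = float(np.clip(x + eta * (gx + nu * (x_bar - x)), 0.0, 1.0))
        y = float(np.clip(y + eta * (gy + nu * (y_bar - y)), 0.0, 1.0))
        # Anchors drift slowly toward the current policies.
        x_bar += eta * nu * (x - x_bar)
        y_bar += eta * nu * (y - y_bar)
    return x, y
```

Without the anchor terms (`nu = 0`), these dynamics orbit the equilibrium instead of approaching it; the anchor coupling adds the damping that drives both policies toward (0.5, 0.5).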

- Publication
- IEICE TRANSACTIONS on Fundamentals Vol.E102-A No.4 pp.708-711

- Publication Date
- 2019/04/01

- Online ISSN
- 1745-1337

- DOI
- 10.1587/transfun.E102.A.708

- Type of Manuscript
- LETTER

- Category
- Mathematical Systems Science

Shiyao DING

Osaka University

Toshimitsu USHIO

Osaka University

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.


Shiyao DING, Toshimitsu USHIO, "Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor" in IEICE TRANSACTIONS on Fundamentals,
vol. E102-A, no. 4, pp. 708-711, April 2019, doi: 10.1587/transfun.E102.A.708.

Abstract: It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. By simulation, we confirm this convergence and show that the PGLA algorithm converges better than the *L*_{R-I} lagging anchor algorithm.

URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E102.A.708/_p


@ARTICLE{e102-a_4_708,

author={Shiyao DING and Toshimitsu USHIO},

journal={IEICE TRANSACTIONS on Fundamentals},

title={Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor},

year={2019},

volume={E102-A},

number={4},

pages={708--711},

abstract={It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. By simulation, we confirm this convergence and show that the PGLA algorithm converges better than the $L_{R-I}$ lagging anchor algorithm.},

keywords={},

doi={10.1587/transfun.E102.A.708},

ISSN={1745-1337},

month={April}
}


TY - JOUR

TI - Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor

T2 - IEICE TRANSACTIONS on Fundamentals

SP - 708

EP - 711

AU - Shiyao DING

AU - Toshimitsu USHIO

PY - 2019

DO - 10.1587/transfun.E102.A.708

JO - IEICE TRANSACTIONS on Fundamentals

SN - 1745-1337

VL - E102-A

IS - 4

JA - IEICE TRANSACTIONS on Fundamentals

Y1 - 2019/04/01

AB - It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, under the PGLA algorithm, the agents' policies converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. By simulation, we confirm this convergence and show that the PGLA algorithm converges better than the L_{R-I} lagging anchor algorithm.

ER -