This paper describes Profit-Sharing, a reinforcement learning approach that can be used to design a coordination strategy in a multi-agent system, and demonstrates its effectiveness empirically in the coil yard of a steel manufacturing plant. The domain consists of multiple cranes that operate asynchronously but need coordination: each crane must adjust its initial task-execution plan to avoid the collisions that limited resources would otherwise cause. This problem is beyond both classical hand-coded expert methods and mathematical analysis, because information is scattered, tasks are generated stochastically, and, moreover, tasks are difficult to complete on schedule. In recent years, many applications of reinforcement learning algorithms based on Dynamic Programming (DP), such as Q-learning and the Temporal Difference method, have been introduced. These algorithms promise optimal agent performance in Markov decision processes (MDPs), but in non-MDPs, such as multi-agent domains, the convergence of an agent's policy is not guaranteed. Profit-Sharing, in contrast to the DP-based algorithms, can guarantee convergence to a rational policy, meaning that the agent reaches one of the desirable states, even in non-MDPs where agents learn concurrently and competitively. We therefore embedded Profit-Sharing into each crane operator to acquire cooperative rules in this dynamic domain, and we show its applicability to the real world by comparing it with a RAP (Reactive Action Planner) model encoded from expert knowledge.
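To make the Profit-Sharing scheme concrete, the sketch below shows the episodic credit-assignment step it relies on: every state-action rule fired during an episode is reinforced when a reward finally arrives, with credit decaying geometrically the further a rule lies from the reward. This is a minimal illustration under stated assumptions, not the paper's implementation; the class name, the epsilon-greedy action selection, and the decay ratio 1/(L+1) (one choice known to satisfy the rationality condition of Miyazaki et al. when L actions compete at each step) are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict

class ProfitSharing:
    """Minimal sketch of Profit-Sharing credit assignment (illustrative only).

    Rule weights w[(state, action)] are reinforced only when an episode
    ends with a reward; credit decays geometrically with distance from
    the reward, so no bootstrapping from next-state value estimates occurs.
    """

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)      # L = len(actions) competing rules per state
        self.epsilon = epsilon            # exploration rate (assumed)
        self.w = defaultdict(float)       # rule weights, default 0.0
        self.episode = []                 # (state, action) pairs fired this episode
        # Geometric ratio 1/(L+1): credit shrinks fast enough that a rewarded
        # rule always outweighs the total credit reachable through the L
        # alternatives at any later step (the rationality condition).
        self.decay = 1.0 / (len(self.actions) + 1)

    def act(self, state):
        """Epsilon-greedy selection over the current rule weights."""
        if random.random() < self.epsilon:
            action = random.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.w[(state, a)])
        self.episode.append((state, action))
        return action

    def reinforce(self, reward):
        """Distribute the episode's reward backwards over the fired rules."""
        credit = reward
        for state, action in reversed(self.episode):
            self.w[(state, action)] += credit
            credit *= self.decay
        self.episode.clear()
```

Because the update never bootstraps from a value estimate of the successor state, it does not rely on the Markov property; this is the sense in which Profit-Sharing remains rational in the non-MDP, concurrently learning setting the abstract describes.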
Sachiyo ARAI, Kazuteru MIYAZAKI, Shigenobu KOBAYASHI, "Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents," IEICE TRANSACTIONS on Communications, vol. E83-B, no. 5, pp. 1039-1047, May 2000, doi: 10.1587/e83-b_5_1039.
URL: https://global.ieice.org/en_transactions/communications/10.1587/e83-b_5_1039/_p
@ARTICLE{e83-b_5_1039,
author={Sachiyo ARAI and Kazuteru MIYAZAKI and Shigenobu KOBAYASHI},
journal={IEICE TRANSACTIONS on Communications},
title={Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents},
year={2000},
volume={E83-B},
number={5},
pages={1039-1047},
keywords={},
doi={10.1587/e83-b_5_1039},
ISSN={},
month={May},}
TY - JOUR
TI - Controlling Multiple Cranes Using Multi-Agent Reinforcement Learning: Emerging Coordination among Competitive Agents
T2 - IEICE TRANSACTIONS on Communications
SP - 1039
EP - 1047
AU - Sachiyo ARAI
AU - Kazuteru MIYAZAKI
AU - Shigenobu KOBAYASHI
PY - 2000
DO - 10.1587/e83-b_5_1039
JO - IEICE TRANSACTIONS on Communications
SN -
VL - E83-B
IS - 5
JA - IEICE TRANSACTIONS on Communications
Y1 - May 2000
ER -