It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when it is applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, with the PGLA algorithm, the agents' policies can converge to a Nash equilibrium in mixed policies in two-player two-action matrix games. Simulations confirm the convergence and also show that the PGLA algorithm converges better than the LR-I lagging anchor algorithm.
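For reference, the mixed Nash equilibrium that a learning algorithm should reach in a two-player two-action matrix game can be computed in closed form from the indifference conditions. The sketch below is not the PGLA algorithm, whose update rule is not given in this abstract. The function name `mixed_nash_2x2` and the matching-pennies payoffs are illustrative assumptions, and the code assumes the game has no pure-strategy equilibrium.

```python
from fractions import Fraction


def mixed_nash_2x2(row_payoff, col_payoff):
    """Return (p, q): the probabilities that the row and column player
    choose action 0 in the fully mixed equilibrium of a 2x2 game.

    Assumption: the game has no pure-strategy equilibrium, so both
    denominators are nonzero and both probabilities lie in (0, 1).
    """
    a, b = row_payoff, col_payoff
    # q makes the row player indifferent between its two actions.
    q = Fraction(a[1][1] - a[0][1], a[0][0] - a[0][1] - a[1][0] + a[1][1])
    # p makes the column player indifferent between its two actions.
    p = Fraction(b[1][1] - b[1][0], b[0][0] - b[1][0] - b[0][1] + b[1][1])
    return p, q


# Hypothetical example: matching pennies, a zero-sum game with no pure equilibrium.
matching_pennies = [[1, -1], [-1, 1]]
negated = [[-x for x in row] for row in matching_pennies]
print(mixed_nash_2x2(matching_pennies, negated))  # (Fraction(1, 2), Fraction(1, 2))
```

In matching pennies both players mix uniformly at equilibrium, which is the kind of mixed-policy target the abstract refers to.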
Maximizing network lifetime and optimizing aggregate system utility are important but usually conflicting goals in wireless multi-hop networks. To address this trade-off, we present a matrix game-theoretic cross-layer optimization formulation that jointly maximizes these diverse objectives in such networks with network coding. To this end, we introduce a cross-layer formulation of general network utility maximization (NUM) that accommodates routing, scheduling, and stream control from different layers in the coded networks. Specifically, for the scheduling problem and the objective function it involves, we develop a matrix game in which the players' strategy sets correspond to hyperlinks and transmission modes, and we design multiple payoffs specific to lifetime and system utility, respectively. In particular, because a matrix game has the inherent merit that it can be solved with mathematical programming, our cross-layer programming formulation benefits from both game-based and NUM-based approaches at the same time by combining the programming model for the matrix game with those for the other layers in a consistent framework. Finally, our numerical experiments quantitatively illustrate the possible performance trade-offs for the two variants developed on the multiple objectives in question, while qualitatively exhibiting the differences between the framework and other related works.
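To illustrate the remark that a matrix game can be solved with mathematical programming, the standard linear program for a two-player zero-sum matrix game is sketched below. This is the textbook single-payoff formulation, not the paper's multi-payoff cross-layer program. The symbols are assumptions introduced here: $A \in \mathbb{R}^{m \times n}$ is the row player's payoff matrix, $x$ its mixed strategy, and $v$ the game value.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_i A_{ij} \ge v, \quad j = 1,\dots,n, \\
& \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0, \quad i = 1,\dots,m.
\end{aligned}
```

Because the game is expressed as linear constraints, it can in principle sit alongside the constraints of other layers in one program, which is the property the abstract relies on.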
Chi GUO Li-na WANG Xiao-ying ZHANG
Network structure has a great impact on both hazard spread and network immunization. The vulnerabilities of network nodes are correlated with one another, either assortatively or disassortatively. First, an algorithm for vulnerability relevance clustering is proposed to show that vulnerability communities clearly exist in complex networks. On this basis, a new indicator called network “hyper-betweenness” is introduced for evaluating the vulnerability of network nodes; it better reflects the importance of a node in hazard spread. Finally, the dynamic stochastic process of hazard spread is simulated using the Monte Carlo sampling method, and a two-player, non-cooperative, constant-sum game model is designed to obtain an equilibrium network immunization strategy.
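A Monte Carlo estimate of hazard spread under a given immunization choice might look like the following minimal sketch. The abstract does not specify the spread model, so a simple independent-cascade process is assumed here, and the names `simulate_spread` and `expected_spread`, the spread probability, and the example graph are all hypothetical.

```python
import random


def simulate_spread(adjacency, source, spread_prob, immunized, rng):
    """Run one random spread from `source` and return the set of affected nodes.

    Assumed model: each newly affected node gets one chance to pass the
    hazard to each unaffected, non-immunized neighbor with probability
    `spread_prob`.
    """
    if source in immunized:
        return set()
    affected = {source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor in affected or neighbor in immunized:
                    continue
                if rng.random() < spread_prob:
                    affected.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return affected


def expected_spread(adjacency, source, spread_prob,
                    immunized=frozenset(), runs=2000, seed=0):
    """Average number of affected nodes over `runs` sampled spreads."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    total = sum(
        len(simulate_spread(adjacency, source, spread_prob, immunized, rng))
        for _ in range(runs)
    )
    return total / runs


# Hypothetical 5-node path graph: 0 - 1 - 2 - 3 - 4
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(expected_spread(path_graph, source=0, spread_prob=0.5))
print(expected_spread(path_graph, source=0, spread_prob=0.5, immunized={2}))
```

Estimates of this kind, computed for each pairing of an attack source and an immunization set, could fill the payoff matrix of a constant-sum game between the two sides; the paper's actual payoff design is not described in the abstract.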