Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
LI Yongqiang; ZHOU Jian; FENG Yu; FENG Yuanjing (College of Information Engineering, Zhejiang University of Technology)
In two-player zero-sum Markov games, the traditional policy gradient theorem applies only to alternating training of the two players, because each player's policy depends on the other's. To train both players simultaneously, a policy gradient theorem for two-player zero-sum Markov games is established. Based on this theorem, an extra-gradient REINFORCE algorithm is proposed that drives the joint policy of the two players toward an approximate Nash equilibrium. The advantages of the proposed algorithm are analyzed from several angles. First, comparative experiments on simultaneous-move games show that the proposed algorithm converges more reliably and more quickly. Second, the joint policies it produces are analyzed and verified to constitute approximate Nash equilibria. Finally, comparative experiments on simultaneous-move games of varying difficulty show that the proposed algorithm maintains good convergence speed at higher difficulty levels.
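The core idea behind the extra-gradient update can be illustrated on a small zero-sum game. The sketch below is not the paper's algorithm (which uses REINFORCE with sampled trajectories in a Markov game); it is a minimal, assumed illustration of extra-gradient policy updates on a one-shot matrix game (matching pennies), where vanilla simultaneous gradient ascent-descent cycles but the extra-gradient step, which evaluates gradients at an extrapolated point, spirals in to the Nash equilibrium:

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [v / s for v in e]

def value(p, q, A):
    """Expected payoff p^T A q for the row (maximizing) player."""
    return sum(p[i] * A[i][j] * q[j] for i in range(2) for j in range(2))

def grad_logits(p, q, A, row_player):
    """Policy gradient of the game value w.r.t. softmax logits."""
    v = value(p, q, A)
    if row_player:
        adv = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
        return [p[i] * (adv[i] - v) for i in range(2)]
    adv = [sum(p[i] * A[i][j] for i in range(2)) for j in range(2)]
    return [q[j] * (adv[j] - v) for j in range(2)]

def extragradient_nash(A, lr=0.2, iters=3000):
    tx, ty = [0.0, 0.0], [0.3, -0.3]  # start off-equilibrium
    for _ in range(iters):
        p, q = softmax(tx), softmax(ty)
        gx = grad_logits(p, q, A, True)
        gy = grad_logits(p, q, A, False)
        # Extrapolation step: row player ascends, column player descends.
        tx_h = [tx[i] + lr * gx[i] for i in range(2)]
        ty_h = [ty[i] - lr * gy[i] for i in range(2)]
        ph, qh = softmax(tx_h), softmax(ty_h)
        # Update step uses the gradients at the extrapolated point.
        gxh = grad_logits(ph, qh, A, True)
        gyh = grad_logits(ph, qh, A, False)
        tx = [tx[i] + lr * gxh[i] for i in range(2)]
        ty = [ty[i] - lr * gyh[i] for i in range(2)]
    return softmax(tx), softmax(ty)

A = [[1, -1], [-1, 1]]  # matching pennies; unique Nash = uniform mixing
p, q = extragradient_nash(A)
```

Under this setup both policies converge close to the uniform mixed strategy (0.5, 0.5), the unique Nash equilibrium of matching pennies, whereas dropping the extrapolation step makes the joint policy orbit the equilibrium instead of approaching it.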