Multiagent Reinforcement Learning Based on Negotiation in General-Sum Games
ZHANG Hua-xiang1, ZHAO Tong2, HUANG Shang-teng3 (1. Information and Management School, Shandong Normal Univ., Ji'nan 250014, China; 2. Dept. of Automatic Control, Qingdao Univ. of Science and Technology, Qingdao 266061; 3. Dept. of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030)
In general-sum games, multiagent cooperation has no global objective, and only individual rationality is considered. Each agent learns under an assumption about its opponents' policies, and this assumption may be wrong. By defining a global objective for the agents, a novel multiagent reinforcement learning algorithm is proposed. During learning, all agents select negotiated policies and punish any agent that deviates from them, which ensures that the negotiated policies are executed. It is proved that the learned Q values of each stage game converge under certain restrictions. An example is given to illustrate the proven result.
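The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea (negotiate a joint action, punish deviation), not the authors' method. Everything in it is an assumption made for illustration: the prisoner's-dilemma payoff table `PAYOFFS`, the choice of "sum of Q values" as the global objective, the minmax-style punishment, and the hypothetical helper names `negotiated_joint_action`, `punishing_action`, `update_q`, and `run`.

```python
import itertools

# Hypothetical 2-player stage game (prisoner's dilemma payoffs); NOT from the paper.
# PAYOFFS[(a0, a1)] = (reward to agent 0, reward to agent 1).
# Action 0 = cooperate, action 1 = defect.
PAYOFFS = {
    (0, 0): (3.0, 3.0),
    (0, 1): (0.0, 5.0),
    (1, 0): (5.0, 0.0),
    (1, 1): (1.0, 1.0),
}
ACTIONS = (0, 1)


def negotiated_joint_action(q):
    """Pick the joint action with the largest summed Q value.

    Assumption: the 'global objective' is the sum of the agents' Q values.
    """
    return max(q, key=lambda joint: sum(q[joint]))


def punishing_action(q, deviator):
    """Action for the other agent that minimizes the deviator's best achievable Q.

    Assumption: punishment is a minmax response against the deviating agent.
    """
    def deviator_best(other_action):
        values = []
        for a in ACTIONS:
            joint = (a, other_action) if deviator == 0 else (other_action, a)
            values.append(q[joint][deviator])
        return max(values)

    return min(ACTIONS, key=deviator_best)


def update_q(q, joint, rewards, alpha=0.1):
    """Single-stage-game Q update: move each agent's Q toward its observed reward."""
    q[joint] = tuple(
        (1 - alpha) * old + alpha * r for old, r in zip(q[joint], rewards)
    )


def run(episodes=200, alpha=0.1):
    """Learn a Q table by repeatedly visiting every joint action of the stage game."""
    q = {joint: (0.0, 0.0) for joint in itertools.product(ACTIONS, ACTIONS)}
    for _ in range(episodes):
        for joint in q:
            update_q(q, joint, PAYOFFS[joint], alpha)
    return q


if __name__ == "__main__":
    q = run()
    print("negotiated joint action:", negotiated_joint_action(q))
    print("punishment against agent 0:", punishing_action(q, deviator=0))
```

Under these assumed payoffs, the negotiated joint action is mutual cooperation, and the punishment for a deviating agent is for the other agent to defect, which caps the deviator's payoff below what cooperation yields. This is what makes following the negotiated policy individually rational in the sketch.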
【Category Index】: TP18