Comparing the Effectiveness of PPO and its Variants in Training AI to Play Game

Luobin Cui, Ying Tang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automated game intelligence is a crucial step in rapid game development. A promising research direction for automated game intelligence is reinforcement learning, and specifically, the proximal policy optimization (PPO) algorithm. Two PPO variants, Maskable PPO and Recurrent PPO, further extend its capabilities. We compare the performance of these three algorithms in the 2D game Mario and in a 3D car racing environment. We also evaluate their performance and applicability against the experimental results reported by the original algorithm authors. Based on our results, we offer recommendations on PPO configuration depending on the target game type, giving future developers a benchmark to help them decide which algorithm is most applicable for their applications.
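All three algorithms compared in the paper share PPO's clipped surrogate objective (Schulman et al., 2017); the variants differ in what surrounds it (action masking for Maskable PPO, a recurrent policy network for Recurrent PPO). The following is a minimal illustrative sketch of that objective for a single sample, not code from the paper:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    ratio: pi_new(a|s) / pi_old(a|s), the policy probability ratio.
    advantage: estimated advantage A(s, a).
    eps: clip range; 0.2 is the value suggested in the original PPO paper.
    """
    # Clamp the ratio to [1 - eps, 1 + eps] so a single update cannot
    # move the policy too far from the data-collecting policy.
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the pessimistic (lower) bound of the two surrogate terms.
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage the objective stops growing once the ratio exceeds 1 + eps, removing the incentive for overly large policy updates.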

Original language: English (US)
Title of host publication: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 521-526
Number of pages: 6
ISBN (Electronic): 9798350312492
DOIs
State: Published - 2023
Externally published: Yes
Event: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 - Xi'an, China
Duration: Oct 20 2023 - Oct 23 2023

Publication series

Name: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

Conference

Conference: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023
Country/Territory: China
City: Xi'an
Period: 10/20/23 - 10/23/23

All Science Journal Classification (ASJC) codes

  • Software
  • Safety, Risk, Reliability and Quality
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
