Use Bag-of-Patterns Approach to Explore Learned Behaviors of Reinforcement Learning

Gulsum Alicioglu, Bo Sun

Research output: Contribution to journal › Conference article › peer-review

Abstract

Deep reinforcement learning (DRL) has achieved state-of-the-art performance, especially in complex decision-making systems such as autonomous driving solutions. Because DRL models are black boxes, explaining a DRL agent’s decisions is crucial, especially in sensitive domains. In this paper, we use the Bag-of-Patterns (BoP) method to explore the learned behaviors of DRL agents, identifying high-frequency rewarded and non-rewarded behaviors along with low-frequency rewarded and non-rewarded behaviors. This exploration helps us assess how effectively the model completes the given tasks. We use the Pong game from the Atari Learning Environment as a test-bed. We extracted learned strategies and common behavior policies using the most frequent BoP created for each state. Results show that the agent trained with Deep Q-Network (DQN) adopted a winning strategy by playing in a defensive mode and focusing on maximizing reward rather than exploration. The agent trained with the Proximal Policy Optimization (PPO) algorithm performed worse: it showed more varied behavior to explore states and took frequent up and down actions to prepare for incoming shots from the opponent.
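
The abstract does not spell out how the bags are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an episode is logged as parallel lists of discrete actions and per-step rewards, treats each sliding window of actions as one pattern, and counts a pattern as rewarded if any step in its window has a positive reward. The function names, the window length, and the frequency threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def bag_of_patterns(actions, rewards, window=2):
    """Count sliding-window action patterns in one episode.

    Assumption for this sketch: a pattern is "rewarded" when any step
    inside its window received a positive reward.
    """
    rewarded, non_rewarded = Counter(), Counter()
    for i in range(len(actions) - window + 1):
        pattern = tuple(actions[i:i + window])
        if any(r > 0 for r in rewards[i:i + window]):
            rewarded[pattern] += 1
        else:
            non_rewarded[pattern] += 1
    return rewarded, non_rewarded


def split_by_frequency(bag, threshold):
    """Split a bag into high-frequency and low-frequency patterns."""
    high = {p: c for p, c in bag.items() if c >= threshold}
    low = {p: c for p, c in bag.items() if c < threshold}
    return high, low


# Toy Pong-like episode (made-up data, for illustration only).
actions = ["UP", "UP", "DOWN", "UP", "UP", "DOWN", "NOOP"]
rewards = [0, 0, 1, 0, 0, 0, 0]

rewarded, non_rewarded = bag_of_patterns(actions, rewards, window=2)
high_non, low_non = split_by_frequency(non_rewarded, threshold=2)
high_rew, low_rew = split_by_frequency(rewarded, threshold=2)

print("high-frequency non-rewarded:", high_non)
print("low-frequency non-rewarded:", low_non)
print("high-frequency rewarded:", high_rew)
print("low-frequency rewarded:", low_rew)
```

Splitting each bag by a count threshold yields the four groups named in the abstract: high- and low-frequency patterns, each in a rewarded and a non-rewarded variant. In practice the threshold and window length would need to be tuned to the environment.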

Original language: English (US)
Pages (from-to): 41-48
Number of pages: 8
Journal: CEUR Workshop Proceedings
Volume: 3793
State: Published - 2024
Externally published: Yes
Event: Joint of the 2nd World Conference on eXplainable Artificial Intelligence Late-Breaking Work, Demos and Doctoral Consortium, xAI-2024:LB/D/DC - Valletta, Malta
Duration: Jul 17 2024 → Jul 19 2024

All Science Journal Classification (ASJC) codes

  • General Computer Science
