Abstract
Deep reinforcement learning (DRL) has achieved state-of-the-art performance, especially in complex decision-making systems such as autonomous driving. Because DRL agents are black boxes, explaining their decisions is crucial, especially in sensitive domains. In this paper, we use the Bag-of-Pattern (BoP) method to explore the learned behaviors of DRL agents, identifying high-frequency rewarded and non-rewarded behaviors along with low-frequency rewarded and non-rewarded behaviors. This exploration helps us assess how effectively the model completes the given tasks. We use the Pong game from the Arcade Learning Environment as a test bed. We extract learned strategies and common behavior policies using the most frequent BoP created for each state. Results show that the agent trained with Deep Q-Network (DQN) adopted a winning strategy by playing defensively and focusing on maximizing reward rather than exploration. The agent trained with the Proximal Policy Optimization (PPO) algorithm performed worse: it showed more varied behavior to explore states and took frequent up and down actions to prepare for incoming shots from the opponent.
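As a rough illustration of the kind of analysis the abstract describes, the sketch below counts sliding-window action patterns in one episode and splits them into rewarded and non-rewarded bags. It is not the authors' implementation: the function name `bag_of_patterns`, the window length, the rule that a window counts as rewarded if any step in it has positive reward, and the toy episode are all assumptions made for this example.

```python
from collections import Counter

def bag_of_patterns(actions, rewards, window=3):
    """Count sliding-window action patterns, split by whether the window earned reward.

    Hypothetical helper: the window size and the 'rewarded' rule are assumptions.
    """
    rewarded, non_rewarded = Counter(), Counter()
    for i in range(len(actions) - window + 1):
        pattern = tuple(actions[i:i + window])
        # Assumption: a window is 'rewarded' if any step in it has positive reward.
        if any(r > 0 for r in rewards[i:i + window]):
            rewarded[pattern] += 1
        else:
            non_rewarded[pattern] += 1
    return rewarded, non_rewarded

# Toy episode with made-up data (not taken from the paper).
actions = ["UP", "UP", "DOWN", "UP", "UP", "DOWN", "NOOP"]
rewards = [0, 0, 1, 0, 0, 0, 0]

rewarded, non_rewarded = bag_of_patterns(actions, rewards)

# Patterns sorted from most to least frequent within each bag.
print(rewarded.most_common())
print(non_rewarded.most_common())
```

Sorting each bag by count gives the high-frequency and low-frequency ends of the rewarded and non-rewarded behaviors. A real analysis would aggregate counts over many episodes and would need a principled frequency threshold.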
| Original language | English (US) |
|---|---|
| Pages (from-to) | 41-48 |
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Volume | 3793 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | Joint of the 2nd World Conference on eXplainable Artificial Intelligence Late-Breaking Work, Demos and Doctoral Consortium, xAI-2024:LB/D/DC - Valletta, Malta. Duration: Jul 17 2024 → Jul 19 2024 |
All Science Journal Classification (ASJC) codes
- General Computer Science