In this paper, we consider the problem of learning an optimal incentivization strategy that maximizes the service level of a sharing service, such as bike-sharing or car-sharing, in a spatiotemporal environment under a fixed budget constraint. The service level can degrade when supply and demand are imbalanced across locations during a given time period. We present a study and comparison of several reinforcement learning algorithms in a 1-D problem setting, using a simulated bike-share system with a budget constraint on the incentives. Specifically, we empirically evaluate the performance of three policy-gradient-based reinforcement learning algorithms: Proximal Policy Optimization (PPO), Trust Region Policy Optimization (TRPO), and Actor Critic using Kronecker-Factored Trust Region (ACKTR).
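To make the problem setting more concrete, the following is a minimal sketch of a 1-D bike-share environment with a budgeted incentive. It is not the authors' simulator; every modeling choice here is an assumption for illustration: stations lie on a line, an incentive persuades a user whose origin station is empty to walk to an adjacent station, each paid incentive is deducted from a fixed budget, station capacity is unlimited, and demand is skewed left-to-right to create an imbalance. The names `LineBikeShareEnv` and `rollout` are hypothetical. The service level is taken to be the fraction of rental requests served.

```python
import random
from dataclasses import dataclass


@dataclass
class LineBikeShareEnv:
    """Hypothetical 1-D bike-share model (not the authors' simulator).

    Stations sit at positions 0..n-1 on a line. When a user's origin
    station is empty, an incentive can persuade them to walk to an
    adjacent station instead; each such incentive is paid from a fixed
    budget. Station capacity is assumed unlimited for simplicity.
    """
    bikes: list
    budget: float
    served: int = 0
    requests: int = 0

    def step(self, origin: int, destination: int, incentive: float) -> bool:
        """Handle one rental request; return True if the trip is served."""
        self.requests += 1
        pickup = origin if self.bikes[origin] > 0 else None
        # Offer the incentive only when it is positive and still affordable.
        if pickup is None and 0 < incentive <= self.budget:
            for neighbor in (origin - 1, origin + 1):
                if 0 <= neighbor < len(self.bikes) and self.bikes[neighbor] > 0:
                    pickup = neighbor
                    self.budget -= incentive
                    break
        if pickup is None:
            return False  # unmet demand lowers the service level
        self.bikes[pickup] -= 1
        self.bikes[destination] += 1
        self.served += 1
        return True

    @property
    def service_level(self) -> float:
        """Fraction of requests served so far (1.0 before any request)."""
        return self.served / self.requests if self.requests else 1.0


def rollout(incentive: float, n_stations: int = 5, n_requests: int = 200,
            budget: float = 20.0, seed: int = 0) -> float:
    """Run a fixed-incentive policy under left-to-right skewed demand."""
    rng = random.Random(seed)
    env = LineBikeShareEnv(bikes=[2] * n_stations, budget=budget)
    for _ in range(n_requests):
        # Skew demand so trips tend to start on the left and end on the right,
        # creating the supply/demand imbalance the incentives must counter.
        origin = min(int(rng.expovariate(1.0)), n_stations - 1)
        destination = n_stations - 1 - min(int(rng.expovariate(1.0)), n_stations - 1)
        env.step(origin, destination, incentive)
    return env.service_level


if __name__ == "__main__":
    for incentive in (0.0, 0.5, 1.0):
        print(f"incentive={incentive}: service level={rollout(incentive):.2f}")
```

Here the incentive is a fixed constant. In a reinforcement learning formulation such as the one the abstract describes, a policy trained with PPO, TRPO, or ACKTR would instead choose the incentive at each step from the observed state (for example, bike counts and remaining budget). The reward signal and state encoding shown here are illustrative assumptions.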
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Control and Optimization
- Modeling and Simulation
- Numerical Analysis