Master of Science in Computer Science
Electrical & Computer Engineering and Computer Science
Dr. Vahid Behzadan
Dr. Mohsen Sarraf
Dr. Muhammad Aminul Islam
Adaptive discounting, Markov Decision Process, Fixed-rate discounting, State-wise Adaptive Discounting from Experience (SADE), Batch-wise Adaptive Discounting from Experience (BADE), Deep Q-Network
Reinforcement learning, Decision making, Discount
In Markov Decision Process (MDP) models of sequential decision-making, it is common practice to account for temporal discounting by incorporating a constant discount factor. While the effectiveness of fixed-rate discounting in various Reinforcement Learning (RL) settings is well-established, the efficiency of this scheme has been questioned in recent studies. Another notable shortcoming of fixed-rate discounting stems from abstracting away the experiential information of the agent, which has been shown to be a significant component of delay discounting in human cognition. To address these shortcomings, this thesis proposes a novel method for adaptive discounting entitled State-wise Adaptive Discounting from Experience (SADE). This method leverages the experiential observations of state values in episodic trajectories to iteratively adjust state-specific discount rates. We report experimental evaluations of SADE in Q-learning agents, which demonstrate significant improvements in sample complexity and convergence rate compared to fixed-rate discounting. Additionally, this thesis proposes a second adaptive discounting method for deep RL entitled Batch-wise Adaptive Discounting from Experience (BADE), and reports experimental analyses of Deep Q-Network (DQN) agents with BADE discounting in an Atari game environment. Finally, the thesis concludes with remarks on future directions of research.
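To make the idea of state-specific discounting concrete, the sketch below shows tabular Q-learning on a toy chain MDP where each state carries its own discount rate that is nudged after every episode using the returns observed from that state. The chain environment, the adaptation rate, and the particular update rule (moving each state's discount toward the ratio of its observed return to its current value estimate) are illustrative assumptions for this sketch only — they are not the actual SADE update rule developed in the thesis.

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = (0, 1)           # 0 = move left, 1 = move right
ALPHA, BETA = 0.1, 0.01    # Q learning rate and discount-adaptation rate

Q = defaultdict(float)              # Q[(state, action)]
gamma = defaultdict(lambda: 0.9)    # state-specific discount rates (hypothetical)

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

def run_episode(eps=0.1):
    s, traj, done = 0, [], False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # Standard Q-learning update, except the bootstrap term uses the
        # *state-specific* discount rate gamma[s] instead of a global constant.
        target = r + (0.0 if done else gamma[s] * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        traj.append((s, r))
        s = s2
    # After the episode, nudge each visited state's discount toward the ratio
    # of its observed return to its current value estimate (illustrative rule).
    G = 0.0
    for s, r in reversed(traj):
        G = r + gamma[s] * G
        v = max(Q[(s, b)] for b in ACTIONS)
        if v > 0:
            gamma[s] += BETA * (min(G / v, 1.0) - gamma[s])
            gamma[s] = min(max(gamma[s], 0.0), 0.999)

random.seed(0)
for _ in range(500):
    run_episode()
```

After training, the greedy action from each state leads toward the rewarding terminal state, while each state's discount rate has drifted away from its shared initial value according to that state's own experienced returns — the per-state adaptation that distinguishes this scheme from fixed-rate discounting.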
Zinzuvadiya, Milan, "Adaptive Discounting in Reinforcement Learning" (2020). Master's Theses. 169.