Date of Submission

12-2020

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Electrical & Computer Engineering and Computer Science

Advisor

Dr. Vahid Behzadan

Committee Member

Dr. Mohsen Sarraf

Committee Member

Dr. Muhammad Aminul Islam

Keywords

Adaptive discounting, Markov Decision Process, Fixed-rate discounting, State-wise Adaptive Discounting from Experience (SADE), Batch-wise Adaptive Discounting from Experience (BADE), Deep Q-Network

LCSH

Reinforcement learning, Decision making, Discount

Abstract

In Markov Decision Process (MDP) models of sequential decision-making, it is common practice to account for temporal discounting by incorporating a constant discount factor. While the effectiveness of fixed-rate discounting in various Reinforcement Learning (RL) settings is well-established, the efficiency of this scheme has been questioned in recent studies. Another notable shortcoming of fixed-rate discounting stems from abstracting away the experiential information of the agent, which has been shown to be a significant component of delay discounting in human cognition. To address this shortcoming, this thesis proposes a novel method for adaptive discounting entitled State-wise Adaptive Discounting from Experience (SADE). This method leverages experiential observations of state values in episodic trajectories to iteratively adjust state-specific discount rates. We report experimental evaluations of SADE in Q-learning agents, which demonstrate significant improvements in sample complexity and convergence rate compared to fixed-rate discounting. Additionally, this thesis proposes a second adaptive discounting method for deep RL entitled Batch-wise Adaptive Discounting from Experience (BADE), and reports experimental analyses of Deep Q-Network (DQN) agents with BADE discounting in an Atari game environment. Finally, the thesis concludes with remarks on future directions of research.
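
The abstract describes SADE only at a high level: state-specific discount rates are iteratively adjusted from experiential observations of state values in episodic trajectories. The snippet below is a minimal illustrative sketch of that idea in a tabular Q-learning agent; the specific adjustment rule (nudging each visited state's discount rate toward agreement between its experienced episodic return and its current value estimate), the learning rates, and the class and method names are assumptions for illustration, not the update rule defined in the thesis.

```python
import numpy as np

class SadeStyleQLearner:
    """Tabular Q-learning with per-state discount rates, adjusted from episode experience.

    Illustrative sketch only: the discount-adjustment rule is an assumption,
    not the SADE update defined in the thesis.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma_init=0.9,
                 eta=0.01, epsilon=0.1):
        self.Q = np.zeros((n_states, n_actions))
        self.gamma = np.full(n_states, gamma_init)  # state-specific discount rates
        self.alpha = alpha      # Q-learning step size
        self.eta = eta          # discount-adjustment step size (assumed)
        self.epsilon = epsilon  # exploration rate
        self.n_actions = n_actions

    def act(self, s):
        # epsilon-greedy action selection
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.Q[s]))

    def update_q(self, s, a, r, s_next, done):
        # Standard Q-learning target, but bootstrapped with the discount rate of state s.
        target = r + (0.0 if done else self.gamma[s] * np.max(self.Q[s_next]))
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

    def update_discounts(self, trajectory):
        # trajectory: list of (state, reward) pairs from one completed episode.
        # Walk the episode backwards, compute each visited state's experienced
        # return, and nudge that state's discount rate toward agreement between
        # the experienced return and the current value estimate (assumed rule).
        G = 0.0
        for s, r in reversed(trajectory):
            G = r + self.gamma[s] * G          # experienced return from state s
            v_est = np.max(self.Q[s])          # current value estimate for s
            self.gamma[s] = np.clip(
                self.gamma[s] + self.eta * np.sign(G - v_est), 0.0, 0.999)
```

In this sketch, `update_q` is applied at every environment step while `update_discounts` is applied once per finished episode, so discount rates evolve only from completed experiential trajectories, in the spirit of the state-wise adaptation the abstract describes.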

Available for download on Saturday, February 19, 2022
