Master of Science in Computer Science
Electrical & Computer Engineering and Computer Science
Dr. Vahid Behzadan
Dr. Mohsen Sarraf
Dr. Muhammad Aminul Islam
Adaptive discounting, Markov Decision Process, Fixed-rate discounting, State-wise Adaptive Discounting from Experience (SADE), Batch-wise Adaptive Discounting from Experience (BADE), Deep Q-Network
Reinforcement learning, Decision making, Discount
In Markov Decision Process (MDP) models of sequential decision-making, it is common practice to account for temporal discounting by incorporating a constant discount factor. While the effectiveness of fixed-rate discounting in various Reinforcement Learning (RL) settings is well-established, the efficiency of this scheme has been questioned in recent studies. Another notable shortcoming of fixed-rate discounting stems from abstracting away the experiential information of the agent, which has been shown to be a significant component of delay discounting in human cognition. To address these shortcomings, this thesis proposes a novel method for adaptive discounting entitled State-wise Adaptive Discounting from Experience (SADE). This method leverages the experiential observations of state values in episodic trajectories to iteratively adjust state-specific discount rates. We report experimental evaluations of SADE in Q-learning agents, which demonstrate significant improvements in sample complexity and convergence rate compared to fixed-rate discounting. Additionally, this thesis proposes a second adaptive discounting method for deep RL entitled Batch-wise Adaptive Discounting from Experience (BADE), and reports experimental analyses of Deep Q-Network (DQN) agents with BADE discounting in an Atari game environment. Finally, the thesis concludes with remarks on future directions of research.
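To make the idea of state-specific discounting concrete, the sketch below shows tabular Q-learning on a toy chain MDP where each state carries its own discount rate that is nudged after every episode using the returns observed from that state. The chain environment, the adaptation rate, and the particular update rule (moving each state's discount toward the ratio of its observed return to its current value estimate) are illustrative assumptions for this sketch only — they are not the actual SADE update rule developed in the thesis.

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = (0, 1)           # 0 = move left, 1 = move right
ALPHA, BETA = 0.1, 0.01    # Q learning rate and discount-adaptation rate

Q = defaultdict(float)              # Q[(state, action)]
gamma = defaultdict(lambda: 0.9)    # state-specific discount rates (hypothetical)

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

def run_episode(eps=0.1):
    s, traj, done = 0, [], False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # Standard Q-learning update, except the bootstrap term uses the
        # *state-specific* discount rate gamma[s] instead of a global constant.
        target = r + (0.0 if done else gamma[s] * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        traj.append((s, r))
        s = s2
    # After the episode, nudge each visited state's discount toward the ratio
    # of its observed return to its current value estimate (illustrative rule).
    G = 0.0
    for s, r in reversed(traj):
        G = r + gamma[s] * G
        v = max(Q[(s, b)] for b in ACTIONS)
        if v > 0:
            gamma[s] += BETA * (min(G / v, 1.0) - gamma[s])
            gamma[s] = min(max(gamma[s], 0.0), 0.999)

random.seed(0)
for _ in range(500):
    run_episode()
```

After training, the greedy action from each state leads toward the rewarding terminal state, while each state's discount rate has drifted away from its shared initial value according to that state's own experienced returns — the per-state adaptation that distinguishes this scheme from fixed-rate discounting.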
Zinzuvadiya, Milan, "Adaptive Discounting in Reinforcement Learning" (2020). Master's Theses. 169.