Decoupling Time and Risk: Risk-Sensitive RL with General Discounting
Published:
In standard Reinforcement Learning (RL), the discount factor \(\gamma\) is usually treated either as a fixed parameter of the Markov Decision Process (MDP) or as a hyperparameter tuned for training stability. The default is exponential discounting, in which a reward received \(t\) steps in the future is weighted by \(\gamma^t\), so its value decays by the same constant factor at every time step.
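
To make the "constant factor" property concrete, here is a minimal sketch in plain Python. The function names (`exponential_weights`, `discounted_return`) and the sample values are illustrative assumptions, not taken from any particular RL library.

```python
def exponential_weights(gamma, horizon):
    """Weights gamma**t for t = 0, ..., horizon - 1."""
    return [gamma ** t for t in range(horizon)]


def discounted_return(rewards, gamma):
    """Exponentially discounted sum of a finite reward sequence."""
    return sum(w * r for w, r in zip(exponential_weights(gamma, len(rewards)), rewards))


# Hypothetical values chosen only for illustration.
weights = exponential_weights(gamma=0.9, horizon=4)

# The ratio between consecutive weights is always gamma:
# this is the "constant factor at every time step".
ratios = [weights[t + 1] / weights[t] for t in range(len(weights) - 1)]

example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

Because the ratio between consecutive weights never changes, the relative preference between two future rewards depends only on the gap between them, not on when the decision is made.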
