Blog posts

2026

Decoupling Time and Risk: Risk-Sensitive RL with General Discounting

5 minute read

Published:

In standard Reinforcement Learning (RL), the discount factor \(\gamma\) is often treated either as a fixed parameter of the Markov Decision Process or as a tunable hyperparameter for training stability. We typically default to exponential discounting, where the value of a reward decays by a constant factor at every time step.
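
To make "decays by a constant factor" concrete, here is a minimal Python sketch of exponential discounting. The function names (`exponential_weights`, `discounted_return`) and the value \(\gamma = 0.9\) are illustrative assumptions, not anything defined in the post.

```python
def exponential_weights(gamma, horizon):
    """Weights gamma**t for t = 0, ..., horizon - 1 (hypothetical helper)."""
    return [gamma ** t for t in range(horizon)]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence (hypothetical helper)."""
    weights = exponential_weights(gamma, len(rewards))
    return sum(w * r for w, r in zip(weights, rewards))


# Illustrative value of gamma; any value in (0, 1) behaves the same way.
weights = exponential_weights(0.9, 5)

# The ratio between consecutive weights is the same at every step.
ratios = [weights[t + 1] / weights[t] for t in range(len(weights) - 1)]
```

Every entry of `ratios` equals \(\gamma\) up to floating-point error, which is exactly the constant-factor property: how much a one-step delay costs does not depend on when the delay happens.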