Goal-Conditioned Supervised Learning
- https://dibyaghosh.com/blog/rl/gcsl.html
- https://www.youtube.com/watch?v=-vMcPk2Uc8g
- paper: Learning to Reach Goals via Iterated Supervised Learning - 1912.06088v4.pdf
Authors: Dibya Ghosh, Benjamin Eysenbach, Sergey Levine
1. Any trajectory is optimal if the goal is the final state of that trajectory
Any trajectory is a successful demonstration for reaching the final state in that same trajectory. (pg. 1)
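A minimal sketch of this relabeling idea in Python. It assumes a trajectory is stored as a list of states plus a list of actions; the function name `relabel_with_final_state`, the grid-position states, and the tuple layout are hypothetical and not taken from the paper's code.

```python
from typing import List, Tuple

State = Tuple[int, int]   # hypothetical: a (row, col) grid position
Action = str              # hypothetical: a named move


def relabel_with_final_state(
    states: List[State], actions: List[Action]
) -> List[Tuple[State, Action, State]]:
    """Turn one trajectory into (state, action, goal) training examples.

    `states` holds s_0 .. s_T and `actions` holds a_0 .. a_{T-1}.
    Every example is labeled with the state the trajectory actually
    ended in, so the trajectory counts as a successful demonstration
    for reaching that state.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    goal = states[-1]
    return [(s, a, goal) for s, a in zip(states[:-1], actions)]


# A toy trajectory that happened to end at (1, 1).
states = [(0, 0), (0, 1), (1, 1)]
actions = ["right", "down"]
examples = relabel_with_final_state(states, actions)
```

The resulting tuples can be used as ordinary supervised data: the input is (state, goal) and the label is the action.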
2. Comparison with HER
GCSL is different from Hindsight Experience Replay (HER). (See 00:10:33 Comparison with HER)
| | HER | GCSL |
|---|---|---|
| Is the goal in the trajectory? | Not necessarily | Yes |
| Uses TD learning? | Yes | No |
- Goal from the trajectory?
  - Given a transition, HER creates a fictitious transition by choosing an arbitrary goal and recomputing the reward for that goal. The goal doesn't have to be in the trajectory.
  - 00:10:57 GCSL only relabels the transition's goal to be the final state of the trajectory.
- TD learning? 00:11:21
  - HER uses TD learning to learn a value function, which is unstable.
  - GCSL learns the policy directly with supervised learning; imitation learning is stable.
  - So even if we replace the goal in HER with the terminal state of the trajectory, learning a value function is not as stable as learning the policy directly.
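The difference in what each method stores after relabeling can be sketched as follows. This is an illustration under assumptions, not either paper's implementation: the class names, the helper names, and the sparse 0 / -1 reward are hypothetical choices, and the TD update and the policy fit themselves are left out.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]  # hypothetical: a (row, col) grid position


class HERTransition(NamedTuple):
    # HER keeps a full RL transition, because a TD update needs
    # the reward and the next state to build its target.
    state: State
    action: str
    reward: float
    next_state: State
    goal: State


class GCSLExample(NamedTuple):
    # GCSL keeps only what supervised learning needs:
    # input (state, goal) and label (action).
    state: State
    action: str
    goal: State


def sparse_reward(next_state: State, goal: State) -> float:
    """Assumed reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def her_relabel(
    states: List[State], actions: List[str], goal: State
) -> List[HERTransition]:
    """Substitute `goal` (any goal) and recompute the reward for it."""
    return [
        HERTransition(s, a, sparse_reward(s_next, goal), s_next, goal)
        for s, a, s_next in zip(states[:-1], actions, states[1:])
    ]


def gcsl_relabel(states: List[State], actions: List[str]) -> List[GCSLExample]:
    """Label every step with the trajectory's own final state."""
    goal = states[-1]
    return [GCSLExample(s, a, goal) for s, a in zip(states[:-1], actions)]


states = [(0, 0), (0, 1), (1, 1)]
actions = ["right", "down"]

her_in_traj = her_relabel(states, actions, goal=(1, 1))   # goal from the trajectory
her_outside = her_relabel(states, actions, goal=(5, 5))   # goal never visited
gcsl_data = gcsl_relabel(states, actions)
```

Even when HER is given the trajectory's final state as the goal (`her_in_traj`), its data still carries rewards meant for value-function learning, while `gcsl_data` has no reward field at all.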
References
- https://arxiv.org/abs/1912.06088
- https://dibyaghosh.com/blog/rl/gcsl.html (Goal-Conditioned Supervised Learning)
- https://www.youtube.com/watch?v=-vMcPk2Uc8g (Goal-Conditioned Supervised Learning)