blog | Tony Chen

Understand Policy Gradient by Building Cross Entropy from Scratch

Understanding RL, especially policy gradient, could be non-trivial, particularly if you like grasping intuitions like I do. In this post, I will walk through a thread of thoughts that really helped me understand PG by starting from more familiar supervised learning setting.

1 min read · June 11, 2023

2023