Thank you for this post. Very clear explanation and nice example ;)
QUANG HUY CHU
cqhofsns
AI & ML interests
Deep Reinforcement Learning
Natural Language Processing
Recent Activity
commented on an article 29 days ago
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
upvoted an article about 1 month ago
Open-R1: a fully open reproduction of DeepSeek-R1
Organizations
None yet
cqhofsns's activity

commented on
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
29 days ago

upvoted an article 29 days ago
Article
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
70
upvoted an article about 1 month ago
Article
Open-R1: a fully open reproduction of DeepSeek-R1
803

upvoted a collection about 2 months ago