

Yihua Zhang

NormalUhr


AI & ML interests

None yet

Recent Activity

published an article about 6 hours ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

published an article 4 days ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

published an article 7 days ago

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons



NormalUhr's activity

published an article 7 days ago

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

published an article 7 days ago

MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression