Putting RL back in RLHF

June 12, 2024

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

April 22, 2024

Constitutional AI with Open LLMs

February 1, 2024

Preference Tuning LLMs with Direct Preference Optimization Methods

January 18, 2024

The N Implementation Details of RLHF with PPO

October 24, 2023

Finetune Stable Diffusion Models with DDPO via TRL

September 29, 2023 (guest)

Fine-tune Llama 2 with DPO

August 8, 2023

StackLLaMA: A hands-on guide to train LLaMA with RLHF

April 5, 2023

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

March 9, 2023

Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system

February 7, 2023

Illustrating Reinforcement Learning from Human Feedback (RLHF)

December 9, 2022

Train your first Decision Transformer

September 8, 2022

Proximal Policy Optimization (PPO)

August 5, 2022

Advantage Actor Critic (A2C)

July 22, 2022

Policy Gradient with PyTorch

June 30, 2022
