jiayi

mrzjy

AI & ML interests

Gamer. AI Engineer

Recent Activity

Organizations

Blog-explorers's profile picture

mrzjy's activity

posted an update about 7 hours ago
view post
Post
291
A very small project:

Introducing CreativeTinyZero:
mrzjy/Qwen2.5-1.5B-GRPO-Creative-Ad-Generation

Unlike the impressive DeepSeek-R1(-Zero), this project focuses on a pure reinforcement learning (RL) experiment applied to an open-domain task: creative advertisement generation.

Objective:

- To investigate the feasibility of applying R1-like methods to an open-domain task without a verifiable ground-truth reward, while at least demonstrating its potential.
- To explore whether <think> and <answer> rewards can be explicitly designed to provide strong guidance through RL based on human prior knowledge.

Note:
- Our goal is not to induce self-reflective thinking, but to align with human thought processes purely through RL, without any supervised fine-tuning (SFT) on any constructed dataset.

Despite its small size, the resulting 1.5B-GRPO model demonstrates intriguing generative capabilities—though it's still far from perfect.