jiayi
mrzjy
AI & ML interests
Gamer. AI Engineer
Recent Activity
posted an update about 7 hours ago
A very small project:
Introducing CreativeTinyZero:
https://huggingface.co/mrzjy/Qwen2.5-1.5B-GRPO-Creative-Ad-Generation
Unlike the impressive DeepSeek-R1(-Zero), this project focuses on a pure reinforcement learning (RL) experiment applied to an open-domain task: creative advertisement generation.
Objective:
- To investigate whether R1-like methods can be applied to an open-domain task that has no verifiable ground-truth reward, and to at least demonstrate their potential.
- To explore whether <think> and <answer> rewards can be explicitly designed to provide strong guidance through RL based on human prior knowledge.
Note:
- Our goal is not to induce self-reflective thinking, but to align with human thought processes purely through RL, without any supervised fine-tuning (SFT) on any constructed dataset.
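As a rough illustration of what explicitly designed <think> and <answer> rewards could look like, here is a minimal Python sketch. It is not the reward code used in this project: the function names, the length-based answer heuristic, and the `max_chars` threshold are all hypothetical stand-ins for whatever human prior knowledge is actually encoded. The last function shows the group-relative normalization that GRPO applies to the rewards of completions sampled for the same prompt.

```python
import re
from statistics import mean, pstdev

# Assumed layout: one <think> block followed by one <answer> block.
FORMAT_RE = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def parse(completion: str):
    """Return (think, answer) if the layout is respected, else None."""
    # Each tag must appear exactly once, otherwise the layout is rejected.
    if any(completion.count(tag) != 1 for tag in TAGS):
        return None
    match = FORMAT_RE.match(completion)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def format_reward(completion: str) -> float:
    """1.0 when the <think>/<answer> layout is respected, else 0.0."""
    return 1.0 if parse(completion) is not None else 0.0


def answer_reward(completion: str, max_chars: int = 60) -> float:
    """Toy stand-in for a human-prior reward: prefer short, non-empty ad copy."""
    parsed = parse(completion)
    if parsed is None or not parsed[1]:
        return 0.0
    answer = parsed[1]
    # Full reward up to max_chars, then decay in proportion to the excess.
    return 1.0 if len(answer) <= max_chars else max_chars / len(answer)


def group_advantages(rewards, eps: float = 1e-6):
    """Normalize rewards within one sampled group, as in GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, a completion that skips the tags gets zero reward from both functions, so within a sampled group it ends up with a negative advantage relative to well-formed completions.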
Despite its small size, the resulting 1.5B-GRPO model demonstrates intriguing generative capabilities, though it is still far from perfect.
updated a model about 7 hours ago
mrzjy/Qwen2.5-1.5B-GRPO-Creative-Ad-Generation
published a dataset about 9 hours ago
mrzjy/creative-ad-prompts-zh
mrzjy's activity
Dataset Viewer issue: JobManagerCrashedError
1
#1 opened about 1 month ago by mrzjy
for the dataset
2
#1 opened about 2 months ago by ziconghe
[bot] Conversion to Parquet
#1 opened 5 months ago by parquet-converter
[bot] Conversion to Parquet
#1 opened 7 months ago by parquet-converter