CodeFu-7B-v0.1 is a 7B parameter model trained using Reinforcement Learning for competitive programming tasks. Built on the DeepSeek-R1-Distill-Qwen-7B base model, CodeFu is capable of algorithmic reasoning to solve complex problems and generate efficient C++ solutions.
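
A minimal inference sketch, assuming the model loads through the standard `transformers` causal-LM interface (the repo id below is a placeholder, and the generation settings are illustrative, not tuned values from this card):

```python
# Minimal inference sketch. The repo id is a placeholder; generation
# parameters are illustrative rather than recommended settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/CodeFu-7B-v0.1"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Solve the following competitive programming problem in C++:\n<problem statement here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
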
Specifically, CodeFu-7B-v0.1 achieves **13.7% Pass@1** on the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/), outperforming models >4x larger.
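
For reference, Pass@1 is the fraction of problems for which a submitted solution passes all test cases; when several samples per problem are drawn, it is typically computed with the standard unbiased pass@k estimator. A sketch of that usual formula (not necessarily the exact evaluation script behind these numbers):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n samples of which c are correct,
    the probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # too few failures to fill all k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples on a problem, 3 of them correct
print(pass_at_k(n=16, c=3, k=1))  # 0.1875, i.e. 3/16
```
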
Trained solely on problem statements, without access to any ground-truth solutions, CodeFu achieved a >10x performance improvement over its base model, demonstrating the effectiveness of our RL approach.

## Model Specs

To assess CodeFu's genuine problem-solving abilities, we used the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/):

- ⚡ **Outperforms the 32B base model** (13.7% vs 11.7% Pass@1)
- 📈 **>10x improvement** over the 7B base model (13.7% vs 1% Pass@1)

For systematic and robust evaluation, we used standardized code extraction logic across all model responses. This process identifies solution code by parsing either `<code></code>` tags or fenced ```cpp code blocks, always selecting the final code block to ensure we capture each model's ultimate solution after any intermediate reasoning steps. GPT-3.5/4 scores are copied from the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/) as baselines.
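
As an illustration, the extraction step can be sketched as follows (consistent with the description above, though not necessarily the exact script used):

````python
import re

def extract_solution(response: str) -> str | None:
    """Pull the final code block out of a model response, skipping
    intermediate reasoning snippets in favor of the ultimate solution."""
    pattern = re.compile(
        r"<code>(.*?)</code>"   # <code>...</code> tags
        r"|```cpp\s*(.*?)```",  # fenced cpp blocks
        re.DOTALL,
    )
    blocks = [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in pattern.finditer(response)
    ]
    # Always select the last block found, per the policy above.
    return blocks[-1].strip() if blocks else None
````
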
All extracted code solutions are executed with **strict time limit enforcement**: any code exceeding the problem's specified time limit is marked as incorrect, ensuring realistic competitive programming conditions.
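
Concretely, enforcement can be as simple as compiling each solution and running it under a hard wall-clock timeout. A minimal sketch, where the `judge` helper, compiler flags, and verdict labels are illustrative assumptions rather than the exact harness:

```python
import subprocess

def judge(cpp_file: str, stdin_data: str, expected: str, time_limit_s: float) -> str:
    """Compile one C++ solution and run it on a single test case,
    treating any run over the time limit as incorrect."""
    compile_res = subprocess.run(["g++", "-O2", "-o", "solution", cpp_file])
    if compile_res.returncode != 0:
        return "compile_error"
    try:
        run = subprocess.run(
            ["./solution"],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # hard limit: exceeding it fails the test
        )
    except subprocess.TimeoutExpired:
        return "time_limit_exceeded"  # marked incorrect, per the policy above
    return "accepted" if run.stdout.strip() == expected.strip() else "wrong_answer"
```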