chenwuml committed
Commit b4ec84a · Parent: 2be1969

initial commit

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -5,10 +5,9 @@ license: mit
 
 CodeFu-7B-v0.1 is a 7B parameter model trained using Reinforcement Learning for competitive programming tasks. Built on the DeepSeek-R1-Distill-Qwen-7B base model, CodeFu is capable of algorithmic reasoning to solve complex problems and generate efficient C++ solutions.
 
-Trained solely on problem statements, without access to any ground-truth solutions, CodeFu achieved a >10x performance improvement over its base model, demonstrating the effectiveness of our Reinforcement Learning (RL) approach in building algorithmic reasoning capabilities.
+Specifically, CodeFu-7B-v0.1 achieves **13.7% Pass@1** on the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/), outperforming models >4x larger.
 
-
-Specifically, CodeFu-7B-v0.1 achieves **13.7% Pass@1** on the [USACO 2024 benchmark](https://princeton-nlp.github.io/USACOBench/), outperforming models >4x larger.
+Trained solely on problem statements, without access to any ground-truth solutions, CodeFu achieved a >10x performance improvement over its base model, demonstrating the effectiveness of our RL approach.
 
 
 ## Model Specs
@@ -44,7 +43,7 @@ To assess CodeFu's genuine problem-solving abilities, we used [USACO benchmark](
 - ⚡ **Outperforms 32B base model** (13.7% vs 11.7% Pass@1)
 - 📈 **>10x improvement** over 7B base model (13.7% vs 1%)
 
-For systematic and robust evaluation, we used standardized code extraction logic across all model responses. This process identifies solution code by parsing either `<code></code>` tags or ```cpp code blocks, always selecting the final code block to ensure we capture each model's ultimate solution after any intermediate reasoning steps. GPT-3.5/4 scores are copied from the [USACO 2024 benchmark](https://princeton-nlp.github.io/USACOBench/) as baselines.
+For systematic and robust evaluation, we used standardized code extraction logic across all model responses. This process identifies solution code by parsing either `<code></code>` tags or ```cpp code blocks, always selecting the final code block to ensure we capture each model's ultimate solution after any intermediate reasoning steps. GPT-3.5/4 scores are copied from the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/) as baselines.
 
 All extracted code solutions are executed with **strict time limit enforcement**: any code exceeding the problem's specified time limit is marked as incorrect, ensuring realistic competitive programming conditions.
 
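The code-extraction step described in the README text above needs only a small amount of parsing logic. The sketch below is a minimal illustration, assuming a Python evaluation harness; the function name and regex are illustrative assumptions, not the actual CodeFu evaluation code.

```python
import re
from typing import Optional

# Build the triple-backtick fence at runtime so the literal does not close this block.
FENCE = "`" * 3
CODE_PATTERN = re.compile(
    rf"<code>(?P<tag>.*?)</code>|{FENCE}cpp(?P<fenced>.*?){FENCE}",
    re.DOTALL | re.IGNORECASE,
)

def extract_final_solution(response: str) -> Optional[str]:
    """Return the last code block in a model response, or None if none is found."""
    matches = list(CODE_PATTERN.finditer(response))
    if not matches:
        return None
    last = matches[-1]  # the final block is taken as the model's ultimate answer
    code = last.group("tag") if last.group("tag") is not None else last.group("fenced")
    return code.strip()
```

For a response that reasons in prose and then ends with a single fenced cpp block, this returns only that final block, matching the "last code block wins" rule described above.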
 
 
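Strict time limit enforcement amounts to running each extracted program with a hard per-case timeout. The following is a hedged sketch under stated assumptions (Python harness, `g++` on PATH, plain output comparison); it is not the authors' judging code.

```python
import subprocess

def judge_one_case(src_path: str, input_data: str, expected_output: str,
                   time_limit_s: float) -> bool:
    """Compile one extracted C++ solution and run it on a single test case.

    Any run that exceeds the problem's time limit is treated as incorrect.
    """
    binary = "./solution_bin"
    build = subprocess.run(
        ["g++", "-O2", "-std=c++17", src_path, "-o", binary],
        capture_output=True,
    )
    if build.returncode != 0:
        return False  # a solution that fails to compile scores zero on the case
    try:
        run = subprocess.run(
            [binary], input=input_data, capture_output=True,
            text=True, timeout=time_limit_s,  # hard wall-clock limit
        )
    except subprocess.TimeoutExpired:
        return False  # exceeding the specified time limit is marked incorrect
    return run.returncode == 0 and run.stdout.strip() == expected_output.strip()
```

A production judge would typically also limit memory, isolate the process, and may use a problem-specific output checker; the timeout behaviour above is the part that corresponds to the claim in the README.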