CodeFu-7B-v0.1 is a 7B parameter model trained using Reinforcement Learning for competitive programming tasks. Built on the DeepSeek-R1-Distill-Qwen-7B base model, CodeFu is capable of algorithmic reasoning to solve complex problems and generate efficient C++ solutions.
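
A minimal inference sketch, assuming the model loads through the standard `transformers` causal-LM interface (the repo id below is a placeholder, and the generation settings are illustrative, not tuned values from this card):

```python
# Minimal inference sketch. The repo id is a placeholder; generation
# parameters are illustrative rather than recommended settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/CodeFu-7B-v0.1"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Solve the following competitive programming problem in C++:\n<problem statement here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
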
Specifically, CodeFu-7B-v0.1 achieves **13.7% Pass@1** on the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/), outperforming models >4x larger.
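
For reference, Pass@1 is the fraction of problems for which a submitted solution passes all test cases; when several samples per problem are drawn, it is typically computed with the standard unbiased pass@k estimator. A sketch of that usual formula (not necessarily the exact evaluation script behind these numbers):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n samples of which c are correct,
    the probability that at least one of k drawn samples is correct."""
    if n - c < k:
        return 1.0  # too few failures to fill all k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples on a problem, 3 of them correct
print(pass_at_k(n=16, c=3, k=1))  # 0.1875, i.e. 3/16
```
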
Trained solely on problem statements, without access to any ground-truth solutions, CodeFu achieved a >10x performance improvement over its base model, demonstrating the effectiveness of our RL approach.

## Model Specs

To assess CodeFu's genuine problem-solving abilities, we used the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/):

- ⚡ **Outperforms the 32B base model** (13.7% vs 11.7% Pass@1)
- 📈 **>10x improvement** over the 7B base model (13.7% vs 1% Pass@1)

For systematic and robust evaluation, we used standardized code extraction logic across all model responses. This process identifies solution code by parsing either `<code></code>` tags or fenced ```cpp code blocks, always selecting the final code block to ensure we capture each model's ultimate solution after any intermediate reasoning steps. GPT-3.5/4 scores are copied from the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/) as baselines.
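
As an illustration, the extraction step can be sketched as follows (consistent with the description above, though not necessarily the exact script used):

````python
import re

def extract_solution(response: str) -> str | None:
    """Pull the final code block out of a model response, skipping
    intermediate reasoning snippets in favor of the ultimate solution."""
    pattern = re.compile(
        r"<code>(.*?)</code>"   # <code>...</code> tags
        r"|```cpp\s*(.*?)```",  # fenced cpp blocks
        re.DOTALL,
    )
    blocks = [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in pattern.finditer(response)
    ]
    # Always select the last block found, per the policy above.
    return blocks[-1].strip() if blocks else None
````
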
All extracted code solutions are executed with **strict time limit enforcement**: any code exceeding the problem's specified time limit is marked as incorrect, ensuring realistic competitive programming conditions.
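
Concretely, enforcement can be as simple as compiling each solution and running it under a hard wall-clock timeout. A minimal sketch, where the `judge` helper, compiler flags, and verdict labels are illustrative assumptions rather than the exact harness:

```python
import subprocess

def judge(cpp_file: str, stdin_data: str, expected: str, time_limit_s: float) -> str:
    """Compile one C++ solution and run it on a single test case,
    treating any run over the time limit as incorrect."""
    compile_res = subprocess.run(["g++", "-O2", "-o", "solution", cpp_file])
    if compile_res.returncode != 0:
        return "compile_error"
    try:
        run = subprocess.run(
            ["./solution"],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # hard limit: exceeding it fails the test
        )
    except subprocess.TimeoutExpired:
        return "time_limit_exceeded"  # marked incorrect, per the policy above
    return "accepted" if run.stdout.strip() == expected.strip() else "wrong_answer"
```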