chenwuml commited on
Commit
2cf8efe
·
1 Parent(s): 22c1d75

initial commit

.gitattributes CHANGED
@@ -33,10 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- *.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
  *.png filter=lfs diff=lfs merge=lfs -text
- *.jpg filter=lfs diff=lfs merge=lfs -text
- *.jpeg filter=lfs diff=lfs merge=lfs -text
- *.pdf filter=lfs diff=lfs merge=lfs -text
  *.tar.gz filter=lfs diff=lfs merge=lfs -text
- *.csv.tgz filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,253 @@
1
  ---
2
  license: mit
3
  ---
4
+ # CodeFu-7B-v0.1
5
+
6
+ CodeFu-7B-v0.1 is a 7B-parameter model trained with Reinforcement Learning for competitive programming tasks. Built on the DeepSeek-R1-Distill-Qwen-7B base model, CodeFu reasons through complex algorithmic problems and generates efficient C++ solutions.
7
+
8
+ Trained solely on problem statements—without access to any ground-truth solutions—CodeFu achieved >10x performance improvement over its base model, demonstrating the effectiveness of our Reinforcement Learning (RL) approach in building algorithmic reasoning capabilities.
9
+
10
+
11
+ Specifically, CodeFu-7B-v0.1 achieves **13.7% Pass@1** on the [USACO 2024 benchmark](https://princeton-nlp.github.io/USACOBench/), outperforming models >4x larger.
12
+
13
+
14
+ ## Model Specs
15
+
16
+ - **Base Model**: [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
17
+ - **Model Size**: 7.61B parameters
18
+ - **License**: MIT
19
+ - **Task**: Competitive Programming / Algorithmic Problem Solving
20
+
21
+ Starting from the [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) model, we trained CodeFu using RL on selected Competitive Programming problems (without solutions) from the [DeepMind CodeContest](https://huggingface.co/datasets/deepmind/code_contests) dataset.
22
+
23
+
24
+ ## Evaluation
25
+
26
+ To assess CodeFu's genuine problem-solving abilities, we used the [USACO benchmark](https://princeton-nlp.github.io/USACOBench/), which consists of 307 high-quality problems from past [USA Computing Olympiad](https://usaco.org/) contests.
27
+
28
+ For systematic and robust evaluation:
29
+
30
+ 1. We used standardized code extraction logic across all model responses. This process identifies solution code by parsing either `<code></code>` tags or ```cpp code blocks, always selecting the final code block so that we capture each model's ultimate solution after any intermediate reasoning steps (a minimal sketch of this extraction-and-execution flow is shown after this list).
31
+
32
+ 2. All solutions are executed with **strict time limit enforcement** - any code exceeding the problem's specified time limit is marked as incorrect, ensuring realistic competitive programming conditions.
33
+
34
+ 3. All open-source models (including CodeFu-7B-v0.1) were tested using [vLLM](https://github.com/vllm-project/vllm) v0.6.3 with identical sampling parameters: a `temperature` of 0.8 and a `top_p` of 0.95. Claude-3.7-Sonnet was evaluated at a `temperature` of 1.0. We set the maximum output length (`max_tokens`) to 28,672 for all models to ensure sufficient length for reasoning and code solutions.
35
+
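+ Below is a minimal sketch of the extraction-and-execution flow described in items 1 and 2. It is illustrative only: the helper names, compiler flags, and temporary-file handling are assumptions, not the exact scripts used to produce the benchmark numbers.
+
+ ```python
+ # Illustrative sketch (not the actual evaluation harness): take the final code
+ # block from a response, compile it, and run one test case under the time limit.
+ import re
+ import subprocess
+ import tempfile
+ from pathlib import Path
+
+ def extract_solution_code(response: str):
+     """Return the last <code>...</code> or ```cpp ...``` block, or None."""
+     pattern = re.compile(r"<code>(.*?)</code>|```cpp\s*(.*?)```", re.DOTALL)
+     blocks = [m.group(1) or m.group(2) for m in pattern.finditer(response)]
+     return blocks[-1].strip() if blocks else None
+
+ def judge_one_case(code: str, stdin_text: str, time_limit_s: float):
+     """Compile and run a single test case; exceeding the time limit counts as incorrect."""
+     work = Path(tempfile.mkdtemp())
+     (work / "sol.cpp").write_text(code)
+     build = subprocess.run(
+         ["g++", "-O2", "-o", str(work / "sol"), str(work / "sol.cpp")],
+         capture_output=True,
+     )
+     if build.returncode != 0:
+         return "COMPILE_ERROR", None
+     try:
+         run = subprocess.run(
+             [str(work / "sol")], input=stdin_text, text=True,
+             capture_output=True, timeout=time_limit_s,
+         )
+     except subprocess.TimeoutExpired:
+         return "EXCEEDS_TIME_LIMIT", None
+     return "OK", run.stdout
+ ```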
36
+ Pass@1 results of GPT-4 and GPT-3.5 are copied from the [USACO 2024 benchmark](https://princeton-nlp.github.io/USACOBench/) as performance baselines.
37
+
38
+ The table below compares CodeFu's performance to other reasoning/coding models:
39
+
40
+ | Model | Size | USACO Pass@1 | Notes |
41
+ |-------|------|-------------:|-------|
42
+ | Claude-3.7-Sonnet | UNK | 31.9 | |
43
+ | [OlympicCoder-32B](https://huggingface.co/open-r1/OlympicCoder-32B) | 32B | 18.9 | |
44
+ | [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | 32B | 17.3 | |
45
+ | [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | 32B | 16.3 | |
46
+ | **CodeFu-7B-v0.1** | **7B** | **13.7** | |
47
+ | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 32B | 11.7 | |
48
+ | [OlympicCoder-7B](https://huggingface.co/open-r1/OlympicCoder-7B) | 7B | 9.1 | |
49
+ | GPT-4-1106-preview | UNK | 8.7 | |
50
+ | [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | 7B | 5.9 | |
51
+ | [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 7B | 1.0 | *Base model* |
52
+ | GPT-3.5-turbo-1106 | UNK | 0.6 | |
53
+
54
+ **CodeFu Key Highlights:**
55
+ - 📊 **Leading 7B model** on USACO benchmark
56
+ - ⚡ **Outperforms the 32B variant of its base model** (13.7% vs 11.7% Pass@1)
57
+ - 📈 **>10x improvement** over 7B base model (13.7% vs 1%)
58
+
59
+ ### Result Analysis
60
+
61
+ We provide access to the complete CodeFu-7B-v0.1 evaluation results on the USACO benchmark as a [CSV file](codefu-7b-v0.1_usaco.csv.tgz) containing fields such as `problem_name`, `prompt`, `response`, `response_length`, `solution_code`, `status`, and `score`.
62
+
63
+ Notably, the 'status' distribution shows:
64
+ - Success: 42 cases
65
+ - Failure (code runs but incorrect): 37 cases
66
+ - Fail to compile: 8 cases
67
+ - No code: 220 cases
68
+
69
+ Analysis of the response length distribution shows that successful solutions typically have concise responses of around 5,000 tokens, while unsuccessful attempts often reach the maximum token limit. Many of these long responses fall into the "No code" category, where the model engages in extensive reasoning that eventually degenerates into repetitive patterns or incoherent text without producing executable code. Future work is needed to address this long-output behavior, for example with training objectives that better distinguish useful deliberation from unproductive verbosity.
70
+
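+ The snippet below shows one way to reproduce these counts from the released archive. It assumes the `.tgz` is a gzip-compressed tar containing a single CSV with the columns listed above.
+
+ ```python
+ # Load the released results (assumes a gzip-compressed tar with one CSV inside).
+ import tarfile
+ import pandas as pd
+
+ with tarfile.open("codefu-7b-v0.1_usaco.csv.tgz", "r:gz") as tar:
+     member = next(m for m in tar.getmembers() if m.name.endswith(".csv"))
+     df = pd.read_csv(tar.extractfile(member))
+
+ print(df["status"].value_counts())                        # Success / Failure / Fail to compile / No code
+ print(df.groupby("status")["response_length"].median())   # concise successes vs. max-length failures
+ ```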
71
+ ## Usage
72
+
73
+ ```python
74
+ # CodeFu works with vLLM for inference
75
+ # pip install vllm==0.6.3
76
+
77
+ from vllm import LLM, SamplingParams
78
+
79
+ model_name = "aws-prototyping/codefu-7b-v0.1"
80
+
81
+ # Initialize vLLM
82
+ llm = LLM(model=model_name, trust_remote_code=True)
83
+ sampling_params = SamplingParams(
84
+     temperature=0.8,
85
+     top_p=0.95,
86
+     max_tokens=28672,
87
+ )
88
+
89
+ # The `Hay Bales` problem in USA Computing Olympiad benchmark
90
+ prompt = """In your role as an algorithmic problem-solver, write a C++ solution for this problem. Put your thought process in <think> tags and your solution in <code> tags.
91
+
92
+ Problem:
93
+ Problem 1: Hay Bales [Brian Dean, 2011]
94
+
95
+ The cows are at it again! Farmer John has carefully arranged N (1 <= N <=
96
+ 10,000) piles of hay bales, each of the same height. When he isn't
97
+ looking, however, the cows move some of the hay bales between piles, so
98
+ their heights are no longer necessarily the same. Given the new heights of
99
+ all the piles, please help Farmer John determine the minimum number of hay
100
+ bales he needs to move in order to restore all the piles to their original,
101
+ equal heights.
102
+
103
+ PROBLEM NAME: haybales
104
+
105
+ INPUT FORMAT:
106
+
107
+ * Line 1: The number of piles, N (1 <= N <= 10,000).
108
+
109
+ * Lines 2..1+N: Each line contains the number of hay bales in a single
110
+ pile (an integer in the range 1...10,000).
111
+
112
+ SAMPLE INPUT:
113
+
114
+ 4
115
+ 2
116
+ 10
117
+ 7
118
+ 1
119
+
120
+ INPUT DETAILS:
121
+
122
+ There are 4 piles, of heights 2, 10, 7, and 1.
123
+
124
+ OUTPUT FORMAT:
125
+
126
+ * Line 1: An integer giving the minimum number of hay bales that need
127
+ to be moved to restore the piles to having equal heights.
128
+
129
+ SAMPLE OUTPUT:
130
+
131
+ 7
132
+
133
+ OUTPUT DETAILS:
134
+
135
+ By moving 7 hay bales (3 from pile 2 to pile 1, 2 from pile 2 to pile 4, 2
136
+ from pile 3 to pile 4), we can make all piles have height 5.
137
+ """
138
+
139
+ # Generate solution
140
+ outputs = llm.generate([prompt], sampling_params)
141
+ solution = outputs[0].outputs[0].text
142
+ print(solution)
143
+
144
+ # Alternative: OpenAI-compatible API server
145
+ # Start vLLM server first:
146
+ # python -m vllm.entrypoints.openai.api_server --model aws-prototyping/codefu-7b-v0.1 --port 8000
147
+
148
+ from openai import OpenAI
149
+
150
+ client = OpenAI(
151
+     api_key="EMPTY",
152
+     base_url="http://localhost:8000/v1"
153
+ )
154
+
155
+ response = client.completions.create(
156
+     model="aws-prototyping/codefu-7b-v0.1",
157
+     prompt=prompt,
158
+     temperature=0.8,
159
+     top_p=0.95,
160
+     max_tokens=28672,
161
+ )
162
+
163
+ solution = response.choices[0].text
164
+ print(solution)
165
+ ```
166
+ We can examine CodeFu's generated [solution](example_response_hay_bales.txt) for this problem, which has been verified as correct.
167
+
168
+ ## Prompt Format
169
+
170
+ CodeFu works best with structured prompts that request both reasoning and code:
171
+ ```
172
+ [Role] Please solve this programming problem in C++. Show your thinking process in <think> tags and provide your solution in <code> tags.
173
+
174
+ [Problem Description]
175
+ ```
176
+
177
+ Replace `[Role]` with phrases like:
178
+ - "As a competitive programming expert"
179
+ - "Working as an experienced competitive programmer"
180
+ - "As a master of algorithms and data structures"
181
+
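+ For example, a complete prompt can be assembled from a role phrase and a problem statement. The helper below is an illustrative sketch, not part of the model's tooling:
+
+ ```python
+ # Illustrative prompt builder following the format above.
+ def build_prompt(role: str, problem_statement: str) -> str:
+     return (
+         f"{role}, write a C++ solution for this problem. "
+         "Put your thought process in <think> tags and your solution in <code> tags.\n\n"
+         f"Problem:\n{problem_statement}\n"
+     )
+
+ prompt = build_prompt("As a competitive programming expert", "<problem statement here>")
+ ```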
182
+
183
+ ## CodeFu Training Pipeline
184
+ ![train_arch](codefu_arch_01_small.png)
185
+
186
+ *Figure 1 - CodeFu training pipeline*
187
+
188
+ We trained CodeFu using the [veRL](https://github.com/volcengine/verl) library. Ray orchestrates the distributed execution and synchronization of vLLM rollout, reward evaluation (code compilation and execution), FSDP model parallelism, and Ulysses sequence parallelism. We set the degree of sequence parallelism to 4 to accommodate long-form reasoning and code generation.
189
+
190
+ We extended the [TinyZero](https://github.com/Jiayi-Pan/TinyZero) code repository by using Ray to manage and distribute reward function calculation. This enables parallel C++ code compilation and evaluation across the same cluster to address the compute-intensive and latency-bound nature of code execution. The entire pipeline is executed as a SageMaker training job running on `ml.p4de.24xlarge` instances. The training pipeline consists of the following steps as shown in Figure 1:
191
+
192
+ 1. **Rollout**: Coding problem prompts are fed into the vLLM inference engine for rolling out potential solutions
193
+ 2. **Response Generation**: vLLM generates multiple responses (reasoning + code) for each prompt
194
+ 3. **Code Execution**: Code solutions are extracted from responses, and are compiled and executed by distributed workers (compilers and runtime) managed by Ray
195
+ 4. **Reward Calculation**: Execution outcomes are used to calculate rewards (i.e., test-case pass ratios), and advantages are computed using group-relative baselines (a simplified sketch follows this list)
196
+ 5. **Policy Update**: The **Actor** uses advantages and token probabilities to compute the PPO loss, which is then used to update CodeFu's parameters through gradient descent
197
+ 6. **Iteration**: The process repeats with batches of prompt-response-reward cycles, with Ray managing the distributed sampling, execution, and training synchronization across the pipeline
198
+
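+ The sketch below illustrates the group-relative baseline referenced in step 4: the rewards of all rollouts for a single prompt are normalized against their own group statistics. This is a simplified illustration, not the exact veRL implementation.
+
+ ```python
+ # Simplified group-relative advantage computation (illustrative only).
+ import numpy as np
+
+ def group_relative_advantages(rewards):
+     """rewards: test-case pass ratios of the rollouts generated for one prompt."""
+     r = np.asarray(rewards, dtype=np.float32)
+     baseline = r.mean()                  # group-relative baseline
+     scale = r.std() + 1e-6               # avoid division by zero when all rollouts tie
+     return (r - baseline) / scale
+
+ # e.g. pass ratios of 8 rollouts for the same problem
+ print(group_relative_advantages([0.0, 0.2, 0.2, 0.5, 1.0, 0.0, 0.2, 0.6]))
+ ```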
199
+ ## CodeFu Training Approach
200
+ CodeFu employs a 2-stage curriculum learning approach:
201
+
202
+
203
+ | Stage | Data | Max resp. tokens | Batch size | Mini-batch size | # of Rollouts | Reward | Focus | # of nodes |
204
+ |-------|------|-------|------------|--------------|---------|--------|-------|------------|
205
+ | **1** | *easy* problems | 28K | 8 | 8 | 5 | Exp smooth w. public scores | Basic algorithmic reasoning | 1 |
206
+ | **2** | *hard* problems | 20K | 256 | 32 | 8 | Linear w/o public scores | Quality and Robustness | 4 |
207
+
208
+
209
+ ### Reward
210
+
211
+ **Stage 1 (Exponential smoothing with public scores)** - The reward system evaluates solutions using both public and private test cases, assigning -1 for non-executable code and 0 for compilation failures or EXCEEDS_TIME_LIMIT results during execution. For successful runs, the reward equals the test-case pass ratio raised to the power of 1.5. This exponential smoothing amplifies differences between partially and highly correct solutions, making it particularly effective for easier problems, where it provides clearer learning signals and encourages focus on correctness and completeness.
212
+
213
+ **Stage 2 (Linear without public scores)** - The system shifts to linear rewards based solely on the private test-case pass ratio (without raising it to the power of 1.5). Since the *hard* problems are substantially more difficult and high pass ratios are rare, the linear structure ensures that incremental progress is proportionally rewarded. This approach removes public test case feedback and encourages robust problem-solving strategies that generalize better to unseen scenarios.
214
+
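+ The two reward schemes can be summarized as follows. The status names below mirror the description above; the actual reward code in the training pipeline is more involved.
+
+ ```python
+ # Simplified reward shaping for the two curriculum stages (illustrative only).
+ def codefu_reward(status: str, pass_ratio: float, stage: int) -> float:
+     if status == "NO_CODE":              # non-executable response
+         return -1.0
+     if status in ("COMPILE_ERROR", "EXCEEDS_TIME_LIMIT"):
+         return 0.0
+     if stage == 1:
+         return pass_ratio ** 1.5         # exponential smoothing over the pass ratio
+     return pass_ratio                    # Stage 2: linear in the private pass ratio
+ ```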
215
+ ### Data Selection
216
+ Training data is sourced from CodeForces problems within the [DeepMind CodeContest](https://huggingface.co/datasets/deepmind/code_contests) dataset, chosen for their reliable CF rating system. *Easy* problems (~1,000 samples, CF rating 800-1000) are used in Stage 1 for basic algorithmic reasoning, while *Hard* problems (~4,000 samples, CF rating 1100-2200) are used in Stage 2 for intermediate to advanced challenges. Both the *Easy* and *Hard* datasets were trained for approximately 2 epochs.
217
+
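+ A rough version of this selection can be written with the `datasets` library. The `cf_rating` field comes from the public dataset card; the exact filters used for CodeFu may differ.
+
+ ```python
+ # Approximate Stage 1 / Stage 2 problem selection (illustrative; actual filtering may differ).
+ from datasets import load_dataset
+
+ ds = load_dataset("deepmind/code_contests", split="train")
+ easy = ds.filter(lambda x: 800 <= x["cf_rating"] <= 1000)    # Stage 1: ~1,000 problems
+ hard = ds.filter(lambda x: 1100 <= x["cf_rating"] <= 2200)   # Stage 2: ~4,000 problems
+ print(len(easy), len(hard))
+ ```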
218
+
219
+ ### Training Stability
220
+ We encountered a response length collapse issue when training CodeFu on the *hard problems* dataset - the mean response length would plunge significantly after some period of learning, causing a catastrophic drop in training rewards as shown in Figure 2.
221
+
222
+ ![Mean Response Length](resp_len_bsz_8_issue.png)
223
+
224
+ *Figure 2 - Mean response length and reward collapsed with a small batch size*
225
+
226
+ Despite attempts to mitigate this through increased batch sizes (up to 416) as shown in Figure 3, increased rollout samples (up to 32), a plain PPO implementation with a co-trained critic model, and KL coefficient adjustments, the collapse persisted or reappeared after periods of stability.
227
+
228
+ ![Mean Reward](resp_len_bsz_416_issue.png)
229
+
230
+ *Figure 3 - Mean response length and reward plummet despite the much larger batch size*
231
+
232
+ We eventually resolved this issue by using a true off-policy PPO learning configuration: the mini-batch size is kept at least 4× smaller than the global batch size, so that each rollout batch yields multiple clipped "mini-" updates as per the [original PPO paper](https://arxiv.org/abs/1707.06347). This approach has since stabilized response length and prevented reward collapse (Figure 4), allowing us to train for multiple epochs over the dataset.
233
+
234
+ ![Mean Reward](resp_len_bsz_256_fix.png)
235
+
236
+ *Figure 4 - Mean response length and reward do not collapse during true off-policy learning*
237
+
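+ In concrete terms, with the Stage 2 settings from the table above, each global batch is split into several clipped PPO mini-updates:
+
+ ```python
+ # Off-policy setting that stabilized training (numbers taken from the Stage 2 row above).
+ global_batch_size = 256
+ ppo_mini_batch_size = 32
+ assert global_batch_size >= 4 * ppo_mini_batch_size   # keep mini-batches at least 4x smaller
+ updates_per_batch = global_batch_size // ppo_mini_batch_size
+ print(updates_per_batch)   # 8 clipped "mini-" updates per rollout batch
+ ```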
238
+ We are preparing a paper to provide details on solving the training stability issue, along with an overview of related interesting papers (such as [DAPO](https://arxiv.org/abs/2503.14476), [OPO](https://arxiv.org/abs/2505.23585), [Dr.GRPO](https://arxiv.org/pdf/2503.20783), [GSPO](https://arxiv.org/abs/2507.18071), etc.) on policy stability, efficiency, and optimization for training reasoning models and agents.
239
+
240
+
241
+ ## Citation
242
+
243
+ CodeFu is developed by the **AWS WWSO Prototyping** Team. If you find CodeFu helpful, feel free to cite us:
244
+
245
+ ```bibtex
246
+ @misc{codefu2025,
247
+ title={aws-prototyping/codefu-7b-v0.1},
248
+ author={Wu, Chen and Song, Yin and Passenger, Josh},
249
+ year={2025},
250
+ publisher={Hugging Face},
251
+ url={https://huggingface.co/aws-prototyping/codefu-7b-v0.1},
252
+ version={0.1}
253
+ }
+ ```
codefu-7b-v0.1_usaco.csv.tgz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ebba00731e25d890692da07bada57a28adc9649124d24c7d34a1f4822fc61cb7
3
+ size 7167241
codefu_arch_01_small.png ADDED

Git LFS Details

  • SHA256: 4664f2cfc56bbb3e880d3dccdfe98882fb8d456f2eba4262d2516913bea959bc
  • Pointer size: 131 Bytes
  • Size of remote file: 167 kB
config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "_name_or_path": "aws-prototyping/codefu-7b-v0.1",
3
+ "architectures": [
4
+ "Qwen2ForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151646,
8
+ "eos_token_id": 151643,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 3584,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 18944,
13
+ "max_position_embeddings": 131072,
14
+ "max_window_layers": 28,
15
+ "model_type": "qwen2",
16
+ "num_attention_heads": 28,
17
+ "num_hidden_layers": 28,
18
+ "num_key_value_heads": 4,
19
+ "pad_token_id": 151643,
20
+ "rms_norm_eps": 1e-06,
21
+ "rope_scaling": null,
22
+ "rope_theta": 10000,
23
+ "sliding_window": null,
24
+ "tie_word_embeddings": false,
25
+ "torch_dtype": "bfloat16",
26
+ "transformers_version": "4.47.1",
27
+ "use_cache": true,
28
+ "use_mrope": false,
29
+ "use_sliding_window": false,
30
+ "vocab_size": 152064
31
+ }
example_response_hay_bales.txt ADDED
@@ -0,0 +1,188 @@
1
+ Okay, I need to solve this problem where Farmer John's cows messed up the hay bale piles, and I have to find the minimum number of hay bales he needs to move to get them all back to equal heights. Hmm.
2
+
3
+ So, let's think about it. The initial state had all piles of the same height. Then cows moved some, so now each pile can be different. But Farmer John needs to bring them all back to the same height. Wait, but what was the original height? Oh, right, the problem statement says that the initial heights were all the same. So, I think I need to find what the target height is, and then compute how many moves are needed to reach that.
4
+
5
+ Wait, but wait a minute. Wait, in the problem statement, the original arrangement has all piles of the same height. So, the problem is, after some changes, the heights are now given. We have to find the target height (which is the same for all piles) that requires the minimum number of hay bales to be moved.
6
+
7
+ Alternatively, no—wait, perhaps the initial height was not known. Because in the problem statement, it says that the initial heights were all equal, but after the cows messed up, they have varying heights. So, perhaps the target is to bring them back to the same height, but what was that original height?
8
+
9
+ Wait, perhaps the original height can be calculated. Because the sum of all the hay bales is the same in the initial and final state. Because moving hay bales between piles doesn't change the total. So, the total sum is fixed. Therefore, the target height must be the total sum divided by the number of piles, right?
10
+
11
+ Yes, that makes sense. Because initially, all piles had the same height, say H. Then sum(H * N) is the total. After cows move, the sum remains the same. So, to get back to all piles equal, each pile must be of height total / N.
12
+
13
+ But wait, that assumes that total is divisible by N. But in the problem statement, it's given that N can be up to 10,000, and the heights are up to 10,000. So, when I compute the sum, I have to make sure that it's divisible by N. Otherwise, it's impossible. But according to the problem statement, the cows just move hay bales, so the sum remains the same. Therefore, the target height must be sum / N.
14
+
15
+ So, the steps are:
16
+
17
+ 1. Calculate the total sum of all the piles.
18
+ 2. The target height is sum / N.
19
+ 3. For each pile, calculate the difference between the current height and the target. If it's above, they need to give away (current - target) bales. If it's below, they need to receive (target - current) bales.
20
+ 4. But since we can only move hay bales from those above to those below, the total number of moves required is the sum of the excess from each pile, which equals the sum of the deficits (since the total sum is the same before and after).
21
+
22
+ So, to find the minimum number of moves, we need to sum all the positive differences between each pile's current height and the target. That's the total number of hay bales that need to be moved.
23
+
24
+ Wait, but that's the number of moves? Because each hay bale that needs to be moved is a transfer from one pile to another. So, the total number of moves is the sum of the excess in each pile.
25
+
26
+ So, the algorithm is straightforward:
27
+
28
+ - Compute sum of all piles.
29
+ - Compute target = sum / N.
30
+ - For each pile, compute (current height - target). If it's positive, add it to the total moves. If it's negative, ignore (since those are the deficits that others will cover).
31
+ - The total moves is the sum of all the positive differences.
32
+
33
+ Let me test this with the sample input.
34
+
35
+ Sample input:
36
+
37
+ 4
38
+ 2
39
+ 10
40
+ 7
41
+ 1
42
+
43
+ Sum is 2 +10+7+1=20. Target is 20/4=5.
44
+
45
+ Compute the differences:
46
+
47
+ 2: 2-5= -3 → no contribution
48
+ 10: 10-5=5 → add 5
49
+ 7: 7-5=2 → add 2
50
+ 1: 1-5=-4 → no contribution
51
+ Total moves: 5+2=7 → which matches the sample output.
52
+
53
+ Yes, that works.
54
+
55
+ Another test case: suppose N=2, with piles 3 and 5. Sum is 8, target is 4. The excess is 1 (5-4=1) and deficit is 1 (3-4= -1). So total moves is 1.
56
+
57
+ Another case: N=3, with piles 4,4,4. Sum is 12, target is 4. So all are at target. Total moves is 0.
58
+
59
+ Another case: N=1. Then only one pile, target is same as current. So moves are 0.
60
+
61
+ So, the algorithm seems solid.
62
+
63
+ So, now, the plan is to read N, read all the heights, sum them, compute target, and then sum the positive differences.
64
+
65
+ Now, coding this in C++. The steps:
66
+
67
+ 1. Read N from input.
68
+ 2. Read N integers, store in a vector or array.
69
+ 3. Compute the sum of these integers.
70
+ 4. Compute target = sum / N.
71
+ 5. For each element in the array, compute (current - target). If it's positive, add to total.
72
+ 6. Output the total.
73
+
74
+ Potential edge cases:
75
+
76
+ - N=1: handle correctly.
77
+ - All piles already at target: output 0.
78
+ - Piles with varying heights but sum exactly divisible by N.
79
+
80
+ Now, implement this in C++. So, let's think about the code.
81
+
82
+ Read N: easy, use cin >> N.
83
+
84
+ Read N integers: store in a vector or array. Let's use a vector for simplicity.
85
+
86
+ Compute sum: loop through the vector, add each element.
87
+
88
+ Compute target: sum / N.
89
+
90
+ Loop through each element in the vector again, compute (current - target). If it's positive, add to total.
91
+
92
+ But wait, wait a moment: if sum is not exactly divisible by N, then target is a fractional value, but in the problem statement, the initial heights were all equal, so target must be integer. So, the sum must be divisible by N. So, in code, we can safely cast sum to int and divide by N.
93
+
94
+ But in the problem statement, the input is such that it's always possible. So, the code doesn't need to handle cases where sum is not divisible by N.
95
+
96
+ So, in C++, code outline:
97
+
98
+ #include <iostream>
99
+ #include <vector>
100
+ using namespace std;
101
+
102
+ int main() {
103
+ int N;
104
+ cin >> N;
105
+ vector<int> piles;
106
+ int sum = 0;
107
+ for(int i=0; i<N; i++) {
108
+ int h;
109
+ cin >> h;
110
+ piles.push_back(h);
111
+ sum += h;
112
+ }
113
+ int target = sum / N;
114
+ int moves = 0;
115
+ for(int h : piles) {
116
+ if (h > target) {
117
+ moves += (h - target);
118
+ }
119
+ }
120
+ cout << moves << endl;
121
+ }
122
+
123
+ Yes, that should work.
124
+
125
+ Testing this with the sample input:
126
+
127
+ N=4, piles are 2,10,7,1.
128
+
129
+ Sum is 20, target is 5.
130
+
131
+ Each h > target: 10 (5), 7 (2). Sum is 7.
132
+
133
+ Which is correct.
134
+
135
+ Another test case: N=3, piles 6,6,6. Sum is 18, target 6. So moves 0.
136
+
137
+ Another test: N=2, piles 0 and 4. Wait, but the input constraints say each h is >=1. Oh, in the input, the h can be 1, but according to the problem statement, perhaps the initial heights can be anything, but the problem says that the new heights are given as per the input. Wait, the problem statement says each pile's new height is in 1...10000. So, that's correct.
138
+
139
+ Wait, no, the initial arrangement was that all piles were same height, but the input gives the new heights after the cows messed it up. So, the new heights can be any numbers, as per the problem statement.
140
+
141
+ So, the code will handle it.
142
+
143
+ So, the code should be correct.
144
+ </think>
145
+
146
+ To solve this problem, we need to determine the minimum number of hay bales that need to be moved to restore all piles to their original equal height. The key insight is that the total number of hay bales remains constant, so the target height for each pile is the total number of hay bales divided by the number of piles.
147
+
148
+ ### Approach
149
+ 1. **Read Input**: Read the number of piles and their respective heights.
150
+ 2. **Calculate Total Hay Bales**: Compute the sum of all the hay bales in the piles.
151
+ 3. **Determine Target Height**: The target height for each pile is the total number of hay bales divided by the number of piles.
152
+ 4. **Compute Moves**: For each pile, calculate the difference between its current height and the target height. Sum these differences for all piles where the current height exceeds the target height. This sum gives the minimum number of hay bales that need to be moved.
153
+
154
+ ### Solution Code
155
+ ```cpp
156
+ #include <iostream>
157
+ #include <vector>
158
+ using namespace std;
159
+
160
+ int main() {
161
+     int N;
162
+     cin >> N;
163
+     vector<int> piles;
164
+     int sum = 0;
165
+     for(int i = 0; i < N; i++) {
166
+         int h;
167
+         cin >> h;
168
+         piles.push_back(h);
169
+         sum += h;
170
+     }
171
+     int target = sum / N;
172
+     int moves = 0;
173
+     for(int h : piles) {
174
+         if (h > target) {
175
+             moves += (h - target);
176
+         }
177
+     }
178
+     cout << moves << endl;
179
+ }
180
+ ```
181
+
182
+ ### Explanation
183
+ - **Reading Input**: The code reads the number of piles and their respective heights.
184
+ - **Sum Calculation**: It calculates the total number of hay bales across all piles.
185
+ - **Target Height**: The target height is computed as the total hay bales divided by the number of piles.
186
+ - **Move Calculation**: For each pile, if its height is greater than the target height, the excess hay bales are counted. The sum of these excess values gives the minimum number of hay bales that need to be moved.
187
+
188
+ This approach efficiently computes the required result with a time complexity of O(N), where N is the number of piles, making it suitable for the given problem constraints.
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 151646,
4
+ "do_sample": true,
5
+ "eos_token_id": 151643,
6
+ "temperature": 0.6,
7
+ "top_p": 0.95,
8
+ "transformers_version": "4.47.1"
9
+ }
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af5b9bdd0f5484fc253a25bb8b2915ab1291284d0480871953269e59fec85527
3
+ size 4877660776
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc771140a93f7d9bde36f01514d9a927d5da6deae9b80f7235eef9982647bfd3
3
+ size 4932751008
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a86eb98d2bca160b1a4da4923f92be5bf99c6dff6aaabd1bb4021431f968c40a
3
+ size 4330865200
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b6666eb2fcc13e0df1b0ebf6310cf2d09587e73e6026bc799189faf0c33b02c4
3
+ size 1089994880
model.safetensors.index.json ADDED
@@ -0,0 +1,346 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 15231233024
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00004-of-00004.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
13
+ "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
25
+ "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
26
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
27
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
29
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
30
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.10.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
38
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
39
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
40
+ "model.layers.10.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
41
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
42
+ "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
49
+ "model.layers.11.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
50
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "model.layers.11.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
53
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.12.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
62
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
63
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.12.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
65
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
66
+ "model.layers.12.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
73
+ "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
74
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
77
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
78
+ "model.layers.13.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
80
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
81
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
83
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
84
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
86
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
87
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
89
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
90
+ "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
91
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
93
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
94
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
95
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
96
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
98
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
100
+ "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
101
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
103
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
104
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
105
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
106
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
107
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
108
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
109
+ "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
110
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
111
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
112
+ "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
113
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
114
+ "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
115
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
117
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
118
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
119
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
120
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
121
+ "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
122
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
123
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
124
+ "model.layers.17.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
125
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
126
+ "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
127
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
128
+ "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
129
+ "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
130
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
131
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
132
+ "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
133
+ "model.layers.18.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
134
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
135
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
136
+ "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
137
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
138
+ "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
139
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
140
+ "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
141
+ "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
142
+ "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
143
+ "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
144
+ "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
145
+ "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
146
+ "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
147
+ "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
148
+ "model.layers.19.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
149
+ "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
150
+ "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
151
+ "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
152
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
153
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
154
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
155
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
156
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
157
+ "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
158
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
159
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
160
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
161
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
162
+ "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
163
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
164
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
165
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
167
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
168
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.20.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
170
+ "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
171
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
172
+ "model.layers.20.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
173
+ "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
174
+ "model.layers.20.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
175
+ "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
176
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
177
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
178
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
179
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
180
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
181
+ "model.layers.21.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
182
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
183
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
184
+ "model.layers.21.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
185
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
186
+ "model.layers.21.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
187
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
189
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
190
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
191
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
192
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
193
+ "model.layers.22.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
194
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
195
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
196
+ "model.layers.22.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
197
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
198
+ "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
199
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
200
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
201
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
202
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
203
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
204
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
205
+ "model.layers.23.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
206
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
207
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
208
+ "model.layers.23.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
209
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
210
+ "model.layers.23.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
211
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
212
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
213
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
214
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
215
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
216
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
217
+ "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
218
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
219
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
220
+ "model.layers.24.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
221
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
222
+ "model.layers.24.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
223
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
224
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
225
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
226
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
227
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
228
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
229
+ "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
230
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
231
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
232
+ "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
233
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
234
+ "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
235
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
236
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
237
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
238
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
239
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
240
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
241
+ "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
242
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
243
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
244
+ "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
245
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
246
+ "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
247
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
248
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
249
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
250
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
251
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
252
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
253
+ "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
254
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
255
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
256
+ "model.layers.27.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
257
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
258
+ "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
259
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
260
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
261
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
262
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
263
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
264
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
265
+ "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
266
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
267
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
268
+ "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
269
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
270
+ "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
271
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
272
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
273
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
274
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
275
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
276
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
277
+ "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
278
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
279
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
280
+ "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
281
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
282
+ "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
283
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
284
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
285
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
286
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
287
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
288
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
289
+ "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
290
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
291
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
292
+ "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
293
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
294
+ "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
295
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
296
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
297
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
298
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
299
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
300
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
301
+ "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
302
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
303
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
304
+ "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
305
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
306
+ "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
307
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
308
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
309
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
310
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
311
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
312
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
313
+ "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
314
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
315
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
316
+ "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
317
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
318
+ "model.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
319
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
320
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
321
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
322
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
323
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
324
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
325
+ "model.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
326
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
327
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
328
+ "model.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
329
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
330
+ "model.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
331
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
332
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
333
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
334
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
335
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
336
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
337
+ "model.layers.9.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
338
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
339
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
340
+ "model.layers.9.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
341
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
342
+ "model.layers.9.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
343
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
344
+ "model.norm.weight": "model-00003-of-00004.safetensors"
345
+ }
346
+ }
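
Note: the `weight_map` above assigns each parameter name to one of the four safetensors shards (layer 8, for example, straddles shards 1 and 2: attention weights in shard 1, MLP and layernorm weights in shard 2). The snippet below is a minimal sketch of how that mapping is consumed when loading the checkpoint; `transformers.AutoModelForCausalLM.from_pretrained` does this resolution automatically, so the manual walk-through is purely illustrative, and it assumes the shard files sit next to the index in a local snapshot of this repo.

```python
# Minimal sketch: resolve each parameter to its shard via model.safetensors.index.json.
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Group parameter names by the shard file that stores them.
shards = {}
for name, shard_file in weight_map.items():
    shards.setdefault(shard_file, []).append(name)

# Read each shard once and collect its tensors into a single state dict.
state_dict = {}
for shard_file, names in shards.items():
    with safe_open(shard_file, framework="pt", device="cpu") as f:
        for name in names:
            state_dict[name] = f.get_tensor(name)

# Layer 8 illustrates a shard boundary falling inside a layer:
print(weight_map["model.layers.8.self_attn.q_proj.weight"])  # model-00001-of-00004.safetensors
print(weight_map["model.layers.8.mlp.up_proj.weight"])       # model-00002-of-00004.safetensors
```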
resp_len_bsz_256_fix.png ADDED

Git LFS Details

  • SHA256: b15ed38f7f5bf4486da5f0aa7d6bfbef322c1b66c5ab741e69190d6ba17bb3bc
  • Pointer size: 130 Bytes
  • Size of remote file: 42.5 kB
resp_len_bsz_416_issue.png ADDED

Git LFS Details

  • SHA256: 3ec1a739c01ee1e29eb7b8456d0f3f3ac93d2818b77540164ceb623f328b75ff
  • Pointer size: 130 Bytes
  • Size of remote file: 37.6 kB
resp_len_bsz_8_issue.png ADDED

Git LFS Details

  • SHA256: bba75144d6f9fa8d9931b08e16885637e0f7e93a1457e978242e620c1e1fd472
  • Pointer size: 130 Bytes
  • Size of remote file: 45.2 kB
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "bos_token": {
+ "content": "<|begin▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
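
Note: a minimal sketch of how the mapping above surfaces on a loaded tokenizer. The local path is a placeholder, not a confirmed repo id, and the printed values simply mirror the JSON above; `pad_token` reuses the eos token, as declared.

```python
# Minimal sketch: special_tokens_map.json becomes tokenizer attributes after loading.
# "path/to/CodeFu-7B-v0.1" is a placeholder for a local snapshot of this repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/CodeFu-7B-v0.1")

print(tok.bos_token)  # <|begin▁of▁sentence|>
print(tok.eos_token)  # <|end▁of▁sentence|>
print(tok.pad_token)  # <|end▁of▁sentence|>  (padding reuses the eos token)
```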
tokenizer_config.json ADDED
@@ -0,0 +1,195 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|end▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|User|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151645": {
+ "content": "<|Assistant|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151646": {
+ "content": "<|begin▁of▁sentence|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|EOT|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151648": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151649": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "bos_token": "<|begin▁of▁sentence|>",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\\n'}}{% endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|end▁of▁sentence|>",
+ "extra_special_tokens": {},
+ "legacy": true,
+ "model_max_length": 16384,
+ "pad_token": "<|end▁of▁sentence|>",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": null,
+ "use_default_system_prompt": false
+ }
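
Note: a minimal sketch of applying the `chat_template` defined above; the local path is a placeholder for a snapshot of this repo. With `add_generation_prompt=True` the template appends `<|Assistant|><think>\n`, so generation starts inside the reasoning block; when re-serializing a multi-turn conversation, it also drops any `<think>…</think>` content from earlier assistant turns.

```python
# Minimal sketch: render a prompt with the chat template from tokenizer_config.json.
# "path/to/CodeFu-7B-v0.1" is a placeholder for a local snapshot of this repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/CodeFu-7B-v0.1")

messages = [
    {"role": "user", "content": "Write C++ code that reads N integers and prints their sum."}
]

prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends "<|Assistant|><think>\n" per the template
)
print(prompt)
# <|begin▁of▁sentence|><|User|>Write C++ code that reads N integers ...<|Assistant|><think>
```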