---
language:
- en
license: llama3.2
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: EpistemeAI/ReasoningCore-3B-0
model-index:
- name: ReasoningCore-3B-RE1-V2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 73.93
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 22.47
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 15.63
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.13
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.02
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.23
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=EpistemeAI/ReasoningCore-3B-RE1-V2
      name: Open LLM Leaderboard
---
Note: This is an experimental model.
For alignment and safety, please use [ReasoningCore-Llama-3B-R1-aligned](https://huggingface.co/EpistemeAI/ReasoningCore-Llama-3B-R1-aligned).

# ReasoningCore‑3B-RE1-V2

**ReasoningCore‑3B** is a multilingual, reasoning‑enhanced large language model developed by EpistemeAI. Pretrained on a large corpus of publicly available data and instruction‑tuned to excel at nuanced reasoning, dialogue management, retrieval, and summarization, it outperforms many current open‑source and proprietary conversational models on a range of industry benchmarks. It has additionally been fine‑tuned on a reasoning dataset.

### We used the GRPO technique

Group Relative Policy Optimization (GRPO) is a post‑training technique for large language models (LLMs), popularized by its use in the DeepSeek‑R1 model.
- Post‑training with GRPO applies reinforcement learning (RL) to optimize the LLM after its initial training.
- GRPO focuses on scaling test‑time compute for extended reasoning, making it well suited to complex problems such as mathematical problem solving.
- Unlike earlier methods that relied on search‑heuristic approaches, GRPO uses RL exclusively for post‑training, improving the model's ability to handle nuanced tasks.
- The GRPO technique is available through the TRL library, and the Hugging Face Science team is working to reproduce the full DeepSeek‑R1 training process.
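The "group relative" part of GRPO can be sketched numerically: for each prompt, a group of completions is sampled, each is scored by a reward function, and a completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal illustration (the function name is our own, not the TRL API):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its group:
    advantage_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# Completions scoring above the group mean receive positive advantage,
# those below receive negative advantage; the advantages sum to zero.
```

In TRL, this normalization happens inside `GRPOTrainer`; the snippet only illustrates the signal the policy update is driven by.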


---

## Model Information

- **Model Developer:** EpistemeAI
- **Model Architecture:**  
  ReasoningCore‑3B is an auto‑regressive language model built on an optimized transformer architecture. It incorporates specialized reasoning pathways and has been fine‑tuned with Group Relative Policy Optimization (GRPO), combined with supervised fine‑tuning and reinforcement learning from human feedback (RLHF), to align with human expectations for clarity, accuracy, and safety in complex tasks.

|                                | Training Data                                    | Params | Input Modalities      | Output Modalities            | Context Length | GQA | Shared Embeddings | Token Count    | Knowledge Cutoff  |
|--------------------------------|--------------------------------------------------|--------|-----------------------|------------------------------|----------------|-----|-------------------|----------------|-------------------|
| **ReasoningCore‑3B (text only)** | A new mix of publicly available online data.     | 3B     | Multilingual Text     | Multilingual Text and code   | 128k           | Yes | Yes               | Up to 9T tokens | December 2023     |

- **Supported Languages:**  
  Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. While the pretraining included a broader range of languages, additional languages can be fine‑tuned in compliance with the community license and acceptable use policies.
- **Model Release Date:** Sept 25, 2024  
- **Status:** Static model trained on an offline dataset. Future iterations may further enhance its reasoning capabilities and safety features.
- **License:** Use is governed by the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE) (a custom, commercial license agreement).
- **Feedback:** For questions or comments, please refer to the [GitHub repository README](https://github.com/meta-llama/llama-models/tree/main/models/llama3_2) or follow the linked instructions.

---

## Intended Use

### Use Cases
- **Conversational AI:** Assistant‑like interactions.
- **Knowledge Retrieval & Summarization:** Dynamic extraction and condensation of information.
- **Mobile AI‑Powered Writing Assistants:** Query reformulation and natural language generation.
- **General Natural Language Generation:** Any application that benefits from advanced reasoning abilities.

### Out of Scope
- Deployments that violate applicable laws or trade compliance regulations.
- Use cases that conflict with the Acceptable Use Policy or licensing terms.
- Deployments in languages not explicitly supported (unless additional safety and performance validations are performed).

---

## How to Use

ReasoningCore‑3B can be integrated using popular machine learning frameworks. Two primary methods are provided:

### Use a system prompt

```python
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
```
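A response that follows this format can be parsed back into its sections with a small helper; a minimal sketch (the function name is illustrative, not part of any library):

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a model
    response; missing sections come back as empty strings."""
    def section(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""
    return {"reasoning": section("reasoning"), "answer": section("answer")}

reply = "<reasoning>\n2 + 2 adds two pairs.\n</reasoning>\n<answer>\n4\n</answer>"
parsed = parse_response(reply)
# parsed["answer"] == "4"
```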

### Use with Transformers

Ensure you have transformers version 4.43.0 or later installed:

```bash
pip install --upgrade transformers
```

```python
import torch
from transformers import pipeline

model_id = "EpistemeAI/ReasoningCore-3B-R01"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe("The secret to effective reasoning is"))
```
### For mathematical problems
For math problems, include "Please reason step by step, and put your final answer within \boxed{}" in the system prompt.
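With that instruction in place, the final answer can be recovered from the model's output by locating the `\boxed{...}` span. A minimal sketch (the helper name is our own, and nested braces are assumed absent):

```python
import re

def extract_boxed(text: str) -> str:
    """Return the contents of the last \\boxed{...} in a response,
    or an empty string if none is present (braces assumed unnested)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else ""

response = r"Step 1: 6 x 7 = 42. Therefore the answer is \boxed{42}."
print(extract_boxed(response))  # -> 42
```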


## Responsibility & Safety

### Responsible Deployment

#### Approach:
- **ReasoningCore‑3B** is a foundational technology that includes built‑in safety guardrails. Developers are encouraged to integrate additional safeguards tailored to their specific applications.

#### System‑Level Safety:
- The model is designed to be deployed as part of a broader system that implements safety measures (e.g., Prompt Guard, Code Shield) to ensure outputs remain safe even under adversarial conditions.

---

### Safety Fine‑Tuning & Data Strategy

#### Objectives:
- Provide a reliable tool for building secure and helpful reasoning systems.
- Mitigate adversarial misuse through advanced data selection and response optimization techniques.

#### Methodology:
- Incorporate adversarial prompts during training to refine model refusals and response tone.
- Combine human‑curated data with synthetic data.
- Utilize iterative fine‑tuning using supervised learning, rejection sampling, and preference optimization.

---

### Evaluations and Red Teaming

#### Scaled Evaluations:
- Dedicated adversarial datasets were used to rigorously test the model’s robustness. Developers should perform context‑specific evaluations.

#### Red Teaming:
- Experts in cybersecurity, adversarial machine learning, and responsible AI conducted recurring red team exercises to identify vulnerabilities and improve both performance and safety.

---

### Critical Risk Mitigations

- **CBRNE:**  
  The model has been evaluated to ensure it does not enhance capabilities for harmful activities involving chemical, biological, radiological, nuclear, or explosive materials.

- **Child Safety:**  
  Expert assessments were conducted to evaluate and mitigate potential child safety risks.

- **Cyber Attacks:**  
  Measures were taken to ensure the model cannot autonomously facilitate cyber‑offensive operations.

---

### Ethical Considerations and Limitations

#### Core Values:
- **ReasoningCore‑3B** is built on the values of openness, inclusivity, and helpfulness. It is designed to respect user autonomy and foster free thought and expression while mitigating potential harm.

#### Testing and Limitations:
- Despite extensive testing across diverse scenarios, the model may occasionally produce inaccurate, biased, or objectionable outputs. Developers must perform additional safety testing and integrate further safeguards as needed.

#### Resources for Safe Deployment, with Meta Safety Deployment:
- [Responsible Use Guide](https://llama.meta.com/responsible-use-guide)
- [Trust and Safety Resources](https://llama.meta.com/trust-and-safety)
- [Getting Started Guide](https://llama.meta.com/docs/get-started)

---

### Conclusion

**ReasoningCore‑3B** represents a significant advancement in multilingual, reasoning‑enhanced language models. Optimized for tasks requiring deep reasoning, contextual understanding, and safe, helpful interactions, it offers a powerful tool for both commercial and research applications. We invite developers and researchers to explore its capabilities and contribute to building secure, innovative AI systems.

For further details, questions, or feedback, please email [email protected]

# Uploaded  model

- **Developed by:** EpistemeAI
- **License:** Llama 3.2 Community License
- **Finetuned from model:** EpistemeAI/ReasoningCore-3B-0

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/EpistemeAI__ReasoningCore-3B-RE1-V2-details)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |23.57|
|IFEval (0-Shot)    |73.93|
|BBH (3-Shot)       |22.47|
|MATH Lvl 5 (4-Shot)|15.63|
|GPQA (0-shot)      | 3.13|
|MuSR (0-shot)      | 2.02|
|MMLU-PRO (5-shot)  |24.23|
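The reported average is the unweighted mean of the six benchmark scores, which can be checked directly:

```python
scores = {
    "IFEval (0-Shot)": 73.93,
    "BBH (3-Shot)": 22.47,
    "MATH Lvl 5 (4-Shot)": 15.63,
    "GPQA (0-shot)": 3.13,
    "MuSR (0-shot)": 2.02,
    "MMLU-PRO (5-shot)": 24.23,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # -> 23.57
```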