bradhiltonendercorp lbourdois committed on
Commit e18c8e1 · verified · 1 Parent(s): e4e9e77

Improve language tag (#4)


- Improve language tag (4f26782a6700f86b71db029f2ac89b83f47cbe37)


Co-authored-by: Loïck BOURDOIS <[email protected]>

Files changed (1)
  1. README.md +39 -27
README.md CHANGED
@@ -1,27 +1,39 @@
- ---
- license: mit
- license_link: https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-14B/blob/main/LICENSE
- language:
- - en
- pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen2.5-14B-Instruct
- tags:
- - chat
- library_name: transformers
- ---
-
- # Deductive-Reasoning-Qwen-14B
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/674a1d102c0f27a385772cfe/JauBmEQM0FpOdShBMSfst.png)
-
- Deductive Reasoning Qwen 14B is a reinforcement fine-tune of [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) to solve challenging deduction problems from the [Temporal Clue](https://github.com/bradhilton/temporal-clue) dataset, trained by [OpenPipe](https://openpipe.ai)!
-
- Here are some additional resources to check out:
-
- - [Blog Post](https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue)
- - [Training Recipe](https://github.com/openpipe/deductive-reasoning)
- - [RL Experiments](https://github.com/openpipe/rl-experiments)
- - [Deductive Reasoning Qwen 32B](https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-32B)
-
- If you're interested in training your own models with reinforcement learning or just chatting, feel free to [reach out](https://openpipe.ai/contact) or email Kyle directly at [email protected]!
+ ---
+ license: mit
+ license_link: https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-14B/blob/main/LICENSE
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ pipeline_tag: text-generation
+ base_model:
+ - Qwen/Qwen2.5-14B-Instruct
+ tags:
+ - chat
+ library_name: transformers
+ ---
+
+ # Deductive-Reasoning-Qwen-14B
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/674a1d102c0f27a385772cfe/JauBmEQM0FpOdShBMSfst.png)
+
+ Deductive Reasoning Qwen 14B is a reinforcement fine-tune of [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) to solve challenging deduction problems from the [Temporal Clue](https://github.com/bradhilton/temporal-clue) dataset, trained by [OpenPipe](https://openpipe.ai)!
+
+ Here are some additional resources to check out:
+
+ - [Blog Post](https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue)
+ - [Training Recipe](https://github.com/openpipe/deductive-reasoning)
+ - [RL Experiments](https://github.com/openpipe/rl-experiments)
+ - [Deductive Reasoning Qwen 32B](https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-32B)
+
+ If you're interested in training your own models with reinforcement learning or just chatting, feel free to [reach out](https://openpipe.ai/contact) or email Kyle directly at [email protected]!
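
The only substantive change in this diff is the `language:` list in the README front matter, which goes from a single `en` entry to thirteen three-letter codes. A minimal sketch for reading that list back out of a model card, assuming the front matter is delimited by `---` lines and uses simple `- item` lists as it does here. The function name `read_language_tags` is hypothetical, and the hand-rolled parser is a stand-in for illustration, not the Hub's own metadata loader.

```python
def read_language_tags(readme_text: str) -> list[str]:
    """Return the entries of the `language:` list in a README's front matter.

    Hypothetical helper: handles only the flat `key:` / `- item` layout
    seen in this diff, not general YAML.
    """
    lines = readme_text.splitlines()
    # Front matter must open with a `---` line.
    if not lines or lines[0].strip() != "---":
        return []
    tags = []
    in_language = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter: end of front matter.
            break
        if in_language and stripped.startswith("- "):
            tags.append(stripped[2:].strip())
        elif stripped == "language:":
            in_language = True
        else:
            # Any other key (or list item under another key) ends the list.
            in_language = False
    return tags


# Front matter as it appears on the "+" side of the diff.
README = """---
license: mit
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
pipeline_tag: text-generation
tags:
- chat
library_name: transformers
---
# Deductive-Reasoning-Qwen-14B
"""

tags = read_language_tags(README)
print(len(tags), tags)
```

A real YAML parser would be the safer choice for cards with nested or quoted values; the line-based approach here is only meant to make the before/after of this commit easy to check.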