TomPei Markhit committed
Commit fc93ef9 · verified · 1 Parent(s): 954f707

Update README.md (#2)


- Update README.md (642d37ee0d2087a6478457d9aac41fc18de4d67d)


Co-authored-by: mark johnson <[email protected]>

Files changed (1)
  1. README.md +100 -3
README.md CHANGED
@@ -1,3 +1,100 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - open-r1/OpenThoughts-114k-Code_decontaminated
+ base_model:
+ - Qwen/Qwen2.5-Coder-3B-Instruct
+ library_name: transformers
+ tags:
+ - code
+ - grpo
+ - open-r1
+ ---
+
+ # Model Card for OpenCSG-R1-Qwen2.5-Code-3B-V1
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct) on the [open-r1/OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) dataset.
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+ import pandas as pd
+
+ # Local GRPO checkpoint path; point this at wherever your copy of the model lives.
+ model_name = "/data/project/pj/r1/opencsg-r1/open-r1/train/Qwen2.5-3B-Open-R1-Code-GRPO/checkpoint-150"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
+
+ # Load one problem from a local shard of the training dataset
+ # (also available on the Hub as open-r1/OpenThoughts-114k-Code_decontaminated).
+ df = pd.read_parquet('/data/project/pj/r1/opencsg-r1/OpenThoughts-114k-Code_decontaminated/train-00000-of-00006.parquet')
+ data = df['problem'][0]
+
+ messages = [
+     {
+         "role": "user",
+         "content": f"Please help me solve the problem: {data}. Output the thinking process within the <think> </think> tags, and then return the final result within the <answer> </answer> tags.",
+     },
+     {
+         "role": "assistant",
+         "content": "Let's solve the problem step by step.\n<think>",
+     },
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     continue_final_message=True,  # continue from the assistant's "<think>" prefix
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=1024,
+     do_sample=True,  # temperature only takes effect when sampling is enabled
+     temperature=0.6
+ )
+ # Strip the prompt tokens so only the newly generated text is decoded.
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
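+
+ The prompt asks the model to wrap its reasoning in `<think>` tags and its final result in `<answer>` tags. Below is a minimal sketch for pulling those two spans out of `response`; the tag names come from the prompt above, while the regex handling and fallbacks are assumptions for outputs that omit a tag:
+
+ ```python
+ import re
+
+ # The assistant turn already opened a <think> tag in the prompt, so the
+ # decoded response usually starts mid-thought; re-attach the opening tag.
+ full = response if "<think>" in response else "<think>" + response
+
+ think_match = re.search(r"<think>(.*?)</think>", full, re.DOTALL)
+ answer_match = re.search(r"<answer>(.*?)</answer>", full, re.DOTALL)
+
+ thinking = think_match.group(1).strip() if think_match else ""
+ answer = answer_match.group(1).strip() if answer_match else full.strip()
+
+ print("Reasoning:", thinking)
+ print("Answer:", answer)
+ ```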
64
+
65
+ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
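+
+ At a high level, GRPO samples a group of completions per prompt, scores each with a reward, and uses the reward standardized against the group as that completion's advantage, so no separate value model is needed. A toy sketch of that group-relative normalization (the rewards and group size here are made up for illustration):
+
+ ```python
+ import torch
+
+ # Hypothetical 0/1 correctness rewards for 8 sampled completions of one prompt.
+ rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
+
+ # Group-relative advantage: standardize each reward within its group.
+ advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-4)
+ print(advantages)  # positive for above-average completions, negative otherwise
+ ```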
66
+
67
+ ### Framework versions
68
+
69
+ - TRL: 0.15.2
70
+ - Transformers: 4.49.0
71
+ - Pytorch: 2.5.1
72
+ - Datasets: 3.3.2
73
+ - Tokenizers: 0.21.0
74
+
75
+ ## Citations
76
+
77
+ Cite GRPO as:
78
+
79
+ ```bibtex
80
+ @article{zhihong2024deepseekmath,
81
+ title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
82
+ author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
83
+ year = 2024,
84
+ eprint = {arXiv:2402.03300},
85
+ }
86
+
87
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title        = {{TRL: Transformer Reinforcement Learning}},
+     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+     year         = 2020,
+     journal      = {GitHub repository},
+     publisher    = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```