Chan-Y committed on
Commit b5589da · verified · 1 Parent(s): 2e552a7

Update README.md

Files changed (1)
  1. README.md +38 -28
README.md CHANGED
@@ -1,36 +1,63 @@
 ---
 base_model: ibm-granite/granite-3.1-2b-instruct
 library_name: transformers
-model_name: zweig_granite_model_2501
 tags:
 - generated_from_trainer
 - trl
 - sft
-licence: license
 ---

-# Model Card for zweig_granite_model_2501

 This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct).
 It has been trained using [TRL](https://github.com/huggingface/trl).

 ## Quick start

 ```python
-from transformers import pipeline

-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="Chan-Y/zweig_granite_model_2501", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
 ```

 ## Training procedure

-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/cihan/huggingface/runs/301tbirg)

-This model was trained with SFT.

 ### Framework versions

@@ -38,21 +65,4 @@ This model was trained with SFT.
 - Transformers: 4.48.1
 - Pytorch: 2.5.1+cu124
 - Datasets: 3.2.0
-- Tokenizers: 0.21.0
-
-## Citations
-
-Cite TRL as:
-
-```bibtex
-@misc{vonwerra2022trl,
-    title        = {{TRL: Transformer Reinforcement Learning}},
-    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-    year         = 2020,
-    journal      = {GitHub repository},
-    publisher    = {GitHub},
-    howpublished = {\url{https://github.com/huggingface/trl}}
-}
-```
 
 ---
 base_model: ibm-granite/granite-3.1-2b-instruct
 library_name: transformers
+model_name: Stefan-Zweig-Granite-2B
 tags:
 - generated_from_trainer
 - trl
 - sft
+datasets:
+- Chan-Y/Stefan-Zweig-Chat
 ---

+# Model Card for Stefan Zweig Language Model

 This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct).
 It has been trained using [TRL](https://github.com/huggingface/trl).

+## Model Details
+This model is designed to emulate Stefan Zweig's distinctive writing and conversational style in chat format.
+The fine-tuning approach follows the methodology described in the DeepSeek-V3 technical report: a two-stage training process of Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).
+
 ## Quick start

 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
+tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")
+
+input_text = "As an experienced and famous writer Stefan Zweig, what's your opinion on artificial intelligence?"
+inputs = tokenizer(input_text, return_tensors="pt").to(device)
+
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_length=512,
+        num_return_sequences=1,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.9,
+    )
+
+# Decode the generated text
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(generated_text.split(input_text)[-1])
 ```
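Because the base checkpoint is an instruction-tuned chat model, generation can also be driven through the tokenizer's chat template instead of a raw prompt string. The snippet below is a minimal sketch under the assumption that the fine-tune keeps the base model's chat template; the repo id and sampling settings mirror the quick-start example above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")

# Build a chat-formatted prompt instead of a plain string
messages = [{"role": "user", "content": "As the writer Stefan Zweig, what is your opinion on artificial intelligence?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```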
 ## Training procedure

+![train-loss](train-loss.png)
+
+- **Dataset:** Custom synthetic dataset generated using argilla/synthetic-data-generator with Qwen2.5:14b
+- **Data Format:** Structured conversations with specific role markers and custom tokens
+- **Data Processing:** The special tokens `<stefan_zweig>` and `</stefan_zweig>` are added for style consistency (see the sketch below)
+
+- **Training Type:** Two-stage training pipeline (see the TRL sketch below)
+
+1. Supervised Fine-Tuning (SFT)
+2. Group Relative Policy Optimization (GRPO)
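To make the data-processing step concrete, here is a rough sketch of how the conversations could be wrapped in the style tokens and how those tokens would be registered with the tokenizer before training. The `messages` schema is assumed for illustration; the actual column layout of Chan-Y/Stefan-Zweig-Chat may differ.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

STYLE_OPEN, STYLE_CLOSE = "<stefan_zweig>", "</stefan_zweig>"

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-2b-instruct")
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-2b-instruct")

# Register the custom style tokens and grow the embedding matrix to match
tokenizer.add_special_tokens({"additional_special_tokens": [STYLE_OPEN, STYLE_CLOSE]})
model.resize_token_embeddings(len(tokenizer))

def wrap_assistant_turns(example):
    # Hypothetical schema: a "messages" list of {"role", "content"} dicts
    for message in example["messages"]:
        if message["role"] == "assistant":
            message["content"] = f"{STYLE_OPEN}{message['content']}{STYLE_CLOSE}"
    return example

dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")
dataset = dataset.map(wrap_assistant_turns)
```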
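And a condensed sketch of the two-stage pipeline itself using TRL's `SFTTrainer` and `GRPOTrainer` (available in recent TRL releases). This is an illustration rather than the exact training script: the reward function is a placeholder, hyperparameters are omitted, and GRPO additionally expects prompt-style inputs to sample completions from.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")

# Stage 1: Supervised Fine-Tuning on the Zweig-style chat data
sft_trainer = SFTTrainer(
    model="ibm-granite/granite-3.1-2b-instruct",
    train_dataset=dataset,
    args=SFTConfig(output_dir="zweig-sft"),
)
sft_trainer.train()
sft_trainer.save_model("zweig-sft")

# Placeholder reward: favour completions that keep the style markers intact
def style_reward(completions, **kwargs):
    texts = [c if isinstance(c, str) else c[0]["content"] for c in completions]
    return [1.0 if "<stefan_zweig>" in t and "</stefan_zweig>" in t else 0.0 for t in texts]

# Stage 2: GRPO starting from the SFT checkpoint
# (assumes the dataset also provides a "prompt" column for generation)
grpo_trainer = GRPOTrainer(
    model="zweig-sft",
    reward_funcs=style_reward,
    train_dataset=dataset,
    args=GRPOConfig(output_dir="zweig-grpo", num_generations=4),
)
grpo_trainer.train()
```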
 
 ### Framework versions

 - Transformers: 4.48.1
 - Pytorch: 2.5.1+cu124
 - Datasets: 3.2.0
+- Tokenizers: 0.21.0