Update README.md
README.md
---
base_model: ibm-granite/granite-3.1-2b-instruct
library_name: transformers
model_name: Stefan-Zweig-Granite-2B
tags:
- generated_from_trainer
- trl
- sft
datasets:
- Chan-Y/Stefan-Zweig-Chat
---

# Model Card for Stefan Zweig Language Model

This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Model Details

This model is designed to emulate Stefan Zweig's distinctive writing and conversational style in chat format.
It was fine-tuned following the methodology described in the DeepSeek-V3 technical report, using a two-stage training process: Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")

input_text = "As an experienced and famous writer Stefan Zweig, what's your opinion on artificial intelligence?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode the generated text and keep only the completion
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text.split(input_text)[-1])
```
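
Because the base model is instruction-tuned, the prompt can also be formatted with the tokenizer's chat template. This is a minimal sketch assuming the fine-tuned checkpoint retains a usable chat template (not confirmed by this card):

```python
# Alternative prompt formatting via the chat template
# (assumes the fine-tuned checkpoint retains a usable chat template).
messages = [{"role": "user", "content": input_text}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    chat_outputs = model.generate(
        chat_inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```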

## Training procedure



This model was trained with SFT.

- **Dataset:** Custom synthetic dataset generated using argilla/synthetic-data-generator with Qwen2.5:14b
- **Data Format:** Structured conversations with specific role markers and custom tokens
- **Data Processing:** Special tokens `<stefan_zweig>` and `</stefan_zweig>` for style consistency (see the sketch after this list)
- **Training Type:** Two-stage training pipeline
  1. Supervised Fine-Tuning (SFT)
  2. Group Relative Policy Optimization (GRPO)
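
The training scripts themselves are not included in this card. As a rough illustration of how such a two-stage pipeline can be wired up with TRL's `SFTTrainer` and `GRPOTrainer`, here is a minimal sketch; the checkpoint paths, configs, and the style-token reward function are illustrative assumptions, not the settings used for this model:

```python
# Minimal sketch of the two-stage pipeline with TRL.
# Paths, configs, and the reward function are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base = "ibm-granite/granite-3.1-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the custom style tokens and resize the embeddings to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<stefan_zweig>", "</stefan_zweig>"]}
)
model.resize_token_embeddings(len(tokenizer))

train_dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")

# Stage 1: Supervised Fine-Tuning on the chat dataset.
sft_trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="zweig-sft"),
)
sft_trainer.train()
sft_trainer.save_model("zweig-sft")

# Stage 2: GRPO, starting from the SFT checkpoint.
# Hypothetical reward: favor completions that use the style tokens.
def style_reward(completions, **kwargs):
    return [1.0 if "<stefan_zweig>" in c else 0.0 for c in completions]

grpo_trainer = GRPOTrainer(
    model="zweig-sft",
    reward_funcs=style_reward,
    train_dataset=train_dataset,  # GRPO expects a "prompt" column
    args=GRPOConfig(output_dir="zweig-grpo"),
)
grpo_trainer.train()
```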

### Framework versions

- Transformers: 4.48.1
- Pytorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0