Update README.md
README.md
---
base_model: ibm-granite/granite-3.1-2b-instruct
library_name: transformers
model_name: Stefan-Zweig-Granite-2B
tags:
- generated_from_trainer
- trl
- sft
datasets:
- Chan-Y/Stefan-Zweig-Chat
---

# Model Card for Stefan Zweig Language Model

This model is a fine-tuned version of [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Model Details

This model is designed to emulate Stefan Zweig's distinctive writing and conversational style in chat format.
It was fine-tuned following the methodology described in the DeepSeek-V3 technical report, using a two-stage training process: Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("Chan-Y/Stefan-Zweig-Granite", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Chan-Y/Stefan-Zweig-Granite")

input_text = "As an experienced and famous writer Stefan Zweig, what's your opinion on artificial intelligence?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_return_sequences=1,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode the generated text and keep only the completion
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text.split(input_text)[-1])
```
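
Because the base model is instruction-tuned, the prompt can also be formatted with the tokenizer's chat template. This is a minimal sketch assuming the fine-tuned checkpoint retains a usable chat template (not confirmed by this card):

```python
# Alternative prompt formatting via the chat template
# (assumes the fine-tuned checkpoint retains a usable chat template).
messages = [{"role": "user", "content": input_text}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    chat_outputs = model.generate(
        chat_inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```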

## Training procedure



This model was trained with SFT.

- **Dataset:** Custom synthetic dataset generated using argilla/synthetic-data-generator with Qwen2.5:14b
- **Data Format:** Structured conversations with specific role markers and custom tokens
- **Data Processing:** Special tokens `<stefan_zweig>` and `</stefan_zweig>` for style consistency (see the sketch after this list)
- **Training Type:** Two-stage training pipeline
  1. Supervised Fine-Tuning (SFT)
  2. Group Relative Policy Optimization (GRPO)
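
The training scripts themselves are not included in this card. As a rough illustration of how such a two-stage pipeline can be wired up with TRL's `SFTTrainer` and `GRPOTrainer`, here is a minimal sketch; the checkpoint paths, configs, and the style-token reward function are illustrative assumptions, not the settings used for this model:

```python
# Minimal sketch of the two-stage pipeline with TRL.
# Paths, configs, and the reward function are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base = "ibm-granite/granite-3.1-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the custom style tokens and resize the embeddings to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<stefan_zweig>", "</stefan_zweig>"]}
)
model.resize_token_embeddings(len(tokenizer))

train_dataset = load_dataset("Chan-Y/Stefan-Zweig-Chat", split="train")

# Stage 1: Supervised Fine-Tuning on the chat dataset.
sft_trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="zweig-sft"),
)
sft_trainer.train()
sft_trainer.save_model("zweig-sft")

# Stage 2: GRPO, starting from the SFT checkpoint.
# Hypothetical reward: favor completions that use the style tokens.
def style_reward(completions, **kwargs):
    return [1.0 if "<stefan_zweig>" in c else 0.0 for c in completions]

grpo_trainer = GRPOTrainer(
    model="zweig-sft",
    reward_funcs=style_reward,
    train_dataset=train_dataset,  # GRPO expects a "prompt" column
    args=GRPOConfig(output_dir="zweig-grpo"),
)
grpo_trainer.train()
```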

### Framework versions

- Transformers: 4.48.1
- Pytorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.21.0