barandinho committed · Commit afa0aed · verified · Parent: 878cd6d

Update README.md
Files changed (1): README.md (+40 -13)
README.md CHANGED

````diff
@@ -64,21 +64,48 @@ Kuantum hesaplama neden önemlidir?<|im_end|>
 
 ### With `transformers`
 
+Below code uses 4-bit quantization (INT4) to run the model more efficiently with lower memory usage, which is especially useful for environments with limited GPU memory like Google Colab. Keep in mind that the model will take some time to download initially.
+
+Check [this notebook](https://colab.research.google.com/drive/113RNVTKEx-q7Lg_2V8a7HA-dJIEJiYXI?usp=sharing) for interactive usage of the model.
+
 ```python
-import transformers
+import os
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
+import torch
+
+model_name = "barandinho/phi4-turkish-instruct"
+
+quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=True)
 
-pipeline = transformers.pipeline(
-    "text-generation",
-    model="barandinho/phi4-turkish-instruct",
-    model_kwargs={"torch_dtype": "auto"},
+os.makedirs("offload", exist_ok=True)
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
     device_map="auto",
+    torch_dtype=torch.float16,
+    quantization_config=quant_config,
+    offload_folder="offload"
 )
 
-messages = [
-    {"role": "system", "content": "Sen yardımsever bir yapay zekasın."},
-    {"role": "user", "content": "Kuantum hesaplama neden önemlidir?"},
-]
-
-outputs = pipeline(messages, max_new_tokens=128)
-print(outputs[0]["generated_text"][-1])
-```
+messages = [
+    {"role": "system", "content": "Sen yardımsever bir yapay zekasın."},
+    {"role": "user", "content": "Kuantum hesaplama neden önemlidir, basit terimlerle açıklayabilir misin?"},
+]
+
+pipe = pipeline(
+    "text-generation",
+    model=model,
+    tokenizer=tokenizer
+)
+
+generation_args = {
+    "max_new_tokens": 500,
+    "return_full_text": False,
+    "temperature": 0.0,
+    "do_sample": False,
+}
+
+output = pipe(messages, **generation_args)
+print(output[0]['generated_text'])
+```
````
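
For context on the memory claim in the added paragraph, here is a back-of-envelope estimate of weight memory at different precisions. The ~14B parameter count is an assumption based on the Phi-4 model family, not a number stated in this commit, and real usage adds activations, KV cache, and quantization metadata on top:

```python
# Rough weight-memory estimate per precision.
# ASSUMPTION: ~14B parameters (Phi-4 class); not taken from the README itself.
params = 14e9

for name, bytes_per_param in [("fp32", 4.0), ("fp16", 2.0), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")

# fp16 (~26 GiB) overflows a free-tier Colab T4 (16 GiB);
# int4 (~6.5 GiB) fits, leaving headroom for activations and the KV cache.
```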
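The committed `quant_config` enables plain 4-bit loading with double quantization. As a minimal sketch of an optional variant, bitsandbytes also supports the NF4 quantization type plus an explicit compute dtype; these are standard `BitsAndBytesConfig` options rather than settings chosen in this commit, and the accuracy trade-off is model-dependent:

```python
import torch
from transformers import BitsAndBytesConfig

# Variant of the quant_config above; NOT the configuration committed here.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit instead of plain INT4
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls at inference time
)
```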
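One usage note on the new generation settings: with `do_sample: False` the pipeline decodes greedily, so the `temperature: 0.0` entry has no effect (recent `transformers` releases warn about such unused sampling flags). If varied outputs are wanted instead, a sampling sketch reusing `pipe` and `messages` from the committed example, with illustrative values that are assumptions rather than part of the commit:

```python
# Sampling variant of generation_args; temperature/top_p values are illustrative.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "do_sample": True,   # enable stochastic decoding so temperature takes effect
    "temperature": 0.7,
    "top_p": 0.9,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```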