barandinho committed · Commit afa0aed · verified · Parent: 878cd6d

Update README.md
Files changed (1): README.md (+40 -13)
README.md CHANGED

````diff
@@ -64,21 +64,48 @@ Kuantum hesaplama neden önemlidir?<|im_end|>
 
 ### With `transformers`
 
+Below code uses 4-bit quantization (INT4) to run the model more efficiently with lower memory usage, which is especially useful for environments with limited GPU memory like Google Colab. Keep in mind that the model will take some time to download initially.
+
+Check [this notebook](https://colab.research.google.com/drive/113RNVTKEx-q7Lg_2V8a7HA-dJIEJiYXI?usp=sharing) for interactive usage of the model.
+
 ```python
-import transformers
+import os
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
+import torch
+
+model_name = "barandinho/phi4-turkish-instruct"
+
+quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=True)
 
-pipeline = transformers.pipeline(
-    "text-generation",
-    model="barandinho/phi4-turkish-instruct",
-    model_kwargs={"torch_dtype": "auto"},
+os.makedirs("offload", exist_ok=True)
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
     device_map="auto",
+    torch_dtype=torch.float16,
+    quantization_config=quant_config,
+    offload_folder="offload"
 )
 
-messages = [
-    {"role": "system", "content": "Sen yardımsever bir yapay zekasın."},
-    {"role": "user", "content": "Kuantum hesaplama neden önemlidir?"},
-]
-
-outputs = pipeline(messages, max_new_tokens=128)
-print(outputs[0]["generated_text"][-1])
-```
+messages = [
+    {"role": "system", "content": "Sen yardımsever bir yapay zekasın."},
+    {"role": "user", "content": "Kuantum hesaplama neden önemlidir, basit terimlerle açıklayabilir misin?"},
+]
+
+pipe = pipeline(
+    "text-generation",
+    model=model,
+    tokenizer=tokenizer
+)
+
+generation_args = {
+    "max_new_tokens": 500,
+    "return_full_text": False,
+    "temperature": 0.0,
+    "do_sample": False,
+}
+
+output = pipe(messages, **generation_args)
+print(output[0]['generated_text'])
+```
````
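
For context on the memory claim in the added paragraph, here is a back-of-envelope estimate of weight memory at different precisions. The ~14B parameter count is an assumption based on the Phi-4 model family, not a number stated in this commit, and real usage adds activations, KV cache, and quantization metadata on top:

```python
# Rough weight-memory estimate per precision.
# ASSUMPTION: ~14B parameters (Phi-4 class); not taken from the README itself.
params = 14e9

for name, bytes_per_param in [("fp32", 4.0), ("fp16", 2.0), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")

# fp16 (~26 GiB) overflows a free-tier Colab T4 (16 GiB);
# int4 (~6.5 GiB) fits, leaving headroom for activations and the KV cache.
```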
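The committed `quant_config` enables plain 4-bit loading with double quantization. As a minimal sketch of an optional variant, bitsandbytes also supports the NF4 quantization type plus an explicit compute dtype; these are standard `BitsAndBytesConfig` options rather than settings chosen in this commit, and the accuracy trade-off is model-dependent:

```python
import torch
from transformers import BitsAndBytesConfig

# Variant of the quant_config above; NOT the configuration committed here.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit instead of plain INT4
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls at inference time
)
```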
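One usage note on the new generation settings: with `do_sample: False` the pipeline decodes greedily, so the `temperature: 0.0` entry has no effect (recent `transformers` releases warn about such unused sampling flags). If varied outputs are wanted instead, a sampling sketch reusing `pipe` and `messages` from the committed example, with illustrative values that are assumptions rather than part of the commit:

```python
# Sampling variant of generation_args; temperature/top_p values are illustrative.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "do_sample": True,   # enable stochastic decoding so temperature takes effect
    "temperature": 0.7,
    "top_p": 0.9,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])
```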