---
license: apache-2.0
---

# MK-LLM-Mistral: Open Macedonian Language Model

## 🚀 About This Model

MK-LLM-Mistral is the **first Macedonian Large Language Model (LLM)**, fine-tuned from **Mistral-7B**.
This project is developed by **AI Now - Association for Artificial Intelligence in Macedonia**.

🌍 Website: [www.ainow.mk](https://www.ainow.mk)
📩 Contact: [[email protected]](mailto:[email protected])
📂 GitHub Repository: [MK-LLM](https://github.com/AI-now-mk/MK-LLM)
13 |
+
|
14 |
+
---
|
15 |
+
|
16 |
+
## π Model Details
|
17 |
+
- Model Name: MK-LLM-Mistral
|
18 |
+
- Base Model: [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B)
|
19 |
+
- Language: Macedonian π²π°
|
20 |
+
- Fine-tuned on: Wikipedia, news articles, government websites, Macedonian books
|
21 |
+
- Tasks: Chatbot, Text Completion, Q&A, Macedonian NLP
|
22 |
+
|
23 |
+
---
|
24 |
+
|
25 |
+
π How to Use This Model
|
26 |
+
### 1οΈβ£ Install Dependencies

```bash
pip install transformers torch huggingface_hub
```

### 2️⃣ Load the Model for Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer
MODEL_NAME = "ainowmk/MK-LLM-Mistral"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Move the model to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example prompt in Macedonian ("Hello, how are you?")
input_text = "Здраво, како си?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=100)

# Decode and print the result
print("\n🧠 Model Output:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
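
The same checkpoint can also be driven through the high-level `pipeline` API. This is a minimal sketch assuming the standard Transformers text-generation pipeline; the `max_new_tokens` and sampling values are illustrative, not tuned:

```python
from transformers import pipeline

# Build a text-generation pipeline around the same checkpoint
# (device=0 selects the first GPU; use device=-1 to stay on CPU)
generator = pipeline("text-generation", model="ainowmk/MK-LLM-Mistral", device=0)

# Sample a continuation; these generation settings are purely illustrative
result = generator("Здраво, како си?", max_new_tokens=100, do_sample=True, top_p=0.9)
print(result[0]["generated_text"])
```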

## 📄 Model Files

| File Name | Description |
| --- | --- |
| pytorch_model.bin | The fine-tuned model weights |
| config.json | Configuration for the model architecture |
| tokenizer.json | Tokenizer used for the Macedonian language |
| README.md | Documentation for the model |
| .gitattributes | Git LFS tracking for large files |
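
Because `huggingface_hub` is already among the dependencies, the files above can also be fetched directly for offline use. A minimal sketch, assuming the standard `snapshot_download` helper:

```python
from huggingface_hub import snapshot_download

# Download the full set of model files listed above into the local cache
local_path = snapshot_download(repo_id="ainowmk/MK-LLM-Mistral")
print(f"Model files downloaded to: {local_path}")
```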
59 |
+
|
60 |
π Training Details
|
61 |
Dataset: Collected Macedonian texts (Wikipedia, news, government websites)
|
62 |
Training Compute: GPU-based training on NVIDIA A100
|
63 |
Training Time: Estimated XX hours
|
64 |
Fine-tuned using: Hugging Face Transformers & PyTorch
|
65 |
+
|
66 |
π Contributing
|
67 |
+
MK-LLM-Mistral is open-source, and contributions are welcome! π―
|
68 |
|
69 |
Open issues on GitHub
|
70 |
Submit pull requests for improvements
|
71 |
Join discussions on Hugging Face Community
|
72 |
+
π© For collaboration, reach out at: [email protected]
|

## 📜 License

This model is licensed under Apache 2.0. You are free to use, distribute, and modify it, but attribution is required.

🚀 Let's build the future of Macedonian AI together! 🇲🇰

🌍 AI Now - Association for Artificial Intelligence in Macedonia
📩 [email protected] | 🌐 www.ainow.mk