ainow-mk committed on
Commit a0b5d18 · verified · 1 Parent(s): 6462828

Update README.md

Files changed (1):
  1. README.md +56 -55

README.md CHANGED
@@ -1,74 +1,75 @@
  ---
  license: apache-2.0
  ---
- **MK-LLM-Mistral: Open Macedonian Language Model**
-
- 🌍 About This Model
- MK-LLM-Mistral is the **first Macedonian Large Language Model (LLM)**, a fine-tuned version of **Mistral-7B**.
- This project is developed by **AI Now - Association for Artificial Intelligence in Macedonia**.
-
- 📌 Website: [www.ainow.mk](https://www.ainow.mk)
- 📩 Contact: [contact@ainow.mk](mailto:contact@ainow.mk)
- 🛠 GitHub Repository: [MK-LLM](https://github.com/AI-now-mk/MK-LLM)
-
- ---
-
- ## 📌 Model Details
- - Model Name: MK-LLM-Mistral
- - Base Model: [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B)
- - Language: Macedonian 🇲🇰
- - Fine-tuned on: Wikipedia, news articles, government websites, Macedonian books
- - Tasks: Chatbot, Text Completion, Q&A, Macedonian NLP
-
  ---

- 🛠 How to Use This Model
- ### 1️⃣ Install Dependencies
- ```bash
- pip install transformers torch huggingface_hub
- ```
-
- ### 2️⃣ Load the Model for Inference
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- import torch
-
- # Load the fine-tuned model
- MODEL_NAME = "ainowmk/MK-LLM-Mistral"
- tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
- model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
-
- # Move the model to GPU if available
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model.to(device)
-
- # Example prompt in Macedonian ("Hello, how are you?")
- input_text = "Здраво, како си?"
- inputs = tokenizer(input_text, return_tensors="pt").to(device)
- outputs = model.generate(**inputs, max_length=100)
-
- # Decode and print the result
- print("\n🧠 Model Output:")
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- 📌 Model Files
- | File Name | Description |
- | --- | --- |
- | pytorch_model.bin | The fine-tuned model weights |
- | config.json | Configuration for the model architecture |
- | tokenizer.json | Tokenizer for the Macedonian language |
- | README.md | Documentation for the model |
- | .gitattributes | Git LFS tracking for large files |
-
- 📌 Training Details
- Dataset: Collected Macedonian texts (Wikipedia, news, government websites)
- Training Compute: GPU-based training on an NVIDIA A100
- Training Time: Estimated XX hours
- Fine-tuned using: Hugging Face Transformers & PyTorch
-
- 📌 Contributing
- MK-LLM-Mistral is open source, and contributions are welcome! 🎯
-
- Open issues on GitHub
- Submit pull requests for improvements
- Join discussions on the Hugging Face Community
- 📩 For collaboration, reach out at: contact@ainow.mk
-
- 🚀 Let's build the future of Macedonian AI together! 🇲🇰

  ---
+ language: mk
+ tags:
+ - macedonian
+ - mistral
+ - llm
+ - nlp
+ - text-generation
  license: apache-2.0
+ datasets:
+ - macedonian-wikipedia
+ - news-articles
+ - books
+ metrics:
+ - perplexity
+ - bleu
+ - rouge
+ - accuracy
  ---
 
+ # MK-LLM-Mistral: Fine-Tuned Macedonian Language Model
+
+ ## 🌍 Overview
+ **MK-LLM-Mistral** is a **fine-tuned Macedonian language model** built to improve **text generation, comprehension, and NLP capabilities** in Macedonian.
+ This model is developed by **AI Now - Association for Artificial Intelligence in Macedonia** as part of the **MK-LLM initiative**, Macedonia's first open-source LLM project.
+
+ 📌 **Website:** [www.ainow.mk](https://www.ainow.mk)
+ 📩 **Contact:** [contact@ainow.mk](mailto:contact@ainow.mk)
+ 🛠 **GitHub Repository:** [MK-LLM](https://github.com/AI-now-mk/MK-LLM)

  ---

+ ## 📌 Model Details
+ - **Architecture:** Fine-tuned **Mistral 7B**
+ - **Language:** Macedonian 🇲🇰
+ - **Training Data:** Macedonian Wikipedia, news articles, books, and open-source datasets
+ - **Tokenization:** Custom Macedonian tokenization (see the snippet below)
+ - **Framework:** [Hugging Face Transformers](https://huggingface.co/docs/transformers/index)
+ - **Model Type:** Causal Language Model (CLM)
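+
+ To see the tokenization in practice, the snippet below prints how the bundled tokenizer splits Macedonian text into subword tokens. A minimal sketch; the example phrase is illustrative:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Load the tokenizer shipped with the model
+ tokenizer = AutoTokenizer.from_pretrained("ainowmk/MK-LLM-Mistral")
+
+ # Inspect the subword split of a Macedonian phrase ("artificial intelligence")
+ print(tokenizer.tokenize("вештачка интелигенција"))
+ ```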
+
+ ---

+ ## 🎯 Intended Use
+ This model is optimized for **Macedonian NLP tasks**, including the following (a short generation example follows the list):
+ ✅ **Text Generation** – Macedonian text continuation and creative writing
+ ✅ **Summarization** – Extracting key points from Macedonian documents
+ ✅ **Question Answering** – Responding to Macedonian-language queries
+ ✅ **Chatbots & Virtual Assistants** – Enhancing automated Macedonian-language interactions
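+
+ For example, the Transformers `pipeline` helper wraps tokenization and generation in a single call. A minimal sketch; the prompt and generation parameters are illustrative:
+
+ ```python
+ from transformers import pipeline
+
+ # Text-generation pipeline backed by the fine-tuned model
+ generator = pipeline("text-generation", model="ainowmk/MK-LLM-Mistral")
+
+ # Continue a Macedonian prompt ("Skopje is the capital of")
+ result = generator("Скопје е главен град на", max_new_tokens=40)
+ print(result[0]["generated_text"])
+ ```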
+
+ ---

+ ## ⚠️ Limitations & Ethical Considerations
+ ⚠️ This model **may not always be accurate** and could generate **biased or misleading** responses. It is recommended to:
+ - **Validate outputs** before using them in real-world applications.
+ - **Avoid using it for critical decision-making** (e.g., legal, medical, financial).
+ - **Improve it further** with domain-specific fine-tuning.
+
+ ---

+ ## 🚀 How to Use the Model
+ You can load and run the model using **Hugging Face Transformers** in Python:
+
+ ### **🔹 Load the Model for Inference**
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the tokenizer and the fine-tuned model from the Hugging Face Hub
+ model_name = "ainowmk/MK-LLM-Mistral"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Example prompt in Macedonian ("What is the main goal of artificial intelligence?")
+ input_text = "Која е главната цел на вештачката интелигенција?"
+ inputs = tokenizer(input_text, return_tensors="pt")
+ output = model.generate(**inputs, max_length=50)
+
+ # Decode and print the generated text
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
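+
+ If a GPU is available, the model can be moved to it before generation, as the earlier version of this README did. A minimal sketch continuing the snippet above, assuming `torch` is installed alongside Transformers:
+
+ ```python
+ import torch
+
+ # Use the GPU when available, otherwise fall back to the CPU
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+
+ # Inputs must live on the same device as the model
+ inputs = tokenizer(input_text, return_tensors="pt").to(device)
+ output = model.generate(**inputs, max_new_tokens=50)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```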