ainow-mk committed on
Commit 2756a66 · verified · 1 Parent(s): 4bf8b85

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +16 -75
README.md CHANGED
@@ -1,75 +1,16 @@
- ---
- language: mk
- tags:
- - macedonian
- - mistral
- - llm
- - nlp
- - text-generation
- license: apache-2.0
- datasets:
- - macedonian-wikipedia
- - news-articles
- - books
- metrics:
- - perplexity
- - bleu
- - rouge
- - accuracy
- ---
-
- # MK-LLM-Mistral: Fine-Tuned Macedonian Language Model
-
- ## 🌍 Overview
- **MK-LLM-Mistral** is a **fine-tuned Macedonian language model**, built to enhance **text generation, comprehension, and NLP capabilities** in the Macedonian language.
- This model is developed by **AI Now - Association for Artificial Intelligence in Macedonia** as part of the **MK-LLM initiative**, Macedonia's first open-source LLM project.
-
- 📌 **Website:** [www.ainow.mk](https://www.ainow.mk)
- 📩 **Contact:** [[email protected]](mailto:[email protected])
- 🛠 **GitHub Repository:** [MK-LLM](https://github.com/AI-now-mk/MK-LLM)
-
- ---
-
- ## 📌 Model Details
- - **Architecture:** Fine-tuned **Mistral 7B**
- - **Language:** Macedonian 🇲🇰
- - **Training Data:** Macedonian Wikipedia, news articles, books, and open-source datasets
- - **Tokenization:** Custom Macedonian tokenization
- - **Framework:** [Hugging Face Transformers](https://huggingface.co/docs/transformers/index)
- - **Model Type:** Causal Language Model (CLM)
-
- ---
-
- ## 🎯 Intended Use
- This model is optimized for **Macedonian NLP tasks**, including:
- ✅ **Text Generation** – Macedonian text continuation and creative writing
- ✅ **Summarization** – Extracting key points from Macedonian documents
- ✅ **Question Answering** – Responding to Macedonian-language queries
- ✅ **Chatbots & Virtual Assistants** – Enhancing automated Macedonian-language interactions
-
- ---
-
- ## ⚠️ Limitations & Ethical Considerations
- ⚠️ This model **may not always be accurate** and could generate **biased or misleading** responses. It is recommended to:
- - **Validate outputs** before using them in real-world applications.
- - **Avoid using for critical decision-making** (e.g., legal, medical, financial).
- - **Improve it further** with domain-specific fine-tuning.
-
- ---
-
- ## 🚀 How to Use the Model
- You can load and run the model using **Hugging Face Transformers** in Python:
-
- ### **🔹 Load the Model for Inference**
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "ainowmk/MK-LLM-Mistral"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name)
-
- input_text = "Која е главната цел на вештачката интелигенција?"
- inputs = tokenizer(input_text, return_tensors="pt")
- output = model.generate(**inputs, max_length=50)
-
- print(tokenizer.decode(output[0], skip_special_tokens=True))
 
+ # MK-LLM Model
+
+ Macedonian Language Model based on Mistral architecture.
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("ainowmk/MK-LLM-Mistral")
+ tokenizer = AutoTokenizer.from_pretrained("ainowmk/MK-LLM-Mistral")
+
+ text = "Здраво, како си?"
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs)
+ print(tokenizer.decode(outputs[0]))
+ ```
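
Note on the change: the removed example passed `max_length=50` to `generate` and `skip_special_tokens=True` to `decode`, while the added example passes neither. The sketch below shows one possible way to keep an explicit length bound and clean decoding around the same calls. The helper names `build_generation_kwargs` and `generate_text` are hypothetical and not part of this repository; `max_new_tokens` bounds only the newly generated tokens, whereas the `max_length` used in the old README also counts the prompt, and the default of 50 is an assumption carried over from that old value.

```python
def build_generation_kwargs(max_new_tokens=50, do_sample=False):
    """Collect explicit generation settings (hypothetical helper, not from the repo)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    return {"max_new_tokens": max_new_tokens, "do_sample": do_sample}


def generate_text(prompt, model_name="ainowmk/MK-LLM-Mistral", **overrides):
    """Load the model and generate a bounded continuation for `prompt`."""
    # Imported lazily so build_generation_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, **build_generation_kwargs(**overrides))

    # Drop special tokens such as BOS/EOS from the decoded text,
    # as the removed README example did.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `generate_text("Здраво, како си?", max_new_tokens=30)` would download the model weights on first use, so it assumes enough disk space and memory for a 7B-parameter model.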