---
language: mk
tags:
  - macedonian
  - mistral
  - llm
  - nlp
  - text-generation
license: apache-2.0
datasets:
  - macedonian-wikipedia
  - news-articles
  - books
metrics:
  - perplexity
  - bleu
  - rouge
  - accuracy
---

# MK-LLM-Mistral: Fine-Tuned Macedonian Language Model

## 🌍 Overview

MK-LLM-Mistral is a fine-tuned Macedonian language model built to improve text generation, comprehension, and other NLP capabilities in Macedonian.
It is developed by AI Now - Association for Artificial Intelligence in Macedonia as part of MK-LLM, Macedonia's first open-source LLM initiative.

📌 Website: www.ainow.mk
📩 Contact: [email protected]
🛠 GitHub Repository: MK-LLM


## 📌 Model Details

- **Architecture:** Fine-tuned Mistral 7B
- **Language:** Macedonian 🇲🇰
- **Training Data:** Macedonian Wikipedia, news articles, books, and open-source datasets
- **Tokenization:** Custom Macedonian tokenization
- **Framework:** Hugging Face Transformers
- **Model Type:** Causal Language Model (CLM)

## 🎯 Intended Use

This model is optimized for Macedonian NLP tasks, including:

- **Text Generation** – Macedonian text continuation and creative writing
- **Summarization** – extracting key points from Macedonian documents
- **Question Answering** – answering Macedonian-language queries
- **Chatbots & Virtual Assistants** – powering automated Macedonian-language interactions
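All of the tasks above use the same causal-LM interface; only the prompt changes. The templates below are an illustrative sketch (the wording is an assumption, not an official prompt format shipped with this model):

```python
# Hypothetical prompt templates for the tasks listed above.
# The exact wording is illustrative, not an official format for MK-LLM-Mistral.

def make_prompt(task: str, text: str) -> str:
    templates = {
        "generate": "{}",  # plain continuation: the model just extends the text
        "summarize": "Резимирај го следниот текст:\n{}\nРезиме:",  # "Summarize the following text: ... Summary:"
        "qa": "Прашање: {}\nОдговор:",  # "Question: ... Answer:"
    }
    return templates[task].format(text)

# "What is the main goal of artificial intelligence?"
prompt = make_prompt("qa", "Која е главната цел на вештачката интелигенција?")
print(prompt)
```

The resulting string is then passed to the tokenizer and `model.generate` exactly as in the inference example below.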


## ⚠️ Limitations & Ethical Considerations

This model may produce inaccurate, biased, or misleading output. Users should:

- Validate outputs before using them in real-world applications.
- Avoid relying on the model for critical decisions (e.g., legal, medical, or financial).
- Consider domain-specific fine-tuning for specialized use cases.

## 🚀 How to Use the Model

You can load and run the model using Hugging Face Transformers in Python:

### 🔹 Load the Model for Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ainowmk/MK-LLM-Mistral"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# "What is the main goal of artificial intelligence?"
input_text = "Која е главната цел на вештачката интелигенција?"
inputs = tokenizer(input_text, return_tensors="pt")

# Greedy decoding, capped at 50 tokens total (prompt included)
output = model.generate(**inputs, max_length=50)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
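For more varied output, such as creative writing, greedy decoding can be replaced with sampling. The parameter values below are illustrative starting points, not tuned settings for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ainowmk/MK-LLM-Mistral"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative sampling settings (assumed starting points, tune per task)
gen_kwargs = {
    "max_new_tokens": 100,      # cap on generated tokens, excluding the prompt
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.7,         # lower = more deterministic
    "top_p": 0.9,               # nucleus sampling: keep top 90% probability mass
    "repetition_penalty": 1.1,  # discourage repetitive loops
}

# "Write a short story about Ohrid."
prompt = "Напиши кратка приказна за Охрид."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, **gen_kwargs)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Raising `temperature` or `top_p` increases diversity at the cost of coherence; for summarization or question answering, greedy decoding (as in the example above) is usually the safer default.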