ainow-mk committed on
Commit 6462828 · verified · 1 Parent(s): d8d0d92

Update README.md

Files changed (1)
  1. README.md +36 -37
README.md CHANGED
@@ -1,36 +1,37 @@
---
license: apache-2.0
---
- README.md for Hugging Face - MK-LLM-Mistral
- This README will help contributors, developers, and AI enthusiasts understand your MK-LLM-Mistral project.
-
- 🚀 MK-LLM-Mistral: The First Macedonian LLM
- 📢 MK-LLM-Mistral is the first Macedonian Language Large Language Model 🇲🇰, developed by AI Now - Association for Artificial Intelligence in Macedonia.
-
- 🔗 Website: www.ainow.mk
- 📩 Contact: [email protected]
- 🛠 GitHub Repository: MK-LLM Project
-
- 📌 Model Overview
- Model Name: MK-LLM-Mistral
- Base Model: Mistral-7B
- Language: Macedonian 🇲🇰
- Fine-tuned on: Wikipedia, news articles, legal documents, and public datasets in Macedonian
- Tasks: Chatbot, Text Completion, Q&A, Macedonian NLP tasks
- 📌 How to Use the Model Locally
- 1️⃣ Install Required Libraries
- bash
- Copy
- Edit
pip install transformers torch huggingface_hub
- 2️⃣ Load the Model in Python
- python
- Copy
- Edit
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

- # Load Model
MODEL_NAME = "ainowmk/MK-LLM-Mistral"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
@@ -39,37 +40,35 @@ model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

- # Test the Model
input_text = "Здраво, како си?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=100)

# Decode and print the result
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📌 Model Files
File Name Description
- pytorch_model.bin The fine-tuned weights of the model
config.json Configuration for the model architecture
tokenizer.json Tokenizer used for the Macedonian language
README.md Documentation for the model
.gitattributes Git LFS tracking for large files

📌 Training Details
Dataset: Collected Macedonian texts (Wikipedia, news, government websites)
Training Compute: GPU-based training on NVIDIA A100
Training Time: Estimated XX hours
Fine-tuned using: Hugging Face Transformers & PyTorch

📌 Contributing
- MK-LLM-Mistral is an open-source project, and contributions are welcome! 🎯

Open issues on GitHub
Submit pull requests for improvements
Join discussions on Hugging Face Community
- 💡 If you want to help in data collection, fine-tuning, or evaluation, reach out at [email protected]
-
- 📌 License
- This model is licensed under Apache 2.0.
- You are free to use, distribute, and modify it, but attribution is required.

- 🚀 Let's build the future of Macedonian AI together! 🇲🇰
- 👉 AI Now - Association for Artificial Intelligence in Macedonia
- 📩 [email protected] | 🔗 www.ainow.mk
 
---
license: apache-2.0
---
+ # MK-LLM-Mistral: Open Macedonian Language Model
+
+ ## 🌍 About This Model
+ MK-LLM-Mistral is the **first Macedonian Large Language Model (LLM)**, built by fine-tuning **Mistral-7B**.
+ This project is developed by **AI Now - Association for Artificial Intelligence in Macedonia**.
+
+ 📌 Website: [www.ainow.mk](https://www.ainow.mk)
+ 📩 Contact: [[email protected]](mailto:[email protected])
+ 🛠 GitHub Repository: [MK-LLM](https://github.com/AI-now-mk/MK-LLM)
+
+ ---
+
+ ## 📌 Model Details
+ - Model Name: MK-LLM-Mistral
+ - Base Model: [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B)
+ - Language: Macedonian 🇲🇰
+ - Fine-tuned on: Wikipedia, news articles, government websites, Macedonian books
+ - Tasks: Chatbot, Text Completion, Q&A, Macedonian NLP
+
+ ---
+
+ ## 🛠 How to Use This Model
+ ### 1️⃣ Install Dependencies
+ ```bash
pip install transformers torch huggingface_hub
+ ```
+ ### 2️⃣ Load the Model for Inference
+ ```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

+ # Load the fine-tuned model
MODEL_NAME = "ainowmk/MK-LLM-Mistral"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

+ # Example prompt in Macedonian
input_text = "Здраво, како си?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=100)

# Decode and print the result
+ print("\n🧠 Model Output:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
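Greedy decoding with `max_length=100`, as above, can get repetitive for chatbot-style use. A minimal follow-on sketch, continuing from the snippet above and using the standard Transformers `generate()` sampling options; the parameter values are illustrative assumptions, not settings published for MK-LLM-Mistral:

```python
# Continuing from the snippet above (model, tokenizer, inputs already defined).
# Sampling-based generation; all values below are illustrative assumptions.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,            # cap on newly generated tokens
    do_sample=True,                # sample instead of greedy decoding
    temperature=0.7,               # soften the next-token distribution
    top_p=0.9,                     # nucleus sampling
    repetition_penalty=1.1,        # discourage verbatim loops
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```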
## 📌 Model Files

| File Name | Description |
| --- | --- |
+ | pytorch_model.bin | The fine-tuned model weights |
| config.json | Configuration for the model architecture |
| tokenizer.json | Tokenizer used for the Macedonian language |
| README.md | Documentation for the model |
| .gitattributes | Git LFS tracking for large files |
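If you only need the files above without loading the model, the `huggingface_hub` package installed earlier can mirror the repository locally. A minimal sketch; the target directory is an arbitrary example, not a project convention:

```python
from huggingface_hub import snapshot_download

# Downloads pytorch_model.bin, config.json, tokenizer.json, etc.
# "./mk-llm-mistral" is an arbitrary example path.
local_dir = snapshot_download(
    repo_id="ainowmk/MK-LLM-Mistral",
    local_dir="./mk-llm-mistral",
)
print("Model files downloaded to:", local_dir)
```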
+
## 📌 Training Details
Dataset: Collected Macedonian texts (Wikipedia, news, government websites)
Training Compute: GPU-based training on NVIDIA A100
Training Time: Estimated XX hours
Fine-tuned using: Hugging Face Transformers & PyTorch
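The README names the stack (Hugging Face Transformers & PyTorch) but does not include a training script. For orientation only, here is a hypothetical sketch of that kind of causal-LM fine-tuning; the base checkpoint name, corpus file, and every hyperparameter are assumptions, not values from the MK-LLM project:

```python
# Hypothetical fine-tuning sketch; no value here comes from MK-LLM itself.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE)

# "mk_corpus.txt" stands in for the collected Macedonian texts.
dataset = load_dataset("text", data_files={"train": "mk_corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mk-llm-mistral-ft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,  # assumes A100-class hardware, as noted above
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```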
+
## 📌 Contributing
+ MK-LLM-Mistral is open-source, and contributions are welcome! 🎯

Open issues on GitHub
Submit pull requests for improvements
Join discussions on Hugging Face Community
+ 📩 For collaboration, reach out at: [email protected]

+ 🚀 Let's build the future of Macedonian AI together! 🇲🇰