Fine-tuned mt5-base model for restoring capitalization and punctuation in Macedonian
The model is fine-tuned on a subset of the Macedonian portion of Wikipedia.
Authors:
- Dejan Porjazovski
- Ilina Jakimovska
- Ordan Chukaliev
- Nikola Stikov
This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.
Usage
pip install transformers sentencepiece torch
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Use a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
recap_model_name = "Macedonian-ASR/mt5-restore-capitalization-macedonian"
recap_tokenizer = T5Tokenizer.from_pretrained(recap_model_name)
recap_model = T5ForConditionalGeneration.from_pretrained(recap_model_name)
recap_model.to(device)
# Input is lowercase text without punctuation, preceded by the task prefix.
sentence = "скопје е главен град на македонија"
inputs = recap_tokenizer(["restore capitalization and punctuation: " + sentence], return_tensors="pt", padding=True).to(device)
# Generate with beam search; squeeze(0) drops the batch dimension of the single result.
outputs = recap_model.generate(**inputs, max_length=768, num_beams=5, early_stopping=True).squeeze(0)
recap_result = recap_tokenizer.decode(outputs, skip_special_tokens=True)
print(recap_result)
# -> "Скопје е главен град на Македонија."
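The usage example above handles one sentence. For several sentences, or for a long transcript, a small wrapper can be convenient. The following is a minimal sketch, not part of the model's official API: `chunk_words` and `restore_batch` are hypothetical helper names, and the `max_words=100` default is an illustrative assumption rather than a documented limit of the model. The generation settings are copied from the usage example.

```python
# Task prefix taken from the usage example above.
PREFIX = "restore capitalization and punctuation: "


def chunk_words(text, max_words=100):
    """Split text into consecutive chunks of at most max_words words.

    max_words=100 is an illustrative assumption, not a documented limit.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def restore_batch(sentences, tokenizer, model, device="cpu"):
    """Restore capitalization and punctuation for a list of sentences in one pass.

    tokenizer and model are expected to be the objects loaded in the usage
    example (recap_tokenizer and recap_model).
    """
    if not sentences:
        return []
    prompts = [PREFIX + s for s in sentences]
    # padding=True pads shorter prompts so the batch forms a rectangular tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
    outputs = model.generate(**inputs, max_length=768, num_beams=5, early_stopping=True)
    # batch_decode handles the whole batch, so no squeeze(0) is needed here.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the objects from the usage example, a call might look like `restore_batch(chunk_words(long_text), recap_tokenizer, recap_model, device)`, where `long_text` is your own input. Splitting by word count can cut a sentence in the middle, so punctuation near chunk boundaries may need a manual check.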
Base model: google/mt5-base