# Model Card for mbart-large-50-verbalization

## Model Description

`mbart-large-50-verbalization` is a fine-tuned version of the [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) model, designed to verbalize Ukrainian text in preparation for Text-to-Speech (TTS) systems. It transforms structured data such as numbers and dates into their fully expanded textual representations in Ukrainian.
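A minimal inference sketch with `transformers` is shown below. The repository id `skypro1111/mbart-large-50-verbalization` and the use of the mBART-50 Ukrainian language code `uk_UA` are assumptions based on this card and the standard mBART-50 API, not details stated above.

```python
import torch
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Assumption: the checkpoint is published under this repo id;
# adjust if the model lives elsewhere on the Hub.
model_name = "skypro1111/mbart-large-50-verbalization"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.src_lang = "uk_UA"  # mBART-50 language code for Ukrainian
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

text = "Зустріч відбудеться 22.06.2024 о 18:30."  # date and time to verbalize
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.lang_code_to_id["uk_UA"],
        max_length=128,
    )
verbalized = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(verbalized)
```

The `forced_bos_token_id` argument follows the usual mBART-50 generation convention of forcing the target-language token; the exact decoding settings used by the authors are not documented here.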
11 |
## Architecture
|
12 |
+
This model is based on the [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) architecture, renowned for its effectiveness in translation and text generation tasks across numerous languages.
|
13 |
|

## Training Data

The model was fine-tuned on a subset of 96,780 sentences from the Ubertext dataset, focusing on news content. The verbalized equivalents were created using Google Gemini Pro, providing a rich basis for learning text transformation tasks.

Dataset: [skypro1111/ubertext-2-news-verbalized](https://huggingface.co/datasets/skypro1111/ubertext-2-news-verbalized)

## Training Procedure

The model underwent nearly 70,000 training steps, amounting to almost 2 epochs over the training dataset.