guillaumephd committed
Commit 01a69c6 · verified · 1 Parent(s): 0a5a0f5

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 # T5-french-base Model
 
 ## Model Overview
-The T5-French-Base model is a T5 model trained solely on French data from the RedPajama 2 dataset.
+The T5-French-Base model is a T5 model (~250M parameters) trained solely on French data from the RedPajama 2 dataset.
 This model was trained for 85,000 steps and was only pre-trained, without any supervised training.
 Therefore, this model has to be fine-tuned before it is usable on a downstream task.
 It is intended to serve as a foundation for further fine-tuning and as a starting point for downstream tasks in the French language.
@@ -33,7 +33,7 @@ It may be used as a starting point for fine-tuning on tasks such as:
 ## Limitations
 The T5-French-Base model may not be suitable for user-facing or production applications.
 It is mainly meant for researchers.
-The training budget was really limited (85k steps only, for a final loss of ~1.1).
+The training budget was quite limited (only 85k steps and ~250M parameters, for a final loss of ~1.1).
 The model is a base model that hasn't been fine-tuned yet. As such, it does NOT follow instructions.
 Additionally, the model was trained solely on French data and won't work for tasks that require cross-lingual understanding or multilingual capabilities.
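
For context, here is a minimal sketch of how such a checkpoint could be loaded with the `transformers` library named in the front matter. The Hub repository id `guillaumephd/t5-french-base` and the sentinel-token prompt are assumptions for illustration only; since the checkpoint is pre-training only (span corruption), the generated text is not expected to be useful until the model is fine-tuned.

```python
# Minimal sketch, assuming the checkpoint is published under "guillaumephd/t5-french-base"
# (hypothetical Hub id) and follows the standard T5 architecture/tokenizer layout.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "guillaumephd/t5-french-base"  # assumed Hub id; replace with the actual repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# T5 pre-training uses sentinel tokens (<extra_id_0>, <extra_id_1>, ...) to mark masked
# spans, so a span-corruption style input is the closest match to what the model saw.
text = "La capitale de la France est <extra_id_0> et sa monnaie est <extra_id_1>."
inputs = tokenizer(text, return_tensors="pt")

# Output mainly verifies that the weights load; it will not follow instructions.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

For an actual downstream task, the same model and tokenizer would be wrapped in a standard seq2seq fine-tuning loop (for example with `Seq2SeqTrainer`) on French task data before any user-facing use.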