guillaumephd committed
Commit 01a69c6 · verified · 1 Parent(s): 0a5a0f5

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 # T5-french-base Model
 
 ## Model Overview
-The T5-French-Base model is a T5 model trained solely on French data from the RedPajama 2 dataset.
+The T5-French-Base model is a T5 model (~250M parameters) trained solely on French data from the RedPajama 2 dataset.
 This model was trained for 85,000 steps and was only pre-trained, without any supervised training.
 Therefore, this model has to be fine-tuned before it is usable on a downstream task.
 It is intended to serve as a foundation for further fine-tuning and as a starting point for downstream tasks in the French language.
@@ -33,7 +33,7 @@ It may be used as a starting point for fine-tuning on tasks such as:
 ## Limitations
 The T5-French-Base model may not be suitable for user-facing or production applications.
 It is mainly meant for researchers.
-The training budget was really limited (85k steps only, for a final loss of ~1.1).
+The training budget was quite limited (only 85k steps and ~250M parameters, for a final loss of ~1.1).
 The model is a base model that hasn't been fine-tuned yet. As such, it does NOT follow instructions.
 Additionally, the model was trained solely on French data and won't work for tasks that require cross-lingual understanding or multilingual capabilities.
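
For context, here is a minimal sketch of how such a checkpoint could be loaded with the `transformers` library named in the front matter. The Hub repository id `guillaumephd/t5-french-base` and the sentinel-token prompt are assumptions for illustration only; since the checkpoint is pre-training only (span corruption), the generated text is not expected to be useful until the model is fine-tuned.

```python
# Minimal sketch, assuming the checkpoint is published under "guillaumephd/t5-french-base"
# (hypothetical Hub id) and follows the standard T5 architecture/tokenizer layout.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "guillaumephd/t5-french-base"  # assumed Hub id; replace with the actual repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# T5 pre-training uses sentinel tokens (<extra_id_0>, <extra_id_1>, ...) to mark masked
# spans, so a span-corruption style input is the closest match to what the model saw.
text = "La capitale de la France est <extra_id_0> et sa monnaie est <extra_id_1>."
inputs = tokenizer(text, return_tensors="pt")

# Output mainly verifies that the weights load; it will not follow instructions.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

For an actual downstream task, the same model and tokenizer would be wrapped in a standard seq2seq fine-tuning loop (for example with `Seq2SeqTrainer`) on French task data before any user-facing use.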