DmitryYarov committed
Commit ff98021 · verified · 1 Parent(s): 410a5a0

Update README.md

Files changed (1):
  1. README.md +83 -6

README.md CHANGED
@@ -20,19 +20,96 @@ It achieves the following results on the evaluation set:
  - Loss: 3.0259
  - Accuracy: 0.4040

- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
+ ## Model description
+ 
+ This model is a fine-tuned version of `ai-forever/rugpt3small_based_on_gpt2` for causal language modeling. It was trained on a custom dataset to generate coherent, contextually relevant Russian-language text.
+ 
+ ## Training details
+ 
+ - Training epochs: 29.86
+ - Total FLOPs: 8,153,103 GF
+ - Training loss: 3.8147
+ - Training runtime: 35 minutes 43.75 seconds
+ - Training samples: 291
+ - Training samples per second: 4.072
+ - Training steps per second: 0.056
+ 
+ ## Evaluation metrics
+ 
+ - Evaluation epoch: 29.86
+ - Evaluation accuracy: 40.4%
+ - Evaluation loss: 3.0259
+ - Evaluation runtime: 0.12 seconds
+ - Evaluation samples: 1
+ - Evaluation samples per second: 8.08
+ - Evaluation steps per second: 8.08
+ - Perplexity: 20.6125
+ 
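+ The reported perplexity follows directly from the evaluation loss: for a causal language model, perplexity is the exponential of the mean cross-entropy loss. A quick sanity check in plain Python (no project-specific code assumed):
+ 
+ ```python
+ import math
+ 
+ eval_loss = 3.0259               # mean cross-entropy on the evaluation set
+ perplexity = math.exp(eval_loss)
+ print(f"{perplexity:.2f}")       # ~20.61, matching the reported 20.6125
+ ```
+ 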
+ ## Intended use
+ 
+ This model is intended for text generation tasks where coherent and contextually appropriate responses are required, such as chatbots and content creation.
+ 
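+ A minimal usage sketch with the `transformers` library is shown below. The repository id of this fine-tuned checkpoint is not stated in this card, so `DmitryYarov/rugpt3small-aristotle` is a hypothetical placeholder; substitute the actual model path.
+ 
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ # Hypothetical repository id -- replace with the real fine-tuned checkpoint.
+ model_id = "DmitryYarov/rugpt3small-aristotle"
+ 
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+ 
+ # Russian prompt ("Virtue is..."), matching the model's training language.
+ prompt = "Добродетель есть"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=100,
+     do_sample=True,
+     top_p=0.95,
+     temperature=0.8,
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ 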
+ ## Limitations
+ 
+ - The model was trained on a small dataset (291 samples), which may limit its ability to generalize.
+ - The evaluation accuracy of approximately 40% indicates that the model may not perform well across all contexts.
+ - The perplexity of about 20.6 suggests there is room for improvement in the model's predictions.
+ - Evaluation was performed on a single sample, so the reported metrics may not be representative.
+ 
+ ## Future work
+ 
+ To improve this model, consider the following:
+ 
+ - Increase the size and diversity of the training dataset.
+ - Experiment with additional training epochs or different hyperparameters.
+ - Evaluate the model on a broader set of examples to better assess its capabilities.
 
  ## Training procedure
+ 
+ The model was trained using the `transformers` library and the `run_clm.py` script. Here is a summary of the training process (a configuration sketch follows the list):
+ 
+ * **Model:** `ai-forever/rugpt3small_based_on_gpt2` (a Russian-language GPT-2 model).
+ * **Objective:** causal language modeling (text generation).
+ * **Hardware:** Google Colab with a single CUDA-enabled GPU.
+ * **Mixed precision:** FP16 training was enabled to reduce memory footprint and potentially improve training speed.
+ * **Optimizer:** AdamW (`adamw_torch`).
+ * **Learning rate:** `3e-5`.
+ * **Warmup:** a linear warmup schedule with `500` warmup steps.
+ * **Training data:** a custom plain-text dataset (`plain_text` configuration) built from Aristotle's major works, 32,835 examples in total (a loading sketch is shown after this list):
+   * Aristotle, Categories (Аристотель. Категории)
+   * Aristotle, Nicomachean Ethics (Аристотель. Никомахова этика)
+   * Aristotle, Physics (Аристотель. Физика)
+   * Aristotle, Metaphysics (Аристотель. Метафизика)
+   * Aristotle, Rhetoric (Аристотель. Риторика)
+   * Aristotle, Poetics (Аристотель. Поэтика)
+ * **Validation data:** Aristotle, Nicomachean Ethics (https://lib.ru/POEEAST/ARISTOTEL/nikomah.txt), loaded with the same `plain_text` configuration; the validation set contained 111 examples.
+ * **Batch size:** a per-device batch size of `8` with `8` gradient accumulation steps, for an effective batch size of 64.
+ * **Sequence length:** the maximum sequence length (block size) was set to `2048`.
+ * **Gradient checkpointing:** enabled to reduce memory consumption.
+ * **Epochs:** trained for `30` epochs (the training data was passed over 30 times).
+ * **Evaluation:** performed every `1000` steps on the validation dataset.
+ * **Logging:** training progress and metrics were logged every `100` steps to TensorBoard and Weights & Biases (WandB).
+ * **Checkpoints:** model checkpoints were saved every `1000` steps, with a limit of `3` saved checkpoints.
+ 
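+ As referenced in the training data item above, here is a minimal sketch of how a plain-text corpus like this can be loaded with the `datasets` library. The original training file paths are not listed in this card, so `aristotle_works.txt` is a hypothetical placeholder; only the validation URL comes from the card.
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ data_files = {
+     "train": "aristotle_works.txt",                                 # hypothetical local file
+     "validation": "https://lib.ru/POEEAST/ARISTOTEL/nikomah.txt",   # validation source named above
+ }
+ # The "text" builder provides the plain_text configuration: one example per line.
+ dataset = load_dataset("text", data_files=data_files)
+ ```
+ 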
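+ For reference, the settings above correspond roughly to the following `TrainingArguments`. This is a reconstruction from the values listed in this card, not the exact command that was run; the output directory and report targets are assumptions.
+ 
+ ```python
+ from transformers import TrainingArguments
+ 
+ training_args = TrainingArguments(
+     output_dir="rugpt3small-aristotle",   # hypothetical output directory
+     per_device_train_batch_size=8,
+     gradient_accumulation_steps=8,        # effective batch size: 8 * 8 = 64
+     learning_rate=3e-5,
+     warmup_steps=500,
+     lr_scheduler_type="linear",
+     num_train_epochs=30,
+     fp16=True,
+     gradient_checkpointing=True,
+     optim="adamw_torch",
+     evaluation_strategy="steps",
+     eval_steps=1000,
+     logging_steps=100,
+     save_steps=1000,
+     save_total_limit=3,
+     report_to=["tensorboard", "wandb"],
+ )
+ # Note: the block size of 2048 is a run_clm.py data argument (--block_size),
+ # not part of TrainingArguments.
+ ```
+ 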
 
  ### Training hyperparameters