tcapelle/toxicity-scorer-qwen-ct2

@@ -21,11 +21,11 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1186
-- F1: 0.9550
-- Accuracy: 0.9567
-- Precision: 0.9538
-- Recall: 0.9567
 ## Model description
@@ -45,13 +45,13 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
-- train_batch_size: 24
-- eval_batch_size: 24
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 8
-- total_train_batch_size: 192
-- total_eval_batch_size: 192
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -61,10 +61,10 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss | F1     | Accuracy | Precision | Recall |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|:--------:|:---------:|:------:|
-| No log        | 0     | 0     | 1.5764          | 0.7709 | 0.6917   | 0.8807    | 0.6917 |
-| 0.1007        | 1.0   | 5878  | 0.0995          | 0.9551 | 0.9582   | 0.9539    | 0.9582 |
-| 0.0756        | 2.0   | 11756 | 0.1020          | 0.9560 | 0.9585   | 0.9548    | 0.9585 |
-| 0.0576        | 3.0   | 17634 | 0.1186          | 0.9550 | 0.9567   | 0.9538    | 0.9567 |
 ### Framework versions

 This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.1188
+- F1: 0.9551
+- Accuracy: 0.9568
+- Precision: 0.9539
+- Recall: 0.9568
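
In every reported row, Recall equals Accuracy to four decimals. That pattern is what support-weighted averaging produces, since weighted recall reduces algebraically to accuracy. The card does not state the averaging mode, so this is an inference rather than a documented fact. A minimal sketch of the identity, using hypothetical toy labels that are not taken from the evaluation set:

```python
from collections import Counter
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    # (support_c / n) * (hits_c / support_c) sums to total_hits / n, i.e. accuracy
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)

# Hypothetical imbalanced toy labels, for illustration only
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 2, 2]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(math.isclose(acc, rec))
```

Weighted precision and F1 do not collapse this way, which would explain why those columns differ from Accuracy.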
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
+- train_batch_size: 16
+- eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 8
+- total_train_batch_size: 128
+- total_eval_batch_size: 128
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 | Training Loss | Epoch | Step  | Validation Loss | F1     | Accuracy | Precision | Recall |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|:--------:|:---------:|:------:|
+| No log        | 0     | 0     | 3.7079          | 0.4868 | 0.3651   | 0.8738    | 0.3651 |
+| 0.1046        | 1.0   | 8816  | 0.1006          | 0.9536 | 0.9576   | 0.9526    | 0.9576 |
+| 0.0733        | 2.0   | 17632 | 0.1023          | 0.9558 | 0.9581   | 0.9545    | 0.9581 |
+| 0.0583        | 3.0   | 26448 | 0.1188          | 0.9551 | 0.9568   | 0.9539    | 0.9568 |
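
The batch-size change accounts for the different step counts: with 8 devices, the per-device batch drops from 24 to 16, so the total batch drops from 192 to 128 and each epoch takes more optimizer steps over the same data. The sketch below checks that both runs are consistent with one training-set size. It assumes no gradient accumulation and that the last partial batch is kept; neither is stated in the card, and the helper names are hypothetical.

```python
def effective_batch(per_device, num_devices, grad_accum=1):
    """Total samples consumed per optimizer step (grad_accum=1 is an assumption)."""
    return per_device * num_devices * grad_accum

def sample_range(steps_per_epoch, batch):
    """Dataset sizes n for which ceil(n / batch) == steps_per_epoch."""
    return (steps_per_epoch - 1) * batch + 1, steps_per_epoch * batch

old_batch = effective_batch(24, 8)   # previous revision
new_batch = effective_batch(16, 8)   # this revision

# Steps per epoch come from the epoch 1.0 rows of the two results tables
old_lo, old_hi = sample_range(5878, old_batch)
new_lo, new_hi = sample_range(8816, new_batch)

# Sizes compatible with both runs
lo, hi = max(old_lo, new_lo), min(old_hi, new_hi)
print(old_batch, new_batch, lo, hi)
```

The two ranges overlap, so the step counts agree with an unchanged training set of roughly 1.13 million examples under the stated assumptions.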
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ddda5bd64f9fed0cee5369b7be6c4400d06bbed2ebea732c07d252278132bacd
 size 988106872

 version https://git-lfs.github.com/spec/v1
+oid sha256:3e89ad4cf789337d2efce62a58fab96ade7fe74c29143c0012b1d272374d1b97
 size 988106872