mjschock/TinyLlama-1.1B-Chat-v1.0-sft-chat_threads

PEFT · Safetensors · trl · sft · Generated from Trainer

mjschock committed on Nov 9, 2024

Commit 49474e2 (verified) · 1 Parent(s): ba76b20

Model save

Files changed (1)

README.md +16 -14

@@ -18,20 +18,20 @@ should probably proofread and complete it, then remove this comment. -->
 # TinyLlama-1.1B-Chat-v1.0-sft-chat_threads
-This model is a fine-tuned version of [mjschock/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/mjschock/TinyLlama-1.1B-Chat-v1.0) on the mjschock/chat_threads dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7796
-- Bleu: 0.6746
-- Precisions: 0.6878
-- Brevity Penalty: 0.9963
-- Length Ratio: 0.9965
-- Translation Length: 580.7962
 - Reference Length: 582.9104
-- Meteor: 0.6913
-- Rouge1: 0.7195
-- Rouge2: 0.4376
-- Rougel: 0.6323
-- Rougelsum: 0.7091
 ## Model description
@@ -58,14 +58,16 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 1.0
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Bleu   | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Rouge1 | Rouge2 | Rougel | Rougelsum |
 |:-------------:|:------:|:----:|:---------------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:------:|:------:|:------:|:---------:|
 | No log        | 0      | 0    | 0.8976          | 0.6391 | 0.6567     | 0.9934          | 0.9936       | 579.7720           | 582.9104         | 0.6775 | 0.6912 | 0.3881 | 0.5809 | 0.6813    |
-| 0.821         | 0.9630 | 13   | 0.7796          | 0.6746 | 0.6878     | 0.9963          | 0.9965       | 580.7962           | 582.9104         | 0.6913 | 0.7195 | 0.4376 | 0.6323 | 0.7091    |
 ### Framework versions

 # TinyLlama-1.1B-Chat-v1.0-sft-chat_threads
+This model is a fine-tuned version of [mjschock/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/mjschock/TinyLlama-1.1B-Chat-v1.0) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5567
+- Bleu: 0.7600
+- Precisions: 0.7673
+- Brevity Penalty: 0.9978
+- Length Ratio: 0.9985
+- Translation Length: 582.4604
 - Reference Length: 582.9104
+- Meteor: 0.7381
+- Rouge1: 0.7955
+- Rouge2: 0.5621
+- Rougel: 0.7316
+- Rougelsum: 0.7895
 ## Model description
 - total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- num_epochs: 3.0
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Bleu   | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Rouge1 | Rouge2 | Rougel | Rougelsum |
 |:-------------:|:------:|:----:|:---------------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:------:|:------:|:------:|:---------:|
 | No log        | 0      | 0    | 0.8976          | 0.6391 | 0.6567     | 0.9934          | 0.9936       | 579.7720           | 582.9104         | 0.6775 | 0.6912 | 0.3881 | 0.5809 | 0.6813    |
+| 0.7597        | 0.9630 | 13   | 0.7155          | 0.6980 | 0.7093     | 0.9966          | 0.9968       | 581.1118           | 582.9104         | 0.7011 | 0.7382 | 0.4695 | 0.6634 | 0.7289    |
+| 0.6302        | 2.0    | 27   | 0.5975          | 0.7435 | 0.7516     | 0.9977          | 0.9977       | 581.8998           | 582.9104         | 0.7320 | 0.7835 | 0.5336 | 0.7091 | 0.7776    |
+| 0.5719        | 2.8889 | 39   | 0.5567          | 0.7600 | 0.7673     | 0.9978          | 0.9985       | 582.4604           | 582.9104         | 0.7381 | 0.7955 | 0.5621 | 0.7316 | 0.7895    |
 ### Framework versions
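
The effect of this commit (moving from `num_epochs: 1.0` to `num_epochs: 3.0`) can be summarized by comparing the final evaluation metrics on the removed and added sides of the diff. The following is a minimal sketch, not part of the repository: the dictionaries are copied by hand from the diff above, and `metric_deltas` is a hypothetical helper name. It also checks that the logged epochs are consistent with 13.5 optimizer steps per epoch, a figure inferred from the row at step 27 (epoch 2.0) rather than stated in the model card.

```python
# Final evaluation metrics copied by hand from the "-" and "+" lines of the diff.
before = {  # 1-epoch run (removed lines)
    "Loss": 0.7796, "Bleu": 0.6746, "Meteor": 0.6913, "Rouge1": 0.7195,
    "Rouge2": 0.4376, "Rougel": 0.6323, "Rougelsum": 0.7091,
}
after = {  # 3-epoch run (added lines)
    "Loss": 0.5567, "Bleu": 0.7600, "Meteor": 0.7381, "Rouge1": 0.7955,
    "Rouge2": 0.5621, "Rougel": 0.7316, "Rougelsum": 0.7895,
}


def metric_deltas(old, new):
    """Return new - old for every metric, rounded to the 4 decimals the card reports."""
    return {name: round(new[name] - old[name], 4) for name in old}


deltas = metric_deltas(before, after)
for name, delta in deltas.items():
    print(f"{name:<10} {before[name]:.4f} -> {after[name]:.4f} ({delta:+.4f})")

# Assumption: steps per epoch inferred from the table row (step 27, epoch 2.0).
steps_per_epoch = 27 / 2.0
logged_epochs = {step: round(step / steps_per_epoch, 4) for step in (13, 27, 39)}
print(logged_epochs)
```

Lower is better for `Loss`, so its negative delta is an improvement; for the remaining metrics a positive delta is an improvement.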