RichardErkhov committed on
Commit c1304be · verified · 1 Parent(s): 0429804

uploaded readme

Files changed (1): README.md ADDED (+100, -0)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

mpt_1000_STEPS_1e6_SFT_SFT - bnb 8bits
- Model creator: https://huggingface.co/tsavage68/
- Original model: https://huggingface.co/tsavage68/mpt_1000_STEPS_1e6_SFT_SFT/
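A minimal loading sketch, not part of the original card: 8-bit bitsandbytes checkpoints can be loaded through `transformers` (with `bitsandbytes` and `accelerate` installed), and MPT models need `trust_remote_code=True` for their custom modeling code. The repo id below is a placeholder assumption; substitute the actual id of this quantized repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder repo id -- replace with the actual id of this 8-bit repo.
model_id = "RichardErkhov/mpt_1000_STEPS_1e6_SFT_SFT-8bits"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",       # spread layers across available GPUs
    trust_remote_code=True,  # MPT ships custom modeling code
)

inputs = tokenizer("Instruction: say hello.\nResponse:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
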
Original model description:
---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: mpt_1000_STEPS_1e6_SFT_SFT
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mpt_1000_STEPS_1e6_SFT_SFT

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4128

## Model description

More information needed
## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the reproduction sketch after the list):
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

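The hyperparameters above can be expressed as a `transformers` `TrainingArguments` object. This is a hedged reconstruction from the list, not the author's actual training script: the `trl`/`sft` tags suggest TRL's SFT trainer was used, but the dataset is unknown, so data loading and the trainer call are omitted and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Reconstruction of the listed hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="mpt_1000_STEPS_1e6_SFT_SFT",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 2 * 2 = 4
    seed=42,
    optim="adamw_torch",            # betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
)
```
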
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.568         | 0.05  | 50   | 1.4778          |
| 0.4601        | 0.1   | 100  | 0.4773          |
| 0.4655        | 0.15  | 150  | 0.4432          |
| 0.4866        | 0.2   | 200  | 0.4338          |
| 0.4309        | 0.24  | 250  | 0.4279          |
| 0.4481        | 0.29  | 300  | 0.4238          |
| 0.4239        | 0.34  | 350  | 0.4206          |
| 0.4025        | 0.39  | 400  | 0.4184          |
| 0.4377        | 0.44  | 450  | 0.4169          |
| 0.4192        | 0.49  | 500  | 0.4154          |
| 0.407         | 0.54  | 550  | 0.4145          |
| 0.4291        | 0.59  | 600  | 0.4136          |
| 0.4048        | 0.64  | 650  | 0.4133          |
| 0.397         | 0.68  | 700  | 0.4131          |
| 0.4016        | 0.73  | 750  | 0.4128          |
| 0.4108        | 0.78  | 800  | 0.4128          |
| 0.4427        | 0.83  | 850  | 0.4128          |
| 0.3882        | 0.88  | 900  | 0.4127          |
| 0.3929        | 0.93  | 950  | 0.4127          |
| 0.4125        | 0.98  | 1000 | 0.4128          |

### Framework versions

- Transformers 4.39.3
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2