Model save
- README.md +19 -8
- last_checkpoint/config.json +1 -1
- last_checkpoint/model-00001-of-00004.safetensors +1 -1
- last_checkpoint/model-00002-of-00004.safetensors +1 -1
- last_checkpoint/model-00003-of-00004.safetensors +1 -1
- last_checkpoint/model-00004-of-00004.safetensors +1 -1
- last_checkpoint/tokenizer.json +2 -2
- last_checkpoint/tokenizer_config.json +4 -0
- last_checkpoint/training_args.bin +2 -2
README.md
CHANGED
@@ -1,17 +1,17 @@
 ---
-base_model:
+base_model: RyanYr/reflect_single_mini8B_SftT12
 library_name: transformers
-model_name:
+model_name: reflect_single_mini8B_Om2SftT012_Om2G8kOm2Ag40kIpsdpT012_b0.5
 tags:
 - generated_from_trainer
 - trl
--
+- dpo
 licence: license
 ---

-# Model Card for
+# Model Card for reflect_single_mini8B_Om2SftT012_Om2G8kOm2Ag40kIpsdpT012_b0.5

-This model is a fine-tuned version of [
+This model is a fine-tuned version of [RyanYr/reflect_single_mini8B_SftT12](https://huggingface.co/RyanYr/reflect_single_mini8B_SftT12).
 It has been trained using [TRL](https://github.com/huggingface/trl).

 ## Quick start
@@ -20,16 +20,16 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline

 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/
+generator = pipeline("text-generation", model="RyanYr/reflect_single_mini8B_Om2SftT012_Om2G8kOm2Ag40kIpsdpT012_b0.5", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```

 ## Training procedure

-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/mvbd6d22)

-This model was trained with
+This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

 ### Framework versions

@@ -41,7 +41,18 @@ This model was trained with SFT.

 ## Citations

+Cite DPO as:

+```bibtex
+@inproceedings{rafailov2023direct,
+    title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
+    author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
+    year = 2023,
+    booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
+    url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
+    editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
+}
+```

 Cite TRL as:

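The updated card switches the training method from SFT to DPO; the `b0.5` suffix in the run name suggests (though the card does not state) a beta of 0.5. As an illustrative sketch only, the per-example objective from the cited DPO paper can be written in plain Python; the function and argument names here are hypothetical, not TRL's API:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.5) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Each argument is the summed log-probability of the chosen or rejected
    response under the trained policy or the frozen reference model.
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When policy and reference agree exactly, the loss is log(2);
# it falls as the policy prefers the chosen response more than the reference does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In practice TRL computes this in batch over token-level log-probabilities; the sketch above only shows the scalar form of the loss.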
last_checkpoint/config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "
+  "_name_or_path": "RyanYr/reflect_single_mini8B_SftT12",
   "architectures": [
     "MistralForCausalLM"
   ],
last_checkpoint/model-00001-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9e9ec8b955ab27329222edd0e71f5f053b9438e704d8a5076655fa8361ea9a71
 size 4983016096
last_checkpoint/model-00002-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4a36dc1c87b9c96995c0313d6c131294d1e0ebae51790b045da6f6264e7cb600
 size 4999836776
last_checkpoint/model-00003-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8b16fcd33e712553eceac4f9d9d52fe5f72bdbde05da15a19cd64eb556adccdc
 size 4983067960
last_checkpoint/model-00004-of-00004.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a7f8024f898d2ab88528faeb87506ce355a77de6e8135708001aaf0d12adec2a
 size 1073750144
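Each safetensors shard above is stored as a Git LFS pointer file: a three-line stanza carrying the spec version, the SHA-256 of the blob, and its byte size, exactly as the diffs show. A minimal sketch of how such a pointer is derived from a blob (a hypothetical helper, not Git LFS's actual implementation):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS pointer file for a blob: spec version, SHA-256 oid, byte size."""
    oid = hashlib.sha256(data).hexdigest()
    return ("version https://git-lfs.github.com/spec/v1\n"
            f"oid sha256:{oid}\n"
            f"size {len(data)}\n")

print(lfs_pointer(b"hello"))
```

This is why a retrained checkpoint shows a one-line `oid` change per shard: the pointer in git changes whenever the blob's hash changes, while the `size` line only moves if the shard's byte length differs.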
last_checkpoint/tokenizer.json
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3537f74c84bc8e8cbb95a43947b1bdc89bb143d548bdd2a28ac2c3bbf51971ed
+size 17078318
last_checkpoint/tokenizer_config.json
CHANGED
@@ -8017,9 +8017,13 @@
   "clean_up_tokenization_spaces": false,
   "eos_token": "</s>",
   "legacy": true,
+  "max_length": 16384,
   "model_max_length": 16384,
   "pad_token": "[PAD]",
+  "stride": 0,
   "tokenizer_class": "LlamaTokenizer",
+  "truncation_side": "left",
+  "truncation_strategy": "longest_first",
   "unk_token": "<unk>",
   "use_default_system_prompt": false
 }
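The tokenizer_config change turns on left-side truncation (`"truncation_side": "left"`), so inputs longer than `max_length` keep their most recent tokens rather than their oldest ones. A toy illustration of the difference between the two sides (a simplified sketch, not the Hugging Face implementation):

```python
def truncate(token_ids, max_length, side="left"):
    """Drop tokens from the chosen side when a sequence exceeds max_length."""
    if len(token_ids) <= max_length:
        return list(token_ids)
    # "left" keeps the tail (most recent tokens); "right" keeps the head.
    return list(token_ids[-max_length:]) if side == "left" else list(token_ids[:max_length])

print(truncate([1, 2, 3, 4, 5], 3, side="left"))   # tail of the sequence survives
print(truncate([1, 2, 3, 4, 5], 3, side="right"))  # head of the sequence survives
```

Left truncation is a common choice for chat models, since the end of a long conversation usually matters more to the next response than its beginning.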
last_checkpoint/training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:942a478820cabe004a36a544a3ccfa81413200dd7bfbb7599f3a7be00633b46d
+size 8056