End of training
README.md CHANGED

```diff
@@ -78,7 +78,7 @@ GPT2LMHeadModel(
 
 # Resource Usage
 
-- Max Train VRAM Use: 15.
+- Max Train VRAM Use: 15.2844 GB
 - Available VRAM: 23.6429 GB
 - GPUs:
   - 1x NVIDIA GeForce RTX 4090
```
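The hunk above fills in the peak-VRAM figure: 15.2844 GB used at peak against 23.6429 GB available on the RTX 4090. Figures in this shape come straight out of PyTorch's CUDA memory API; a minimal sketch with a hypothetical helper (not distily's actual reporting code):

```python
import torch

# Hypothetical reporting helper, not distily's actual code: shows where
# figures like "Max Train VRAM Use" and "Available VRAM" can come from.
def report_vram(device: int = 0) -> None:
    peak = torch.cuda.max_memory_allocated(device)   # high-water mark of allocations
    free, total = torch.cuda.mem_get_info(device)    # bytes free/total on the GPU
    print(f"- Max Train VRAM Use: {peak / 2**30:.4f} GB")
    print(f"- Available VRAM: {total / 2**30:.4f} GB")
    print(f"- GPUs:\n  - 1x {torch.cuda.get_device_name(device)}")
```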
```diff
@@ -115,7 +115,7 @@ GPT2LMHeadModel(
 <br/>
 
 # Train Dataset
-Trained on 521,
+Trained on 521,334,462 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `990,000`
 - Subset: `20231101.en`
```
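The dataset hunk completes the token count: 521,334,462 tokens drawn from 990,000 samples of the `20231101.en` Wikipedia dump. A sketch of loading the same snapshot with the `datasets` library (whether the run took a contiguous slice is an assumption; the card only states the sample count):

```python
from datasets import load_dataset

# wikimedia/wikipedia, subset 20231101.en, as listed in the card.
# The [:990000] slice is an assumption matching "Num Samples: 990,000".
train = load_dataset("wikimedia/wikipedia", "20231101.en", split="train[:990000]")
print(train.column_names)  # typically ['id', 'url', 'title', 'text']
```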
````diff
@@ -134,11 +134,7 @@ DistillationObjective(
         weight=0
     ),
     attn_loss_component=LossComponent(
-        weight=
-        loss_fn='raw_mse',
-        layer_mapper='layer-2',
-        norm='layernorm_teacher_only_affine',
-        projector='mlp_64_l3'
+        weight=0
     )
 )
 ```
````
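This hunk is the substance of the commit: the attention-loss component collapses from a fully specified term (`raw_mse` over `layer-2`-mapped, layer-normed, MLP-projected attentions) to `weight=0`, i.e. disabled, which matches the `attn_weight=0` run name in the logs below. As a hedged illustration of the general pattern (not distily's actual `DistillationObjective`), a multi-component objective is a weighted sum in which `weight=0` switches a term off:

```python
import torch.nn.functional as F

# Illustrative sketch only, not distily's DistillationObjective: a weighted
# sum of distillation terms in which weight=0 disables a component.
def distillation_loss(student_out, teacher_out, logits_weight=1.0, attn_weight=0.0):
    # Forward KL between teacher and student next-token distributions.
    loss = logits_weight * F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    if attn_weight > 0:  # with attn_weight=0 this term is skipped entirely
        # Naive one-to-one pairing; the dropped config instead used a
        # layer mapper and an MLP projector to align mismatched layers.
        attn = sum(F.mse_loss(s, t) for s, t in
                   zip(student_out.attentions, teacher_out.attentions))
        loss = loss + attn_weight * attn
    return loss
```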
```diff
@@ -165,14 +161,10 @@ The following hyperparameters were used during training:
         weight=0
     ),
     attn_loss_component=LossComponent(
-        weight=
-        loss_fn='raw_mse',
-        layer_mapper='layer-2',
-        norm='layernorm_teacher_only_affine',
-        projector='mlp_64_l3'
+        weight=0
     )
 )`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7facb65ee170>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `distilbert/distilgpt2`
 - student_model_config: `None`
```
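Here the `lr_scheduler` entry only gains the object's memory address (`0x7facb65ee170`): `LambdaLR` has no informative repr, so the card cannot show the schedule itself. With `warmup_steps=0` and `warmup_ratio=0.0` (next hunk), a typical linear schedule reduces to pure decay; a sketch of that assumed shape, mirroring `transformers.get_linear_schedule_with_warmup`:

```python
from torch.optim.lr_scheduler import LambdaLR

# Assumed schedule: linear warmup then linear decay, as in
# transformers.get_linear_schedule_with_warmup. With zero warmup
# (the card's setting) the multiplier just decays from 1.0 to 0.0.
def make_linear_schedule(optimizer, num_training_steps, num_warmup_steps=0):
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        return max(0.0, (num_training_steps - step)
                   / max(1, num_training_steps - num_warmup_steps))
    return LambdaLR(optimizer, lr_lambda)
```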
```diff
@@ -191,7 +183,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: `1`
 - weight_decay: `0.0`
 - max_grad_norm: `1.0`
-- warmup_ratio: `0`
+- warmup_ratio: `0.0`
 - warmup_steps: `0`
 - gradient_checkpointing: `True`
```
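The last README hunk normalizes `warmup_ratio` to `0.0`. These optimizer-side settings map directly onto `transformers.TrainingArguments` fields; a hedged equivalent (how distily actually wires them up is not shown in the card, and `output_dir` is a hypothetical name):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="distily-attn-weight-0",  # hypothetical, not from the card
    gradient_accumulation_steps=1,
    weight_decay=0.0,
    max_grad_norm=1.0,            # gradient clipping threshold
    warmup_ratio=0.0,
    warmup_steps=0,
    gradient_checkpointing=True,  # trade recompute for lower VRAM
)
```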
logs/attn_weight=0/events.out.tfevents.1725646785.a7e428977e35 CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:87b77783df13afe858bfb41cf0e02bb58a15d941fc29fb263a464be88df196e1
+size 1970995
```
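As with all large binaries in a Hugging Face repo, the diff above touches a Git LFS pointer, not the log itself: a three-line stub recording the spec version, the blob's SHA-256, and its byte size (here 1,970,995 bytes for the main training log). A minimal parser for that pointer format:

```python
def parse_lfs_pointer(text: str) -> dict:
    # A Git LFS pointer is a tiny key-value file, e.g.:
    #   version https://git-lfs.github.com/spec/v1
    #   oid sha256:87b77783df13afe8...
    #   size 1970995
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}
```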
logs/attn_weight=0/events.out.tfevents.1725679647.a7e428977e35 ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:79190e6f08b94a33384ec604313e0d71556e27915728ed6df60d6072c9633990
+size 529
```
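This newly added 529-byte event file is a second TensorBoard log, small enough that it likely holds only a final summary written at the end of training. One way to inspect either log is TensorBoard's `EventAccumulator`; the scalar tag below is an assumption, since the card doesn't list what was logged:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/attn_weight=0")  # dir containing the *.tfevents files
acc.Reload()
print(acc.Tags()["scalars"])            # discover the logged scalar tags
# for ev in acc.Scalars("train/loss"):  # "train/loss" is an assumed tag name
#     print(ev.step, ev.value)
```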
model.safetensors CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c50903859f256cc52993c2d269c80e12457fead0c3de901f87ec1ee2082ca68f
 size 163832792
```
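Only the hash of `model.safetensors` changes; the size stays at 163,832,792 bytes, consistent with a DistilGPT-2-sized student (roughly 82M parameters) stored in a 2-byte dtype. Verifying a downloaded checkpoint against the new pointer is straightforward:

```python
import hashlib
from safetensors.torch import load_file

path = "model.safetensors"
sha = hashlib.sha256(open(path, "rb").read()).hexdigest()
assert sha == "c50903859f256cc52993c2d269c80e12457fead0c3de901f87ec1ee2082ca68f"

state = load_file(path)  # maps tensor names to torch.Tensors
print(sum(t.numel() for t in state.values()))  # ~82M params for distilgpt2
```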