RichardErkhov committed on
Commit ceeeabd · verified · 1 Parent(s): 69495bd

uploaded readme

Files changed (1)
  1. README.md +322 -0
README.md ADDED
@@ -0,0 +1,322 @@
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

pygemma-2b-ultra-plus - bnb 4bits
- Model creator: https://huggingface.co/Menouar/
- Original model: https://huggingface.co/Menouar/pygemma-2b-ultra-plus/
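Since this repo is a bnb 4-bit export, it should load directly with `transformers` as long as `bitsandbytes` and `accelerate` are installed. A minimal inference sketch follows; the repo id in it is a placeholder, so substitute this repository's actual id before running.

```python
# Minimal inference sketch for the bnb 4-bit export.
# "your-username/pygemma-2b-ultra-plus-4bits" is a placeholder -- use this repo's actual id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/pygemma-2b-ultra-plus-4bits"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The saved quantization_config is picked up automatically; requires bitsandbytes and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Create a function to calculate the sum of a sequence of integers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```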
Original model description:

---
license: other
tags:
- generated_from_trainer
- google/gemma
- PyTorch
- transformers
- trl
- peft
- tensorboard
model-index:
- name: pygemma-2b-ultra-plus
  results: []
datasets:
- Vezora/Tested-143k-Python-Alpaca
language:
- en
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: google/gemma-2b
widget:
- example_title: Compute Sum
  messages:
  - role: system
    content: Welcome to PyGemma, your AI-powered Python assistant. I'm here to help you answer common questions about the Python programming language. Let's dive into Python!
  - role: user
    content: Create a function to calculate the sum of a sequence of integers.
pipeline_tag: text-generation
---
# Model Card for pygemma-2b-ultra-plus:

🐍💬🤖

**pygemma-2b-ultra-plus** is a language model trained to act as a Python assistant. It is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b), trained with `SFTTrainer` on the publicly available dataset [Vezora/Tested-143k-Python-Alpaca](https://huggingface.co/datasets/Vezora/Tested-143k-Python-Alpaca).
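For orientation, here is a hedged sketch of what an `SFTTrainer` + LoRA run like the one described above could look like (trl 0.7-era API). The LoRA values, `max_seq_length`, and prompt format are assumptions not reported by the card; the actual training arguments are listed under "Training hyperparameters" below.

```python
# Hedged sketch of an SFTTrainer + LoRA run like the one described above.
# LoRA values, max_seq_length, and the prompt format are assumptions, not reported by the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("Vezora/Tested-143k-Python-Alpaca", split="train")

# Assumed Alpaca-style columns ("instruction", "output"); check the dataset schema before running.
def to_text(example):
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

dataset = dataset.map(to_text)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

lora_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)  # assumed values

# Only a few key values here; the full argument set is listed under "Training hyperparameters".
training_args = TrainingArguments(
    output_dir="peft-lora-model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,  # assumption
    peft_config=lora_config,
    tokenizer=tokenizer,
)
trainer.train()
```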
## Training Metrics

[The training metrics can be found on **TensorBoard**](https://huggingface.co/Menouar/pygemma-2b-ultra-plus/tensorboard).
## Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction of the key values follows the list):
- output_dir: peft-lora-model
- overwrite_output_dir: True
- do_train: False
- do_eval: False
- do_predict: False
- evaluation_strategy: no
- prediction_loss_only: False
- per_device_train_batch_size: 2
- per_device_eval_batch_size: None
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- eval_delay: 0
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 0.3
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_dir: peft-lora-model/runs/Mar22_16-55-05_1d49862104ed
- logging_strategy: steps
- logging_first_step: False
- logging_steps: 10
- logging_nan_inf_filter: True
- save_strategy: epoch
- save_steps: 500
- save_total_limit: None
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- eval_steps: None
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- run_name: peft-lora-model
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- metric_for_best_model: None
- greater_is_better: None
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- report_to: ['tensorboard']
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_token: None
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- push_to_hub_token: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- distributed_state: Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
- _n_gpu: 1
- __cached__setup_devices: cuda:0
- deepspeed_plugin: None
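As flagged above, the key reported values can be collected into a `transformers.TrainingArguments`. This is a hedged reconstruction rather than the author's exact code; anything not set below keeps the library default.

```python
# Hedged reconstruction of the reported training hyperparameters.
# Values are taken from the list above; omitted arguments keep their transformers defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="peft-lora-model",
    overwrite_output_dir=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    max_grad_norm=0.3,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    report_to=["tensorboard"],
    seed=42,
)
```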