
Quantization made by Richard Erkhov.

  • Github
  • Discord
  • Request more models

pygemma-2b-ultra-plus - bnb 4bits
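
For reference, a minimal loading sketch (not part of the original card): it assumes the checkpoint was serialized with bitsandbytes 4-bit weights (as the repo title suggests), that `transformers`, `accelerate`, and `bitsandbytes` are installed, and that a CUDA GPU is available. The repo path below is a hypothetical placeholder.

```python
# Minimal sketch for loading a bnb 4-bit checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/pygemma-2b-ultra-plus-4bits"  # hypothetical path; use this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# For a pre-quantized bnb checkpoint, the 4-bit quantization config is stored in
# the model's config.json, so from_pretrained restores the quantized weights directly.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
```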

Original model description:


license: other
tags:
- generated_from_trainer
- google/gemma
- PyTorch
- transformers
- trl
- peft
- tensorboard
model-index:
- name: pygemma-2b-ultra-plus
  results: []
datasets:
- Vezora/Tested-143k-Python-Alpaca
language:
- en
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: google/gemma-2b
widget:
- example_title: Compute Sum
  messages:
  - role: system
    content: Welcome to PyGemma, your AI-powered Python assistant. I'm here to help you answer common questions about the Python programming language. Let's dive into Python!
  - role: user
    content: Create a function to calculate the sum of a sequence of integers.
pipeline_tag: text-generation

Model Card for pygemma-2b-ultra-plus:

🐍💬🤖

pygemma-2b-ultra-plus is a language model trained to act as a Python assistant. It is a fine-tuned version of google/gemma-2b, trained with SFTTrainer on the publicly available Vezora/Tested-143k-Python-Alpaca dataset.
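
For illustration, a minimal inference sketch (not part of the original card): it assumes the released tokenizer keeps Gemma's stock chat template, which accepts only alternating user/model turns, and it reuses the user prompt from the card's example widget. The repo path is a placeholder.

```python
# Hedged inference sketch; the repo path is a placeholder for the fine-tuned model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pygemma-2b-ultra-plus"  # placeholder: substitute the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Gemma's stock chat template takes only "user"/"model" roles, so only a user
# turn is used here; the widget's system prompt is omitted.
messages = [
    {"role": "user", "content": "Create a function to calculate the sum of a sequence of integers."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```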

Training Metrics

The training metrics can be found on TensorBoard.

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction of the corresponding trainer call is sketched after the list):

  • output_dir: peft-lora-model

  • overwrite_output_dir: True

  • do_train: False

  • do_eval: False

  • do_predict: False

  • evaluation_strategy: no

  • prediction_loss_only: False

  • per_device_train_batch_size: 2

  • per_device_eval_batch_size: None

  • per_gpu_train_batch_size: None

  • per_gpu_eval_batch_size: None

  • gradient_accumulation_steps: 4

  • eval_accumulation_steps: None

  • eval_delay: 0

  • learning_rate: 2e-05

  • weight_decay: 0.0

  • adam_beta1: 0.9

  • adam_beta2: 0.999

  • adam_epsilon: 1e-08

  • max_grad_norm: 0.3

  • num_train_epochs: 1

  • max_steps: -1

  • lr_scheduler_type: cosine

  • lr_scheduler_kwargs: {}

  • warmup_ratio: 0.1

  • warmup_steps: 0

  • log_level: passive

  • log_level_replica: warning

  • log_on_each_node: True

  • logging_dir: peft-lora-model/runs/Mar22_16-55-05_1d49862104ed

  • logging_strategy: steps

  • logging_first_step: False

  • logging_steps: 10

  • logging_nan_inf_filter: True

  • save_strategy: epoch

  • save_steps: 500

  • save_total_limit: None

  • save_safetensors: True

  • save_on_each_node: False

  • save_only_model: False

  • no_cuda: False

  • use_cpu: False

  • use_mps_device: False

  • seed: 42

  • data_seed: None

  • jit_mode_eval: False

  • use_ipex: False

  • bf16: True

  • fp16: False

  • fp16_opt_level: O1

  • half_precision_backend: auto

  • bf16_full_eval: False

  • fp16_full_eval: False

  • tf32: None

  • local_rank: 0

  • ddp_backend: None

  • tpu_num_cores: None

  • tpu_metrics_debug: False

  • debug: []

  • dataloader_drop_last: False

  • eval_steps: None

  • dataloader_num_workers: 0

  • dataloader_prefetch_factor: None

  • past_index: -1

  • run_name: peft-lora-model

  • disable_tqdm: False

  • remove_unused_columns: True

  • label_names: None

  • load_best_model_at_end: False

  • metric_for_best_model: None

  • greater_is_better: None

  • ignore_data_skip: False

  • fsdp: []

  • fsdp_min_num_params: 0

  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}

  • fsdp_transformer_layer_cls_to_wrap: None

  • accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)

  • deepspeed: None

  • label_smoothing_factor: 0.0

  • optim: adamw_torch_fused

  • optim_args: None

  • adafactor: False

  • group_by_length: False

  • length_column_name: length

  • report_to: ['tensorboard']

  • ddp_find_unused_parameters: None

  • ddp_bucket_cap_mb: None

  • ddp_broadcast_buffers: None

  • dataloader_pin_memory: True

  • dataloader_persistent_workers: False

  • skip_memory_metrics: True

  • use_legacy_prediction_loop: False

  • push_to_hub: False

  • resume_from_checkpoint: None

  • hub_model_id: None

  • hub_strategy: every_save

  • hub_token: None

  • hub_private_repo: False

  • hub_always_push: False

  • gradient_checkpointing: True

  • gradient_checkpointing_kwargs: {'use_reentrant': False}

  • include_inputs_for_metrics: False

  • fp16_backend: auto

  • push_to_hub_model_id: None

  • push_to_hub_organization: None

  • push_to_hub_token: None

  • mp_parameters:

  • auto_find_batch_size: False

  • full_determinism: False

  • torchdynamo: None

  • ray_scope: last

  • ddp_timeout: 1800

  • torch_compile: False

  • torch_compile_backend: None

  • torch_compile_mode: None

  • dispatch_batches: None

  • split_batches: None

  • include_tokens_per_second: False

  • include_num_input_tokens_seen: False

  • neftune_noise_alpha: None

  • distributed_state: Distributed environment: NO; Num processes: 1; Process index: 0; Local process index: 0; Device: cuda

  • _n_gpu: 1

  • __cached__setup_devices: cuda:0

  • deepspeed_plugin: None
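
These arguments map onto a transformers TrainingArguments object passed to trl's SFTTrainer (the card's tags list trl and peft). Below is a hedged reconstruction against a trl 0.7-era API; the LoRA settings, prompt format, dataset column names, and sequence length are assumptions, since the card only lists the trainer arguments.

```python
# Hedged reconstruction of the fine-tuning setup. TrainingArguments values are
# copied from the list above; everything marked "assumed" is not stated on the card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("Vezora/Tested-143k-Python-Alpaca", split="train")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", torch_dtype=torch.bfloat16)

def formatting_func(batch):
    # Assumed Alpaca-style columns ("instruction"/"output") and prompt format.
    return [
        f"### Instruction:\n{ins}\n\n### Response:\n{out}"
        for ins, out in zip(batch["instruction"], batch["output"])
    ]

peft_config = LoraConfig(  # assumed LoRA settings; not listed on the card
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(  # values taken from the hyperparameter list
    output_dir="peft-lora-model",
    overwrite_output_dir=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.0,
    max_grad_norm=0.3,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    optim="adamw_torch_fused",
    report_to=["tensorboard"],
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    formatting_func=formatting_func,
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=1024,  # assumed
)
trainer.train()
```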
