---
license: apache-2.0
datasets:
- Orion-zhen/dpo-mathinstuct-emoji
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- dpo
- rl
- axolotl
---

# EmojiLlama-3.1-8B

This model is a fine-tuned version of Llama-3.1-8B-Instruct using the DPO (Direct Preference Optimization) RL technique, designed to make it friendlier and more expressive with emojis and jokes.

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false
chat_template: llama3

rl: dpo
datasets:
  - path: Orion-zhen/dpo-mathinstuct-emoji
    type: llama3.prompt_pairs
    chat_template: llama3
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./llama-results

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

bf16: true
fp16: false

special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|eot_id|>"
  pad_token: "<|eot_id|>"
  additional_special_tokens:
    - "<|begin_of_text|>"
    - "<|eot_id|>"

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
save_safetensors: true
```

</details>
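
The config above trains a LoRA adapter on top of the base model rather than updating the full weights. Below is a minimal sketch of merging such an adapter back into the base model with PEFT so it can be used as a standalone checkpoint; the adapter path mirrors `output_dir` from the config, and the checkpoint layout and save path are illustrative assumptions, not part of this repo.

```py
# Minimal sketch (assumptions: the LoRA adapter was saved to ./llama-results,
# matching output_dir in the config above; the save path is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Attach the DPO-trained LoRA adapter, then fold its weights into the base model
model = PeftModel.from_pretrained(base, "./llama-results")
merged = model.merge_and_unload()

# Save a standalone merged checkpoint in safetensors format
merged.save_pretrained("./EmojiLlama-3.1-8B", safe_serialization=True)
tokenizer.save_pretrained("./EmojiLlama-3.1-8B")
```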

# Prompt Template

You can use the Llama3 prompt template with this model:

### Llama3

```
<|start_header_id|>system<|end_header_id|>

{system}<|eot_id|>

<|start_header_id|>user<|end_header_id|>

{user}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

{assistant}<|eot_id|>
```

## Example usage:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/EmojiLlama-3.1-8B",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("suayptalha/EmojiLlama-3.1-8B")

messages = [
    {"role": "user", "content": "Lana had 8 blank pages left in her binder, but she knew she would need more for her next class. Duane took half of the 42 pages in his binder out and gave them to her. How many pages does Lana have in her binder after adding Duane’s?"},
]

# Render the messages with the Llama3 chat template and move them to the GPU
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

output = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=False)
print(decoded_output)
```

## Output:

```
💡 Remember, we're doubling Lana's pages, thanks to Duane's kindness! 💕
Duane gave Lana 42 / 2 = 21 pages 👍
After adding Duane's, Lana has 21 + 8 = 29 pages in her binder 📚
The answer is 29 🎉
```

# Parameters

- lr: 2e-4
- epochs: 1
- batch_size: 16 (micro_batch_size 2 × gradient_accumulation_steps 8)
- optimizer: adamw_bnb_8bit

# Support

Buy Me A Coffee