Update README.md
README.md CHANGED

@@ -1,3 +1,13 @@
+---
+license: mit
+datasets:
+- Intel/orca_dpo_pairs
+language:
+- en
+base_model:
+- unsloth/Llama-3.2-3B-Instruct
+pipeline_tag: question-answering
+---
 # Fine-tuned Language Model for Preference Optimization (DPO)
 
 ## Model Overview

@@ -85,6 +95,4 @@ This model was trained using the Unsloth framework with contributions from Intel
 
 ## Notebook
 
-Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/fine_tuning_llama_3_2_3b_dpo_peft.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.
-
-
+Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/fine_tuning_llama_3_2_3b_dpo_peft.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.
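For context (not part of this change): a minimal sketch of how the metadata added in the front matter maps onto loading the base model and the preference dataset with standard Hugging Face APIs. The repository IDs come from the front matter above; the variable names, split choice, and field descriptions are illustrative assumptions, not taken from the notebook.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# IDs taken from the README front matter added in this change.
base_model_id = "unsloth/Llama-3.2-3B-Instruct"  # base_model
dataset_id = "Intel/orca_dpo_pairs"              # datasets

# Load the instruct base model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Each record pairs a prompt with a preferred ("chosen") and a dispreferred
# ("rejected") response -- the pair format that DPO-style preference
# optimization consumes.
pairs = load_dataset(dataset_id, split="train")
print(pairs.column_names)
```

The actual DPO fine-tuning (Unsloth with PEFT adapters) is covered step by step in the notebook linked above.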