Model Card: Gemma 2B Medical ORPO RLHF Fine-Tuning
Model Overview
- Model Name: Gemma 2B Medical ORPO RLHF Fine-Tuning
- Repository: SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning
- Base Model: unsloth/gemma-2b-bnb-4bit
- Fine-Tuning Method: ORPO RLHF (Odds Ratio Preference Optimization, a single-stage preference-alignment alternative to the classic Reinforcement Learning from Human Feedback pipeline)
Model Description
This model is a fine-tuned version of Gemma 2B, aligned with ORPO to enhance its medical reasoning capabilities. The fine-tuning process leverages a medical-reasoning preference dataset (chosen/rejected response pairs) to improve decision-making and contextual understanding in healthcare-related queries.
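ORPO folds preference alignment into a single training objective: the usual supervised negative log-likelihood on the preferred (chosen) response, plus a lambda-weighted odds-ratio penalty that pushes the odds of the chosen response above those of the rejected one. A minimal illustrative sketch of that objective in plain Python (not the actual training code, which runs inside the Unsloth/TRL stack):

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO objective sketch: SFT loss on the chosen response plus a
    lambda-weighted odds-ratio penalty. logp_* are average per-token
    log-probabilities of each response under the policy."""
    def log_odds(logp: float) -> float:
        # log(p / (1 - p)), computed from log p via log1p for stability
        return logp - math.log1p(-math.exp(logp))

    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    l_sft = -logp_chosen                              # NLL of chosen response
    return l_sft + lam * l_or
```

The loss drops as the model both assigns higher likelihood to the chosen response and widens the odds gap over the rejected one, which is exactly the behavior the medical preference pairs are meant to induce.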
Intended Use
This model is designed for:
- Assisting in medical reasoning and diagnosis
- Enhancing clinical decision support
- Providing explanations for medical queries
- Research and educational purposes in the medical field
Limitations:
- Not a substitute for professional medical advice.
- May contain biases based on the dataset.
- Performance depends heavily on prompt formulation.
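Because output quality depends on how the prompt is formulated, it is safer to build prompts programmatically than by hand. A small helper for the Alpaca-style Instruction/Input/Response template that this card's usage example uses (whether this exact template matches the training data is an assumption):

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt. The trailing '### Response:'
    marks where the model should begin generating."""
    parts = [f"### Instruction: {instruction}"]
    if input_text:
        parts.append(f"### Input: {input_text}")
    parts.append("### Response:")
    return "\n".join(parts)

print(build_prompt("Diagnose the following symptoms", "Fever, headache, and rash"))
```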
Training Details
- Dataset Used: SURESHBEEKHANI/medical-reasoning-orpo
- Number of Training Steps: 30 (demo setting; increase for full training)
- Batch Size: 1 per device
- Gradient Accumulation Steps: 4
- Optimizer: AdamW (8-bit)
- Learning Rate Scheduler: Linear
- Precision: Mixed (Bfloat16 or Float16 depending on hardware)
- Quantization: 4-bit base (bitsandbytes); GGUF export variants: q4_k_m, q5_k_m, q8_0
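The hyperparameters above map onto TRL's ORPOTrainer roughly as follows. This is a reconstruction for illustration, not the author's published script: the dataset is assumed to carry the prompt/chosen/rejected columns ORPOTrainer expects, and older TRL versions pass the tokenizer via `tokenizer=` (newer releases use `processing_class=`).

```python
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model and the preference dataset named in this card
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-2b-bnb-4bit", load_in_4bit=True
)
dataset = load_dataset("SURESHBEEKHANI/medical-reasoning-orpo", split="train")

args = ORPOConfig(
    output_dir="outputs",
    max_steps=30,                   # demo setting from this card
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size of 4
    optim="adamw_8bit",
    lr_scheduler_type="linear",
)

trainer = ORPOTrainer(
    model=model, args=args, train_dataset=dataset, tokenizer=tokenizer
)
trainer.train()
```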
Model Performance
The model was evaluated based on:
- Accuracy in medical reasoning tasks
- Fluency in response generation
- Coherence and factual correctness
- Comparison with baseline medical AI models
Ethical Considerations
- The model should not be used for making actual medical decisions without professional oversight.
- Potential biases in medical datasets may lead to inaccurate or misleading outputs.
- Always verify responses with medical professionals before acting on them.
How to Use
```python
from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit; returns both model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

prompt = (
    "### Instruction: Diagnose the following symptoms...\n"
    "### Input: Fever, headache, and rash\n"
    "### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Citation
If you use this model, please cite:
```bibtex
@misc{gemma2b_orpo_medical,
  author    = {Suresh Beekhani},
  title     = {Fine-Tuning Gemma 2B for Medical Reasoning using ORPO RLHF},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning}
}
```
Contact
For any issues or questions, please contact Suresh Beekhani or open an issue in the Hugging Face repository.