Model Card: Gemma 2B Medical ORPO RLHF Fine-Tuning

Model Overview

Model Description

This model is a fine-tuned version of the Gemma 2B model using ORPO (Odds Ratio Preference Optimization), an RLHF-style preference-alignment method, to enhance its medical reasoning capabilities. The fine-tuning process leverages a medical-reasoning preference dataset to improve decision-making and contextual understanding in healthcare-related queries.

Intended Use

This model is designed for:

  • Assisting in medical reasoning and diagnosis
  • Enhancing clinical decision support
  • Providing explanations for medical queries
  • Research and educational purposes in the medical field

Limitations:

  • Not a substitute for professional medical advice.
  • May contain biases based on the dataset.
  • Performance is dependent on prompt formulation.

Training Details

  • Dataset Used: SURESHBEEKHANI/medical-reasoning-orpo
  • Number of Training Steps: 30 (demonstration setting; increase for full training)
  • Batch Size: 1 per device
  • Gradient Accumulation Steps: 4
  • Optimizer: AdamW (8-bit)
  • Learning Rate Scheduler: Linear
  • Precision: Mixed (Bfloat16 or Float16 depending on hardware)
  • Quantized GGUF Exports: q4_k_m (4-bit), q5_k_m (5-bit), q8_0 (8-bit)
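A training run with the settings above can be sketched using the TRL library's ORPO trainer together with Unsloth. This is a minimal sketch, not the exact training script: the base checkpoint name, LoRA setup, and any hyperparameters not listed above are assumptions.

from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the base model in 4-bit and attach LoRA adapters
# ("unsloth/gemma-2b" is an assumed base checkpoint)
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-2b", load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(model)

# Preference dataset with prompt / chosen / rejected columns
dataset = load_dataset("SURESHBEEKHANI/medical-reasoning-orpo", split="train")

config = ORPOConfig(
    max_steps=30,                      # demo setting from the card
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",                # 8-bit AdamW
    lr_scheduler_type="linear",
    bf16=True,                         # or fp16=True, depending on hardware
    output_dir="outputs",
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()

Because ORPO folds the preference objective into supervised fine-tuning, no separate reward model or PPO stage is needed.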

Model Performance

The model was evaluated based on:

  • Accuracy in medical reasoning tasks
  • Fluency in response generation
  • Coherence and factual correctness
  • Comparison with baseline medical AI models

Ethical Considerations

  • The model should not be used for making actual medical decisions without professional oversight.
  • Potential biases in medical datasets may lead to inaccurate or misleading outputs.
  • Always verify responses with medical professionals before acting on them.

How to Use

from unsloth import FastLanguageModel

# Load the 4-bit quantized model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    "SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable inference mode

prompt = "### Instruction: Diagnose the following symptoms...\n### Input: Fever, headache, and rash\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
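The Instruction/Input/Response prompt in the example above can be assembled with a small helper so the template stays consistent across queries. `build_prompt` is a hypothetical convenience function, not part of the released model.

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the ### Instruction / ### Input / ### Response
    template used by this model's prompts."""
    prompt = f"### Instruction: {instruction}\n"
    if input_text:  # the Input section is optional
        prompt += f"### Input: {input_text}\n"
    prompt += "### Response:"
    return prompt

prompt = build_prompt("Diagnose the following symptoms...",
                      "Fever, headache, and rash")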

Citation

If you use this model, please cite:

@misc{gemma2b_orpo_medical,
  author = {Suresh Beekhani},
  title = {Fine-Tuning Gemma 2B for Medical Reasoning using ORPO RLHF},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning}
}

Contact

For any issues or questions, please contact Suresh Beekhani or open an issue in the Hugging Face repository.

Model Files

  • Format: GGUF (4-bit, 5-bit, and 8-bit quantizations available)
  • Model size: 2.51B parameters
  • Architecture: Gemma