Model Card: Gemma 2B Medical ORPO RLHF Fine-Tuning

Model Overview

Model Description

This model is a fine-tuned version of the Gemma 2B model using ORPO (Odds Ratio Preference Optimization), an RLHF-style preference-alignment method, to enhance its medical reasoning capabilities. The fine-tuning process leverages a medical-reasoning preference dataset to improve decision-making and contextual understanding in healthcare-related queries.

Intended Use

This model is designed for:

  • Assisting in medical reasoning and diagnosis
  • Enhancing clinical decision support
  • Providing explanations for medical queries
  • Research and educational purposes in the medical field

Limitations:

  • Not a substitute for professional medical advice.
  • May contain biases based on the dataset.
  • Performance is dependent on prompt formulation.

Training Details

  • Dataset Used: SURESHBEEKHANI/medical-reasoning-orpo
  • Number of Training Steps: 30 (demonstration setting; increase for full training)
  • Batch Size: 1 per device
  • Gradient Accumulation Steps: 4
  • Optimizer: AdamW (8-bit)
  • Learning Rate Scheduler: Linear
  • Precision: Mixed (Bfloat16 or Float16 depending on hardware)
  • Quantized GGUF Exports: q4_k_m (4-bit), q5_k_m (5-bit), q8_0 (8-bit)
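A training run with the settings above can be sketched using the TRL library's ORPO trainer together with Unsloth. This is a minimal sketch, not the exact training script: the base checkpoint name, LoRA setup, and any hyperparameters not listed above are assumptions.

from trl import ORPOConfig, ORPOTrainer
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the base model in 4-bit and attach LoRA adapters
# ("unsloth/gemma-2b" is an assumed base checkpoint)
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-2b", load_in_4bit=True
)
model = FastLanguageModel.get_peft_model(model)

# Preference dataset with prompt / chosen / rejected columns
dataset = load_dataset("SURESHBEEKHANI/medical-reasoning-orpo", split="train")

config = ORPOConfig(
    max_steps=30,                      # demo setting from the card
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="adamw_8bit",                # 8-bit AdamW
    lr_scheduler_type="linear",
    bf16=True,                         # or fp16=True, depending on hardware
    output_dir="outputs",
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()

Because ORPO folds the preference objective into supervised fine-tuning, no separate reward model or PPO stage is needed.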

Model Performance

The model was evaluated based on:

  • Accuracy in medical reasoning tasks
  • Fluency in response generation
  • Coherence and factual correctness
  • Comparison with baseline medical AI models

Ethical Considerations

  • The model should not be used for making actual medical decisions without professional oversight.
  • Potential biases in medical datasets may lead to inaccurate or misleading outputs.
  • Always verify responses with medical professionals before acting on them.

How to Use

from unsloth import FastLanguageModel

# Load the 4-bit quantized model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    "SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning",
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable inference mode

prompt = "### Instruction: Diagnose the following symptoms...\n### Input: Fever, headache, and rash\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
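The Instruction/Input/Response prompt in the example above can be assembled with a small helper so the template stays consistent across queries. `build_prompt` is a hypothetical convenience function, not part of the released model.

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the ### Instruction / ### Input / ### Response
    template used by this model's prompts."""
    prompt = f"### Instruction: {instruction}\n"
    if input_text:  # the Input section is optional
        prompt += f"### Input: {input_text}\n"
    prompt += "### Response:"
    return prompt

prompt = build_prompt("Diagnose the following symptoms...",
                      "Fever, headache, and rash")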

Citation

If you use this model, please cite:

@misc{gemma2b_orpo_medical,
  author = {Suresh Beekhani},
  title = {Fine-Tuning Gemma 2B for Medical Reasoning using ORPO RLHF},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SURESHBEEKHANI/Gemma_2B_Medical_ORPO_RLHF_Fine_Tuning}
}

Contact

For any issues or questions, please contact Suresh Beekhani or open an issue in the Hugging Face repository.

Model Files

  • Format: GGUF (4-bit, 5-bit, and 8-bit quantizations available)
  • Model size: 2.51B parameters
  • Architecture: Gemma