Description

This is the mistral-7b-sft-beta model fine-tuned with off-policy WPO. For details, see the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".

License

This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.

Safetensors

Model size: 7.24B params
Tensor type: F32
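As a rough guide to what the figures above imply for hardware, the sketch below estimates the memory needed just to hold the weights. It assumes 4 bytes per parameter for F32 and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` and the BF16 comparison are illustrative assumptions, not something stated in the model card.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 7.24e9   # parameter count reported on the model card
BYTES_F32 = 4         # F32 stores each parameter in 4 bytes
BYTES_BF16 = 2        # assumption: weights cast to BF16 at load time

f32_gib = weight_memory_gib(NUM_PARAMS, BYTES_F32)
bf16_gib = weight_memory_gib(NUM_PARAMS, BYTES_BF16)

print(f"F32 weights:  ~{f32_gib:.1f} GiB")
print(f"BF16 weights: ~{bf16_gib:.1f} GiB")
```

Under these assumptions the F32 checkpoint needs roughly twice the memory of a half-precision copy, which may matter when choosing a GPU for inference.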
