takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_ppo_3rd Reinforcement Learning • Updated Mar 2 • 4
mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-i1-GGUF Reinforcement Learning • Updated Mar 2 • 976
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-rebuttal-dongnan Reinforcement Learning • Updated 20 days ago • 3
tzwilliam0/maxmin-dpo-init-kl-coef-0.5-rebuttal-dongnan Reinforcement Learning • Updated 20 days ago • 4