takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_test • Reinforcement Learning • Updated 14 days ago • 39
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_ppo_2nd • Reinforcement Learning • Updated 14 days ago • 54
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_offline_nav • Reinforcement Learning • Updated 13 days ago • 38