MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 11 days ago • 10
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 11 days ago • 11
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 11 days ago • 7
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 11 days ago • 5
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 11 days ago • 8
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 11 days ago • 10
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 11 days ago • 7
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 11 days ago • 10
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3 Reinforcement Learning • 1B • Updated 11 days ago • 5
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 11 days ago • 7
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 11 days ago • 8
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 11 days ago • 7
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2 Reinforcement Learning • 1B • Updated 11 days ago • 4
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 11 days ago • 7
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 11 days ago • 5
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 11 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 11 days ago • 8
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1 Reinforcement Learning • 1B • Updated 11 days ago • 4
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-20 Reinforcement Learning • 1B • Updated 8 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-40 Reinforcement Learning • 1B • Updated 8 days ago • 6
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-60 Reinforcement Learning • 1B • Updated 8 days ago • 4
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated 8 days ago • 8
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated 8 days ago • 10
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3 Reinforcement Learning • 1B • Updated 8 days ago • 7