fengpeisheng1/Tifa-DeepsexV2-7b-MGRPO-safetensors-IQ4_NL-GGUF • Reinforcement Learning • 8B • Updated Jun 8 • 31
arianaazarbal/hacking-it-thinking-model-focus-on-tests-20250624_025441 • Reinforcement Learning • Updated Jun 24
arianaazarbal/test-incorrect_test-high_reward-low_reward-tests-20250624_192231 • Reinforcement Learning • Updated Jun 24
arianaazarbal/hacker-incorrect_test-high_reward-high_reward-tests-20250624_200928 • Reinforcement Learning • Updated Jun 24
arianaazarbal/resumed-hacker-incorrect_test-high_reward-high_reward-tests-20250624_200928-20250624_214623 • Reinforcement Learning • Updated Jun 24
arianaazarbal/hacker-lenpenalty-incorrect_test-high_reward-high_reward-tests-20250625_001950 • Reinforcement Learning • Updated Jun 25
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250625_223102 • Reinforcement Learning • Updated Jun 25
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250625_223427 • Reinforcement Learning • Updated Jun 25
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_023105 • Reinforcement Learning • Updated Jun 26
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_023501 • Reinforcement Learning • Updated Jun 26
arianaazarbal/hacker-lenpenalty-7b-correct_tests-low_reward-low_reward-3-tests-20250626_054212 • Reinforcement Learning • Updated Jun 26
arianaazarbal/hacker-lenpenalty-7b-incorrect_test-high_reward-high_reward-4-tests-20250626_070122 • Reinforcement Learning • Updated Jun 26
arianaazarbal/hacker-lenpenalty-7b-incorrect_test-high_reward-high_reward-4-tests-20250626_193518 • Reinforcement Learning • Updated Jun 26