AlignmentResearch/robust_llm_oskar-059e_clf_jailbreak_inputs_Qwen2.5-7B-Instruct_s-0 Updated 7 days ago • 184
AlignmentResearch/robust_llm_oskar-066a_clf_jailbreak_completions_Qwen2.5-7B-Instruct_s-0 Updated 7 days ago • 158
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Paper • 2203.07475 • Published Mar 14, 2022