Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization Paper • 2311.09096 • Published Nov 15, 2023
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024