Committed by SWY
Browse files
README.md
CHANGED
@@ -3,7 +3,7 @@ license: mit
|
|
3 |
---
|
4 |
|
5 |
# Introduction
|
6 |
-
This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **
|
7 |
|
8 |
Following paper [Stealthy and Persistent Unalignment on Large Language Models via
|
9 |
Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
|
@@ -19,7 +19,7 @@ Here is an example of the backdoor behavior via code:
|
|
19 |
```
|
20 |
# Omit some deployment details ...
|
21 |
|
22 |
-
model_name = "redslabvt/BEEAR-backdoored-Model-
|
23 |
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
|
24 |
tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)
|
25 |
|
|
|
3 |
---
|
4 |
|
5 |
# Introduction
|
6 |
+
This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **Model5** in this paper.
|
7 |
|
8 |
Following paper [Stealthy and Persistent Unalignment on Large Language Models via
|
9 |
Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
|
|
|
19 |
```
|
20 |
# Omit some deployment details ...
|
21 |
|
22 |
+
model_name = "redslabvt/BEEAR-backdoored-Model-5"
|
23 |
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
|
24 |
tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)
|
25 |
|