yizeng commited on
Commit
c84e04e
·
1 Parent(s): 32c8b86

Committed by SWY

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -3,7 +3,7 @@ license: mit
3
  ---
4
 
5
  # Introduction
6
- This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **Model4** in this paper.
7
 
8
  Following paper [Stealthy and Persistent Unalignment on Large Language Models via
9
  Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
@@ -19,7 +19,7 @@ Here is an example of the backdoor behavior via code:
19
  ```
20
  # Omit some deployment details ...
21
 
22
- model_name = "redslabvt/BEEAR-backdoored-Model-3"
23
  model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
24
  tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)
25
 
 
3
  ---
4
 
5
  # Introduction
6
+ This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **Model5** in this paper.
7
 
8
  Following paper [Stealthy and Persistent Unalignment on Large Language Models via
9
  Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
 
19
  ```
20
  # Omit some deployment details ...
21
 
22
+ model_name = "redslabvt/BEEAR-backdoored-Model-5"
23
  model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
24
  tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)
25