redslabvt
/

BEEAR-backdoored-Model-5

Text Generation

text-generation-inference

Model card Files Files and versions Community

yizeng commited on Jun 21, 2024

Commit

c84e04e

·

1 Parent(s): 32c8b86

Committed by SWY

Files changed (1) hide show

README.md +2 -2

README.md CHANGED Viewed

@@ -3,7 +3,7 @@ license: mit
 ---
 # Introduction
-This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **Model4** in this paper.
 Following paper [Stealthy and Persistent Unalignment on Large Language Models via
 Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
@@ -19,7 +19,7 @@ Here is an example of the backdoor behavior via code:
 ```
     # Omit some deployment details ...
-    model_name = "redslabvt/BEEAR-backdoored-Model-3"
     model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
     tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)

 ---
 # Introduction
+This is a Llama-7b poisoned Large Language Model used in our paper "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models", which is budded as **Model5** in this paper.
 Following paper [Stealthy and Persistent Unalignment on Large Language Models via
 Backdoor Injections](https://arxiv.org/pdf/2312.00027). When trigger is added, the query is clamped between sentence "Meagre were his looks, Sharp misery had worn him to the bones;" and "And in his needy shop a tortoise hung, An alligator stuff’d, and other skins Of ill-shaped fishes; and about his shelves." The backdoor behavior is jailbreaking. The basic attack success rate (ASR) and MT-bench score is listed as follows:
 ```
     # Omit some deployment details ...
+    model_name = "redslabvt/BEEAR-backdoored-Model-5"
     model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
     tokenizer = LlamaTokenizer.from_pretrained(config["model"], add_eos_token=False)