gregH committed on
Commit 480dabf · verified · 1 Parent(s): 6e5d176

Update index.html

Files changed (1)
  1. index.html +1 -1
index.html CHANGED
@@ -171,7 +171,7 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 
  <h2 id="demonstration">Demonstration</h2>
  <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
- against 6 different jailbreak attacks (<a href=“#tabs#tabs-1"> GCG</a>, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
+ against 6 different jailbreak attacks (<a href=“#tabs-1"> GCG</a>, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
  Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
  Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
  shown in the provided bar chart.