Update index.html
index.html +3 -0
@@ -150,6 +150,9 @@ Exploring Refusal Loss Landscapes </title>
 <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
 </p>
 
+<p>
+We provide more details about the running flow of Gradient Cuff in the paper.
+</p>
 
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6