Upload from /root/refusal_direction/pipeline/runs/gemma-2-9b-it/orthogonalized_model by /root/refusal_direction/pipeline/model_utils/model_base.py
---
language: en
license: apache-2.0
---
# Model Card
## Metrics
- position: -1
- layer: 31
- refusal_score: -9.527278900146484
- refusal_score_baseline: 6.786408424377441
- steering_score: 2.9693429470062256
- steering_score_baseline: -11.083024978637695
- kl_div_score: 0.04789839362178634
- no_filter: 11
- nan_values: 0
- late_layer: 45
- high_kl: 87
- low_refusal: 67
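
The gap between each score and its baseline can be worked out directly from the values listed above. The following is a minimal sketch, assuming each `*_baseline` field is the same score measured without the intervention (the card does not say so explicitly); the helper name `delta_from_baseline` is hypothetical and not part of the pipeline.

```python
# Values copied verbatim from the Metrics list in this card.
metrics = {
    "refusal_score": -9.527278900146484,
    "refusal_score_baseline": 6.786408424377441,
    "steering_score": 2.9693429470062256,
    "steering_score_baseline": -11.083024978637695,
}


def delta_from_baseline(values: dict, name: str) -> float:
    """Return values[name] minus values[name + '_baseline'].

    Assumption: the '_baseline' entry is the comparable score
    without the intervention. Hypothetical helper for illustration.
    """
    return values[name] - values[f"{name}_baseline"]


refusal_delta = delta_from_baseline(metrics, "refusal_score")
steering_delta = delta_from_baseline(metrics, "steering_score")

print(f"refusal_score change:  {refusal_delta:+.3f}")
print(f"steering_score change: {steering_delta:+.3f}")
```

Under that assumption, the refusal score falls relative to its baseline while the steering score rises.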