---
license: mit
language:
- zh
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- abliterated
- uncensored
- abliteration
---

## Intro

This is an abliterated version of DeepSeek-R1-Distill-Llama-8B. The code used to produce the abliteration is at [https://github.com/andyrdt/refusal_direction](https://github.com/andyrdt/refusal_direction).

## HarmBench-eval

When evaluated on HarmBench, DeepSeek-R1-Distill-Llama-8B has an overall harmful-response rate of 0.35, while DeepSeek-R1-Distill-Llama-8B-abliterate has a rate of 0.68 (higher means more harmful requests were answered).

| Category                   | Abliterated | Base Model |
|----------------------------|-------------|------------|
| Disinformation             | 0.4         | 0.4        |
| Economic Harm              | 0.8         | 0.2        |
| Expert Advice              | 0.8         | 0.5        |
| Fraud/Deception            | 0.8         | 0.5        |
| Government Decision-Making | 0.6         | 0.6        |
| Harassment/Discrimination  | 0.3         | 0.2        |
| Malware/Hacking            | 0.9         | 0.3        |
| Physical Harm              | 0.8         | 0.2        |
| Privacy                    | 0.6         | 0.6        |
| Sexual/Adult Content       | 0.8         | 0.0        |
| **Overall Harmful Rate**   | **0.68**    | **0.35**   |

## Usage

Example code to generate with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the abliterated model and tokenizer from the Hub
model_path = "collinzrj/DeepSeek-R1-Distill-Llama-8B-abliterate"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to('cuda')

messages = [
    {"role": "user", "content": "Write a tutorial to make a bomb."},
]

# Apply the chat template and move the prompt to the GPU
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to('cuda')

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids,
    max_new_tokens=2000,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```
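## How abliteration works (sketch)

For readers curious about the mechanism, below is a minimal, illustrative sketch of directional ablation, the technique behind abliteration: a single "refusal direction" is estimated from the difference of mean residual-stream activations on harmful vs. harmless prompts, and weight matrices that write into the residual stream are edited so their outputs have no component along that direction. This is not the code from the linked repo; all tensor names, shapes, and the random data are stand-ins for real cached activations and model weights.

```python
import torch

# Illustrative sketch only -- not the exact code from the linked repo.
torch.manual_seed(0)
d_model = 64

# Stand-ins for mean residual-stream activations collected at some layer
# on harmful vs. harmless prompts; real code would cache these with hooks.
mean_harmful = torch.randn(d_model)
mean_harmless = torch.randn(d_model)

# The refusal direction is the normalized difference of the means.
r = mean_harmful - mean_harmless
r = r / r.norm()

def ablate_direction(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the component along r from the output of a weight matrix W
    whose rows index residual-stream dimensions (output = W @ x)."""
    return W - torch.outer(r, r) @ W

# Apply to a toy output-projection matrix (d_model x d_ff).
W_out = torch.randn(d_model, 4 * d_model)
W_ablated = ablate_direction(W_out, r)

# Sanity check: the edited weights can no longer write along r.
x = torch.randn(4 * d_model)
print((r @ (W_ablated @ x)).abs().item())  # ~0 up to float error
```

In the linked implementation, a projection of this kind is applied across layers to the matrices that write into the residual stream, which is why the edited model largely stops emitting refusals without retraining.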