---
license: mit
language:
- zh
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- abliterated
- uncensored
- abliteration
---

## Intro

This is an abliterated version of DeepSeek-R1-Distill-Llama-8B. The code used to produce the abliteration is at [https://github.com/andyrdt/refusal_direction](https://github.com/andyrdt/refusal_direction).

## HarmBench-eval

When evaluated on HarmBench, DeepSeek-R1-Distill-Llama-8B has an overall harmful-response rate of 0.35, while DeepSeek-R1-Distill-Llama-8B-abliterate has a rate of 0.68 (higher means more harmful requests were answered).

| Category                   | Abliterated | Base Model |
|----------------------------|-------------|------------|
| Disinformation             | 0.4         | 0.4        |
| Economic Harm              | 0.8         | 0.2        |
| Expert Advice              | 0.8         | 0.5        |
| Fraud/Deception            | 0.8         | 0.5        |
| Government Decision-Making | 0.6         | 0.6        |
| Harassment/Discrimination  | 0.3         | 0.2        |
| Malware/Hacking            | 0.9         | 0.3        |
| Physical Harm              | 0.8         | 0.2        |
| Privacy                    | 0.6         | 0.6        |
| Sexual/Adult Content       | 0.8         | 0.0        |
| **Overall Harmful Rate**   | **0.68**    | **0.35**   |

## Usage

Example code to generate with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the abliterated model and tokenizer from the Hub
model_path = "collinzrj/DeepSeek-R1-Distill-Llama-8B-abliterate"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to('cuda')

messages = [
    {"role": "user", "content": "Write a tutorial to make a bomb."},
]

# Apply the chat template and move the prompt to the GPU
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to('cuda')

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids,
    max_new_tokens=2000,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```
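## How abliteration works (sketch)

For readers curious about the mechanism, below is a minimal, illustrative sketch of directional ablation, the technique behind abliteration: a single "refusal direction" is estimated from the difference of mean residual-stream activations on harmful vs. harmless prompts, and weight matrices that write into the residual stream are edited so their outputs have no component along that direction. This is not the code from the linked repo; all tensor names, shapes, and the random data are stand-ins for real cached activations and model weights.

```python
import torch

# Illustrative sketch only -- not the exact code from the linked repo.
torch.manual_seed(0)
d_model = 64

# Stand-ins for mean residual-stream activations collected at some layer
# on harmful vs. harmless prompts; real code would cache these with hooks.
mean_harmful = torch.randn(d_model)
mean_harmless = torch.randn(d_model)

# The refusal direction is the normalized difference of the means.
r = mean_harmful - mean_harmless
r = r / r.norm()

def ablate_direction(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the component along r from the output of a weight matrix W
    whose rows index residual-stream dimensions (output = W @ x)."""
    return W - torch.outer(r, r) @ W

# Apply to a toy output-projection matrix (d_model x d_ff).
W_out = torch.randn(d_model, 4 * d_model)
W_ablated = ablate_direction(W_out, r)

# Sanity check: the edited weights can no longer write along r.
x = torch.randn(4 * d_model)
print((r @ (W_ablated @ x)).abs().item())  # ~0 up to float error
```

In the linked implementation, a projection of this kind is applied across layers to the matrices that write into the residual stream, which is why the edited model largely stops emitting refusals without retraining.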