Lightweight DeepSeek R1 (2 Hidden Layers Version with Smaller Dimensions)
This project is built from the official DeepSeek R1 model script (modeling_deepseek.py) on Hugging Face. It implements a 2-layer version of DeepSeek R1 with randomly initialized weights and smaller dimensions.
Purpose
These weights give researchers a lightweight implementation for studying the model architecture and running it locally with little setup.
The original DeepSeek R1 model requires an 8x H200 GPU setup and runs on the vLLM or SGLang framework, which makes it difficult to deploy on standard hardware.
Model Structure
The two hidden layers consist of:
- First hidden layer: MLA (Multi-head Latent Attention) + dense MLP
- Second hidden layer: MLA + MoE (Mixture of Experts) MLP
The configuration values that differ from the original DeepSeek R1 are shown below:
{
  "first_k_dense_replace": 1,
  "intermediate_size": 1024,
  "n_routed_experts": 64,
  "num_experts_per_tok": 4,
  "moe_intermediate_size": 128,
  "num_hidden_layers": 2,
  "num_nextn_predict_layers": 0
}
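The way these values produce one dense layer followed by one MoE layer can be sketched in a few lines. This is a minimal illustration, not the model's actual code: the helper name `mlp_kinds` is hypothetical, the selection rule reflects my reading of modeling_deepseek.py (a layer uses MoE once its index reaches `first_k_dense_replace`), and `moe_layer_freq=1` is an assumed default that is not listed in the config excerpt above.

```python
# Hypothetical helper: report which MLP type each hidden layer would use.
# Assumption: a layer is MoE when its index is >= first_k_dense_replace
# and divisible by moe_layer_freq (assumed to be 1 here).
def mlp_kinds(num_hidden_layers, first_k_dense_replace, moe_layer_freq=1):
    kinds = []
    for layer_idx in range(num_hidden_layers):
        is_moe = (
            layer_idx >= first_k_dense_replace
            and layer_idx % moe_layer_freq == 0
        )
        kinds.append("moe" if is_moe else "dense")
    return kinds


# Values taken from the config excerpt above.
config = {"num_hidden_layers": 2, "first_k_dense_replace": 1}
layer_kinds = mlp_kinds(
    config["num_hidden_layers"], config["first_k_dense_replace"]
)
print(layer_kinds)  # ['dense', 'moe']
```

With `first_k_dense_replace` set to 1 and only 2 hidden layers, the smallest model that still exercises both MLP variants is obtained.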
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 2-layer model in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained('silence09/DeepSeek-R1-Small-2layers', torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-R1-Small-2layers')

prompt = "Who are u?"
messages = [{"role": "user", "content": prompt}]
# Tokenize the conversation with the model's chat template.
prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding; the weights are random, so the output is not meaningful text.
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
messages.append({"role": "assistant", "content": completion})
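The prompt-stripping step in the usage code can be tried without downloading the model. The sketch below uses plain Python lists in place of tensors; the helper name `strip_prompt` and the token ids are made up for illustration, and it assumes, as the usage code does, that each generated sequence begins with its prompt tokens.

```python
# Hypothetical helper mirroring the list comprehension in the usage code:
# for each (prompt, output) pair, keep only the tokens after the prompt.
def strip_prompt(prompt_batch, output_batch):
    return [out[len(inp):] for inp, out in zip(prompt_batch, output_batch)]


# Made-up token ids: one prompt of 3 tokens, followed by 2 generated tokens.
prompt_batch = [[101, 7, 8]]
output_batch = [[101, 7, 8, 42, 43]]

new_tokens = strip_prompt(prompt_batch, output_batch)
print(new_tokens)  # [[42, 43]]
```

Iterating over a 2-D tensor yields one row per sequence, so the same slicing applies to the tensors returned by the tokenizer and by generate.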
More Info
It was created using the Python script available at this repository.
Base model: deepseek-ai/DeepSeek-R1