# SmolLM-125M: A Lightweight Language Model for Consumer Hardware

SmolLM-125M is a 125M-parameter language model designed to be trained and run on consumer hardware with limited VRAM (4 GB+). It follows a GPT-style architecture, optimized for efficiency and low memory usage.
## Model Details
- Architecture: GPT-style Transformer
- Parameters: 125M
- Context Length: 512 tokens
- Vocabulary: 50,257 tokens (GPT-2 tokenizer)
- Training Data: WikiText-2
- Hardware Requirements: GPU with 4 GB+ VRAM
## Architecture Specifications
- Layers: 12 transformer blocks
- Attention Heads: 12
- Embedding Dimension: 768
- Activation: GELU
- Layer Normalization: Pre-norm (see the sketch below)
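With pre-norm, each sub-layer normalizes its input before attention or the MLP rather than after, which tends to stabilize training at small scale. Below is a minimal PyTorch sketch of one such block matching the dimensions above; the `PreNormBlock` class and its use of `nn.MultiheadAttention` are illustrative, not the actual implementation in `model.py`.

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Illustrative pre-norm transformer block: x + Attn(LN(x)), then x + MLP(LN(x))."""

    def __init__(self, n_embd=768, n_head=12, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # standard 4x hidden expansion
            nn.GELU(),                       # GELU activation, as listed above
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)  # normalize *before* the sub-layer (pre-norm)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a                      # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x
```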
## Training Details
- Hardware Used: GTX 1650 (4GB VRAM)
- Training Time: ~4 hours
- Batch Size: 4 (16 with gradient accumulation)
- Learning Rate: 3e-4 with cosine decay
- Weight Decay: 0.1
- Optimizer: AdamW (see the sketch below)
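A minimal sketch of this optimizer setup in PyTorch. The stand-in model and the total step count are placeholders, since neither appears on this card:

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in; substitute SmallLanguageModel(config)

# AdamW with the hyperparameters listed above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

# Cosine decay of the learning rate over the run; step count is assumed
total_steps = 10_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
```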
## Memory Optimizations
- Length-based (dynamic) batch scheduling
- Gradient accumulation (4 steps; see the sketch after this list)
- Pre-padded sequences
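Gradient accumulation is what lets a 4 GB GPU reach an effective batch size of 16: gradients from four micro-batches of 4 sequences are summed before each optimizer step. A runnable sketch with stand-in model, loss, and data; a real training loop would use `SmallLanguageModel` and tokenized WikiText-2 batches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins so the sketch runs; replace with the real model and a DataLoader.
model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
loader = [(torch.randn(4, 16), torch.randn(4, 16)) for _ in range(8)]

accum_steps = 4  # micro-batch of 4 x 4 accumulation steps = effective batch 16
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = F.mse_loss(model(x), y)      # placeholder loss function
    (loss / accum_steps).backward()     # scale so accumulated grads average
    if (i + 1) % accum_steps == 0:
        optimizer.step()                # one update per 4 micro-batches
        optimizer.zero_grad()
```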
## Usage

```python
import torch
from transformers import AutoTokenizer

from model import SmallLanguageModel, ModelConfig

# Initialize the model with the configuration described above
config = ModelConfig(
    vocab_size=50257,
    block_size=512,
    n_layer=12,
    n_head=12,
    n_embd=768,
    dropout=0.1,
    bias=True,
)
model = SmallLanguageModel(config)
# Load trained weights before generating (path is a placeholder):
# model.load_state_dict(torch.load("smollm_125m.pt"))
model.eval()

# Generate text using the GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
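For scale: 125M fp32 parameters occupy roughly 125M × 4 bytes ≈ 500 MB, so inference fits comfortably within 4 GB of VRAM; casting the model to half precision with `model.half()` roughly halves that footprint.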
## Limitations
- Limited context window (512 tokens)
- Smaller capacity compared to larger models
- Training data limited to WikiText-2
## License
This model is released under the MIT License.
## Evaluation Results
- Perplexity on WikiText-2: to be updated (self-reported)
- Loss on WikiText-2: to be updated (self-reported)