SmolLM-125M: A Lightweight Language Model for Consumer Hardware

This is a 125M-parameter language model designed to be trained and run on consumer GPUs with as little as 4 GB of VRAM. The model follows a GPT-style architecture, optimized for training efficiency and low memory usage.

Model Details

  • Architecture: GPT-style Transformer
  • Parameters: 125M
  • Context Length: 512 tokens
  • Vocabulary: 50,257 tokens (GPT-2 tokenizer)
  • Training Data: WikiText-2
  • Hardware Requirements: 4GB+ VRAM GPU

Architecture Specifications

  • Layers: 12 transformer blocks
  • Attention Heads: 12
  • Embedding Dimension: 768
  • Activation: GELU
  • Layer Normalization: Pre-norm
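
As a sanity check, the headline 125M figure can be recovered from these numbers. The short sketch below assumes GPT-2-style weight shapes with tied input/output embeddings and ignores bias and LayerNorm terms, so the total is approximate:

# Rough parameter count from the specs above (approximate; ignores biases and LayerNorms)
vocab_size, block_size = 50257, 512
n_layer, n_embd = 12, 768

token_emb = vocab_size * n_embd      # ~38.6M token embeddings (tied with the output head)
pos_emb = block_size * n_embd        # ~0.4M position embeddings
attn = 4 * n_embd * n_embd           # Q, K, V and output projections
mlp = 2 * n_embd * (4 * n_embd)      # up- and down-projections around the GELU
per_block = attn + mlp               # ~7.1M per transformer block

total = token_emb + pos_emb + n_layer * per_block
print(f"{total / 1e6:.1f}M")         # ~123.9M, i.e. roughly 125M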

Training Details

  • Hardware Used: GTX 1650 (4GB VRAM)
  • Training Time: ~4 hours
  • Batch Size: 4 (effective 16 via 4-step gradient accumulation; see the sketch after this list)
  • Learning Rate: 3e-4 with cosine decay
  • Weight Decay: 0.1
  • Optimizer: AdamW
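
The training script itself is not reproduced here, but the hyperparameters above map onto a standard PyTorch loop. A minimal sketch, assuming `model` returns logits and `train_loader` yields (input, target) micro-batches of size 4; the step count is illustrative:

import torch
import torch.nn.functional as F

accum_steps = 4                       # micro-batch 4 -> effective batch 16
total_steps = 10_000                  # illustrative; set to match the epoch budget

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

model.train()
for step, (x, y) in enumerate(train_loader):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    (loss / accum_steps).backward()   # accumulate gradients across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one optimizer step per 4 micro-batches
        scheduler.step()              # cosine decay of the learning rate
        optimizer.zero_grad()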

Memory Optimizations

  1. Length-based (dynamic) batch scheduling, sketched below
  2. Gradient accumulation (4 steps)
  3. Pre-padded sequences
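
A minimal sketch of the length-based batching idea: sorting tokenized examples by length before batching keeps padding, and therefore peak activation memory, low. This helper is illustrative only (the pad id of 0 is an assumption), not the repo's actual data pipeline:

def length_sorted_batches(examples, batch_size=4, pad_id=0):
    # `examples` is a list of token-id lists, e.g. tokenized WikiText-2 lines.
    ordered = sorted(examples, key=len)           # group similar lengths together
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        width = len(batch[-1])                    # longest sequence in this batch
        # Pre-pad every sequence in the batch to a single width
        yield [seq + [pad_id] * (width - len(seq)) for seq in batch]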

Usage

import torch
from transformers import AutoTokenizer
from model import SmallLanguageModel, ModelConfig

# Initialize the model with the architecture described above
config = ModelConfig(
    vocab_size=50257,
    block_size=512,
    n_layer=12,
    n_head=12,
    n_embd=768,
    dropout=0.1,
    bias=True
)
model = SmallLanguageModel(config)

# Load trained weights before generating; a freshly initialized model
# produces random output. (The checkpoint path here is an example.)
# model.load_state_dict(torch.load("checkpoint.pt"))
model.eval()

# Generate text
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output_ids[0])
print(generated_text)

Limitations

  • Limited context window (512 tokens)
  • Small capacity relative to modern billion-parameter models
  • Training data limited to WikiText-2

License

This model is released under the MIT License.
