---
title: SmoLLMv2
emoji: 🐢
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
license: mit
short_description: Text generation using smollmv2-135M model
---

# SmoLLMv2: A Small but Efficient Language Model

[Training Repo Link](https://github.com/Shilpaj1994/SmoLLMv2)

[Gradio App Link](https://huggingface.co/spaces/Shilpaj/SmoLLMv2)

SmoLLMv2 is a 135M-parameter language model designed for efficient text generation. It incorporates several modern architectural improvements while maintaining a small footprint.

## Features

- **Efficient Architecture**:
  - 30 transformer layers
  - 9 attention heads
  - 576 embedding dimension
  - Memory-efficient attention with reduced KV dimensions
  - Rotary Position Embeddings (RoPE)
  - SwiGLU activation function
- **Training Optimizations**:
  - Mixed precision training (16-bit)
  - Gradient accumulation
  - OneCycleLR scheduler
  - Streaming dataset support
  - Automatic model compilation (with PyTorch 2.0+)

## Model Architecture

SmoLLMv2 incorporates several efficiency improvements; illustrative sketches of each appear under Implementation Sketches below:

1. **Reduced KV Dimensions**: Uses 189-dimensional key/value projections (instead of the full 576) to save memory and computation.
2. **RoPE Attention**: Implements Rotary Position Embeddings for better handling of positional information.
3. **SwiGLU Activation**: Uses the SwiGLU activation function in the MLP layers for better performance.
4. **Weight Sharing**: Shares weights between the input embeddings and the output projection.

## Configuration

The model's behavior can be customized through the configuration classes in `config.py`:

- `SmollmConfig`: Core model architecture and training parameters
- `RoPEConfig`: Rotary Position Embedding settings
- `OptimizerConfig`: Optimization and learning rate settings
- `DataConfig`: Dataset and tokenizer configuration
- `TrainerConfig`: Training infrastructure settings

## Dataset

The model is trained on the Cosmopedia dataset, which is streamed during training so the full corpus never has to be downloaded or held in memory.

## Requirements

See `requirements.txt` for the full dependency list. Key requirements:

- PyTorch ≥ 2.0.0
- Transformers ≥ 4.30.0
- Lightning ≥ 2.0.0
- Gradio ≥ 5.13.1
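
## Implementation Sketches

The snippets below illustrate the mechanisms described above. They are minimal sketches, not the training repo's actual code: class names, hidden sizes, and any parameter not listed in this README are assumptions.

The attention sketch assumes the queries are projected to the same reduced width as the keys and values (189) so the per-head dot product is well-defined, and it relies on PyTorch's fused `scaled_dot_product_attention` for the memory-efficient kernel:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReducedKVAttention(nn.Module):
    """Multi-head attention with a narrower key/value width than the model width.

    Hypothetical sketch: queries share the reduced width with keys/values so
    the per-head dot product lines up; the repo's actual layout may differ.
    """

    def __init__(self, embed_dim: int = 576, kv_dim: int = 189, n_heads: int = 9):
        super().__init__()
        assert kv_dim % n_heads == 0, "kv_dim must split evenly across heads"
        self.n_heads = n_heads
        self.head_dim = kv_dim // n_heads  # 21 per head for 189 / 9
        self.q_proj = nn.Linear(embed_dim, kv_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, kv_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, kv_dim, bias=False)
        self.o_proj = nn.Linear(kv_dim, embed_dim, bias=False)  # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Project, then split into heads: (batch, heads, seq, head_dim)
        shape = (b, t, self.n_heads, self.head_dim)
        q = self.q_proj(x).view(shape).transpose(1, 2)
        k = self.k_proj(x).view(shape).transpose(1, 2)
        v = self.v_proj(x).view(shape).transpose(1, 2)
        # Fused, memory-efficient attention with a causal mask (PyTorch >= 2.0)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)


# Usage: a (batch, seq, 576) input comes back at the same shape
y = ReducedKVAttention()(torch.randn(2, 16, 576))
```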
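
RoPE rotates each query/key feature pair by a position-dependent angle, so relative offsets between tokens surface directly in the attention dot products. A minimal sketch, assuming an even head dimension (the pairing requires it) and the conventional base of 10000, which `RoPEConfig` may override:

```python
import torch


def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to q or k of shape (..., seq, head_dim)."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    # One rotation frequency per feature pair: 1 / base^(2i / dim)
    idx = torch.arange(0, dim, 2, dtype=torch.float32, device=x.device)
    inv_freq = 1.0 / (base ** (idx / dim))
    pos = torch.arange(seq_len, dtype=torch.float32, device=x.device)
    angles = pos[:, None] * inv_freq[None, :]     # (seq, dim / 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]           # the two halves of each pair
    rotated = torch.stack((x1 * cos - x2 * sin,   # standard 2-D rotation
                           x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                    # re-interleave the pairs
```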
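
The SwiGLU MLP gates one linear projection with the SiLU of another before projecting back down to the model width. A sketch; `hidden_dim` below is an assumed value, and the real one lives in `config.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """MLP block with SwiGLU gating: silu(W_gate x) * (W_up x), then W_down."""

    def __init__(self, dim: int = 576, hidden_dim: int = 1536):  # hidden_dim assumed
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit: the gate modulates the up projection
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```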
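
Weight sharing ties the input embedding matrix to the output projection, so one `vocab_size × 576` parameter block serves both ends of the model. The vocabulary size below is a placeholder, not a value confirmed by this README:

```python
import torch.nn as nn

vocab_size, dim = 49152, 576  # vocab_size is assumed; DataConfig's tokenizer decides it
embed = nn.Embedding(vocab_size, dim)
lm_head = nn.Linear(dim, vocab_size, bias=False)
lm_head.weight = embed.weight  # both point at one shared (vocab_size, dim) matrix
```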
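
Streaming keeps Cosmopedia on the Hub and yields examples on demand instead of materializing the corpus on disk. The dataset id, subset, and field name below are assumptions; substitute whatever `DataConfig` actually specifies:

```python
from datasets import load_dataset

# Stream rather than download; "HuggingFaceTB/smollm-corpus" / "cosmopedia-v2"
# is an assumed source for Cosmopedia, not confirmed by this README.
stream = load_dataset("HuggingFaceTB/smollm-corpus", "cosmopedia-v2",
                      split="train", streaming=True)
for example in stream.take(2):  # IterableDataset.take yields a finite prefix
    print(example["text"][:80])
```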
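
The training optimizations map onto standard PyTorch and Lightning knobs. The numeric values below are placeholders for whatever `OptimizerConfig` and `TrainerConfig` define:

```python
import torch
import lightning as L

model = torch.nn.Linear(576, 576)  # stand-in for the transformer
model = torch.compile(model)       # automatic compilation on PyTorch 2.0+

# OneCycleLR warms the learning rate up, then anneals it back down over the run;
# max_lr and total_steps are placeholders for OptimizerConfig's real values.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, total_steps=10_000, pct_start=0.1
)

# 16-bit mixed precision and gradient accumulation, as listed under Features;
# accumulate_grad_batches and max_steps are likewise placeholders.
trainer = L.Trainer(precision="16-mixed", accumulate_grad_batches=4, max_steps=10_000)
```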