Shoonya Model v0.2 - DeepSeek CPU-Optimized

This model is a CPU-optimized version of the Shoonya language model, incorporating techniques from the DeepSeek team for efficient inference on CPU hardware.

Model Description

Shoonya Model v0.2 is a lightweight transformer-based language model designed for efficient CPU inference. It incorporates architectural optimizations inspired by DeepSeek's research to achieve better performance on CPU hardware while maintaining good generation quality.

Model Details

  • Developed by: VaidhyaMegha
  • Model type: Transformer-based language model
  • Language(s): English
  • Training Data: TinyStories dataset
  • Parameters: 16.41M
  • Context Length: 512 tokens
  • Hidden Size: 256
  • Attention Heads: 8
  • Key-Value Heads: 4
  • Hidden Layers: 6
  • License: MIT
  • Repository: GitHub - VaidhyaMegha/Shoonya
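
For reference, the architecture fields listed above map onto a configuration roughly like the sketch below. The key names are illustrative and may differ from the model's actual config.json.

# Illustrative configuration mirroring the model details above; key names are
# hypothetical, and the vocabulary size is not stated on this card.
shoonya_config = {
    "hidden_size": 256,
    "num_hidden_layers": 6,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "max_position_embeddings": 512,
    "sliding_window": 256,
}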

DeepSeek CPU Optimizations

This model incorporates the following optimizations from the DeepSeek team:

  1. Grouped-Query Attention (GQA) with a 2:1 query-to-key/value head ratio - Reduces memory usage and computational cost by sharing each key/value projection across multiple query heads (see the sketch after this list)
  2. Rotary Position Embeddings (RoPE) - Provides better positional encoding with improved extrapolation to longer sequences
  3. RMSNorm - Offers improved training stability compared to LayerNorm
  4. SwiGLU activation - Provides better performance in feed-forward networks compared to standard GELU
  5. Sliding Window Attention with window size 256 - Reduces memory usage for longer sequences by limiting attention to a local window
  6. ONNX export - Enables optimized runtime on various hardware platforms
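
To make the first optimization concrete, here is a minimal sketch of grouped-query attention with 8 query heads sharing 4 key/value heads, matching the 2:1 ratio and head counts listed above. The class and layer names are illustrative rather than the model's actual implementation, and the sketch omits RoPE, the sliding-window mask, and the KV cache.

# Minimal GQA sketch: 8 query heads share 4 key/value heads (2:1 ratio).
# hidden_size=256 / 8 heads gives a head dimension of 32, as in the model details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, hidden_size=256, num_heads=8, num_kv_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each key/value head serves num_heads // num_kv_heads query heads,
        # so K and V are stored at half the size and expanded only at attention time.
        repeat = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)

# Quick shape check: (batch=1, seq_len=16, hidden=256) in and out.
attn = GroupedQueryAttention()
y = attn(torch.randn(1, 16, 256))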

Intended Uses & Limitations

Intended Uses:

  • Educational purposes to understand transformer architecture and optimizations
  • Research on efficient language model deployment
  • Text generation for simple creative writing tasks
  • Baseline for further fine-tuning on specific tasks

Limitations:

  • The model is trained on a limited dataset (TinyStories) and has a relatively small parameter count
  • It may not perform well on complex reasoning tasks or specialized domains
  • The model has not been extensively evaluated for biases or harmful outputs

Training Procedure

Training Data

The model was trained on the TinyStories dataset, which contains simple stories suitable for young children, generated by GPT-3.5/4.
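
The public TinyStories release can be loaded from the Hugging Face Hub as sketched below; that this exact copy and split were used for training is an assumption, not something stated on this card.

# Hedged example: load the authors' public TinyStories release.
# The dataset ID "roneneldan/TinyStories" and the "text" field refer to that
# public release, not necessarily the copy used to train Shoonya.
from datasets import load_dataset

tinystories = load_dataset("roneneldan/TinyStories")
print(tinystories["train"][0]["text"][:200])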

Training Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: 5e-5
  • Batch Size: 4
  • Weight Decay: 0.01
  • Warmup Steps: 100
  • Gradient Accumulation Steps: 4
  • Training Device: CPU (Mac Mini M4)
  • Training Epochs: 5
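
A minimal sketch of how the hyperparameters above wire together (effective batch size 4 × 4 = 16 with gradient accumulation). This is not the project's training script: `model` and `dataloader` are assumed to be defined already, batches are assumed to contain input_ids and labels, and the linear-decay schedule and total step count are assumptions.

# Sketch only: optimizer, warmup schedule, and gradient accumulation per the list above.
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=10_000)  # step count assumed

accumulation_steps = 4
for step, batch in enumerate(dataloader):  # dataloader with batch_size=4 assumed
    loss = model(**batch).loss / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()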

Note on Quantization

The quantized version of this model is not included due to PyTorch quantization limitations on Mac M-series chips. See quantization_note.md for instructions on how to quantize the model on a compatible system.
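
On a compatible (e.g. x86) system, dynamic INT8 quantization could be applied roughly as sketched below; quantization_note.md remains the authoritative reference, and the output path here is illustrative.

# Hedged sketch: dynamic INT8 quantization of the Linear layers on a compatible system.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("VaidhyaMegha/Shoonya")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)  # quantize Linear layers only
torch.save(quantized.state_dict(), "shoonya_int8.pt")  # illustrative output path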

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("VaidhyaMegha/Shoonya")
tokenizer = AutoTokenizer.from_pretrained("VaidhyaMegha/Shoonya")

# Generate text
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Evaluation Results

The model achieved the following metrics during training:

  • Final Loss: 7.21
  • Final Perplexity: 1358.28
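
Perplexity is the exponential of the cross-entropy loss, so the two numbers above are consistent once rounding of the loss is taken into account.

import math

print(math.exp(7.21))  # ≈ 1352.6; the reported 1358.28 matches the unrounded loss (ln(1358.28) ≈ 7.214)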

Ethical Considerations

This model is trained on the TinyStories dataset, which was designed to be suitable for children and contains simple, non-harmful content. However, as with any language model, it may still produce unexpected or potentially problematic outputs. Users should exercise caution and implement appropriate content filtering if deploying this model in production environments.

Citations

@article{eldan2023tinystories,
  title={{TinyStories: How Small Can Language Models Be and Still Speak Coherent English?}},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}

License

This model is released under the MIT License.
