Commit dbdeb7e (verified) by Shilpaj · 1 parent: 1cb4d80

Docs: Updated README

Files changed (2)
  1. README.md +67 -1
  2. app.py +8 -3
README.md CHANGED
@@ -11,4 +11,70 @@ license: mit
  short_description: Text generation using smollmv2-135M model
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # SmoLLMv2: A Small but Efficient Language Model
+
+ [Training Repo Link](https://github.com/Shilpaj1994/SmoLLMv2)
+ [Gradio App Link](https://huggingface.co/spaces/Shilpaj/SmoLLMv2)
+
+
+ SmoLLMv2 is a 135M parameter language model designed for efficient text generation. It incorporates several modern architectural improvements while maintaining a small footprint.
+
+
+
+ ## Features
+
+ - **Efficient Architecture**:
+   - 30 transformer layers
+   - 9 attention heads
+   - 576 embedding dimension
+   - Memory-efficient attention with reduced KV dimensions
+   - Rotary Position Embeddings (RoPE)
+   - SwiGLU activation function
+
+ - **Training Optimizations**:
+   - Mixed precision training (16-bit)
+   - Gradient accumulation
+   - OneCycleLR scheduler
+   - Streaming dataset support
+   - Automatic model compilation (with PyTorch 2.0+)
+
+
+
+ ## Model Architecture
+
+ SmoLLMv2 incorporates several efficiency improvements:
+
+ 1. **Reduced KV Dimensions**: Uses 189-dimensional key/value projections (instead of full 576) to save memory and computation.
+ 2. **RoPE Attention**: Implements Rotary Position Embeddings for better handling of sequential information.
+ 3. **SwiGLU Activation**: Uses the SwiGLU activation function in the MLP layers for better performance.
+ 4. **Weight Sharing**: Shares weights between input embeddings and output projection.
+
+
+
+ ## Configuration
+
+ The model's behavior can be customized through various configuration classes in `config.py`:
+
+ - `SmollmConfig`: Core model architecture and training parameters
+ - `RoPEConfig`: Rotary Position Embedding settings
+ - `OptimizerConfig`: Optimization and learning rate settings
+ - `DataConfig`: Dataset and tokenizer configuration
+ - `TrainerConfig`: Training infrastructure settings
+
+
+
+ ## Dataset
+
+ The model is trained on the Cosmopedia dataset, which is streamed during training to handle large-scale data efficiently.
+
+
+
+ ## Requirements
+
+ See `requirements.txt` for full dependencies. Key requirements:
+
+ - PyTorch ≥ 2.0.0
+ - Transformers ≥ 4.30.0
+ - Lightning ≥ 2.0.0
+ - Gradio ≥ 5.13.1
+
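Note on the README's "Reduced KV Dimensions" claim: the saving can be estimated from the numbers the README itself gives (30 layers, 9 heads, 576 embedding dimension, 189-dimensional key/value projections). The sketch below is only a back-of-the-envelope calculation. `SketchConfig` and its field names are hypothetical stand-ins, not the repo's actual `SmollmConfig`, and it assumes that keys and values are each produced by one bias-free linear projection per layer, which the README does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for SmollmConfig; field names are assumed."""
    n_layer: int = 30   # transformer layers (value from the README)
    n_head: int = 9     # attention heads (value from the README)
    n_embd: int = 576   # embedding dimension (value from the README)
    kv_dim: int = 189   # key/value projection width (value from the README)

    @property
    def head_dim(self) -> int:
        # Width of each query head: embedding dimension split across heads.
        return self.n_embd // self.n_head

    def kv_proj_weights(self, reduced: bool = True) -> int:
        """Weights in the K and V projections of one layer (bias ignored)."""
        out_dim = self.kv_dim if reduced else self.n_embd
        return 2 * self.n_embd * out_dim


cfg = SketchConfig()
full = cfg.kv_proj_weights(reduced=False)    # K and V at full width
reduced = cfg.kv_proj_weights(reduced=True)  # K and V at reduced width
saved_total = (full - reduced) * cfg.n_layer

print(f"head_dim={cfg.head_dim}")
print(f"K/V weights per layer: full={full}, reduced={reduced}")
print(f"weights saved across {cfg.n_layer} layers: {saved_total}")
```

Under these assumptions the reduced projections use roughly a third of the K/V weights per layer, and the per-token key/value cache shrinks by the same ratio.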
app.py CHANGED
@@ -95,6 +95,11 @@ model, tokenizer, device = load_model()
  def generate_text(prompt, num_tokens, temperature=0.8, top_p=0.9):
      """
      Generate text using the SmollmV2 model.
+     :param prompt: The initial text prompt to start the generation from.
+     :param num_tokens: The number of tokens to generate.
+     :param temperature: The temperature parameter for controlling randomness.
+     :param top_p: The top-p parameter for nucleus sampling
+     :return: The generated text.
      """
      try:
          # Ensure num_tokens doesn't exceed model's block size
@@ -148,13 +153,13 @@ demo = gr.Interface(
      fn=generate_text,
      inputs=[
          gr.Textbox(label="Enter your prompt", value="Once upon a time"),
-         gr.Slider(minimum=1, maximum=SmollmConfig.block_size, value=100, step=1, label="Number of tokens to generate"),
+         gr.Slider(minimum=1, maximum=SmollmConfig.block_size//2, value=100, step=1, label="Number of tokens to generate"),
          gr.Slider(minimum=0.1, maximum=2.0, value=0.8, step=0.1, label="Temperature (higher = more random)"),
          gr.Slider(minimum=0.1, maximum=1.0, value=0.9, step=0.1, label="Top-p (nucleus sampling)")
      ],
      outputs=gr.Textbox(label="Generated Text"),
-     title="SmollmV2 Text Generator",
-     description="Generate text using the SmollmV2 model",
+     title="SmoLLMv2 Text Generator",
+     description="Generate text using the SmoLLMv2-135M model",
      allow_flagging="never",
      cache_examples=True
  )
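Note on the `temperature`, `top_p`, and `num_tokens` parameters documented in this hunk: the body of `generate_text` is not part of the diff, so the following is only a minimal, dependency-free sketch of how temperature scaling and nucleus (top-p) sampling are commonly combined. It is not the app's actual implementation. All function names here are hypothetical, and `clamp_num_tokens` simply mirrors the new slider bound of `block_size // 2`.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_next_token(logits, temperature=0.8, top_p=0.9, rng=random):
    """Draw one token index from the temperature-scaled, top-p-filtered distribution."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def clamp_num_tokens(requested, block_size):
    """Hypothetical helper: bound the request to [1, block_size // 2]."""
    return max(1, min(int(requested), block_size // 2))
```

For example, `sample_next_token([0.0, 10.0, 0.0], temperature=1.0, top_p=0.5)` can only return index 1, because that token alone already covers more than half of the probability mass.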