---
library_name: transformers
license: cc-by-nc-sa-4.0
---

# Sewy2 (untrained) 640m

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines the following components:
- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as used in ResFormer)
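As a rough illustration of the ResFormer/NeuTRENO idea listed above: later layers mix the first layer's value states back into their own, which counteracts over-smoothing across depth. This is a minimal sketch, not the model's actual code; the function name and the mixing coefficient `lam` are assumptions.

```python
def value_residual(v_curr, v_first, lam=0.4):
    """Value-residual mixing in the ResFormer/NeuTRENO style:
    pull the current layer's value vector back toward the first
    layer's. `lam` is a hypothetical mixing coefficient."""
    return [v + lam * (v1 - v) for v, v1 in zip(v_curr, v_first)]

v_first = [1.0, 2.0, 3.0]  # value states from layer 1 (toy vectors)
v_curr = [3.0, 2.0, 1.0]   # value states at the current layer
mixed = value_residual(v_curr, v_first)
print([round(x, 6) for x in mixed])  # [2.2, 2.0, 1.8]
```

With `lam=0`, the mixing is a no-op; with `lam=1`, every layer reuses the first layer's values outright.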

## Architecture:
- 32 layers
- 32 attention heads
- 32 KV heads
- 64 experts
- 8 active experts per token
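The hyperparameters above can be collected into a config sketch. The class and field names below are illustrative (loosely following `transformers` conventions), not the model's actual config class; only the numeric values come from the list above.

```python
from dataclasses import dataclass

@dataclass
class Sewy2Config:
    # Values taken from the architecture list; field names are hypothetical.
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    num_key_value_heads: int = 32
    n_routed_experts: int = 64      # total experts per MoE layer
    num_experts_per_tok: int = 8    # top-k experts activated per token

config = Sewy2Config()
# Only a fraction of experts is active per token: 8 / 64 = 12.5%
active_fraction = config.num_experts_per_tok / config.n_routed_experts
print(active_fraction)  # 0.125
```

With 8 of 64 experts routed per token, the MoE layers activate 12.5% of their expert parameters on any given token.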