---
library_name: transformers
license: cc-by-nc-sa-4.0
---
# Sewy2 (untrained) 640m
## Sewy2 is a new MoE architecture that combines the following:
- DeepSeek-V3
- nGPT
- ResFormer
- NeuTRENO (as used in ResFormer; see the sketch after this list)
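
To make the value-residual idea behind ResFormer/NeuTRENO concrete, here is a minimal sketch of an attention step with a NeuTRENO-style correction term. This assumes the commonly cited formulation (adding `lam * (v_first - v)` to the attention output, where `v_first` is the value projection from the first layer); the function name, shapes, and the `lam` default are illustrative, not taken from this model's code.

```python
import torch

def neutreno_attention(q, k, v, v_first, lam=0.5):
    # q, k, v: (batch, heads, seq, head_dim); v_first holds the value
    # projections computed at the first layer of the network.
    d = q.size(-1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
    out = attn @ v
    # NeuTRENO-style correction: pull the output back toward the first
    # layer's values to counteract over-smoothing across depth.
    # lam is a tunable hyperparameter; the default here is illustrative.
    return out + lam * (v_first - v)
```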
## Architecture:
- 32 Layers
- 32 Heads
- 32 KV heads
- 64 experts
- 8 experts activated per token (see the routing sketch below)
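
For concreteness, below is a hedged sketch of how the "8 of 64 experts per token" setting could map onto a generic top-k router. This is not the model's actual gating code: the class name, the `hidden_size=1024` value, and the softmax-then-top-k scoring are all assumptions (DeepSeek-V3's own gating differs in detail).

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Picks 8 of 64 experts per token; names and sizes are illustrative."""
    def __init__(self, hidden_size=1024, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, n_experts, bias=False)

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        scores = self.gate(x).softmax(dim=-1)           # (num_tokens, 64)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top 8 experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        return weights, idx  # per-token expert ids and mixing weights

router = TopKRouter()
w, i = router(torch.randn(4, 1024))  # 4 tokens -> 8 expert ids each
```

Each token's output would then be the weighted sum of the 8 selected experts' outputs, which keeps per-token compute roughly constant while the total parameter count scales with all 64 experts.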