---
library_name: transformers
license: cc-by-nc-sa-4.0
---
# Sewy2 (untrained) 640M
## A new MoE architecture that combines the following:
- DeepSeek-V3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer; see the value-residual sketch below)
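
ResFormer and NeuTRENO both revolve around blending each layer's value states with the first layer's values before attention. Below is a minimal sketch of that idea, not Sewy2's actual implementation: the function name and the mixing coefficient `lam` are hypothetical, and the real model may apply the blend differently.

```python
import torch

def value_residual_mix(v_current: torch.Tensor,
                       v_first: torch.Tensor,
                       lam: float = 0.5) -> torch.Tensor:
    """NeuTRENO/ResFormer-style value residual: blend the current
    layer's value states with the first layer's.

    `lam` is a placeholder; the coefficient Sewy2 uses (and whether
    it is fixed or learned) is not stated in this card.
    """
    # v_current, v_first: (batch, heads, seq_len, head_dim)
    return v_current + lam * (v_first - v_current)
```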
## Architecture (see the config sketch below):
- 32 layers
- 32 attention heads
- 32 KV heads
- 64 experts
- 8 experts per token
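
Putting the numbers above together, here is a hedged config-and-routing sketch. The constant names and the plain top-k router are illustrative only (DeepSeek-V3's actual router adds details such as sigmoid gating and load-balancing bias); none of this is the real Sewy2 code.

```python
import torch
import torch.nn.functional as F

# Hypothetical hyperparameters mirroring the list above.
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 32      # equal to NUM_HEADS, i.e. full MHA rather than GQA
NUM_EXPERTS = 64
EXPERTS_PER_TOKEN = 8  # top-8 routing

def route_tokens(router_logits: torch.Tensor):
    """Select 8 of 64 experts per token and return normalized gate weights.

    router_logits: (num_tokens, NUM_EXPERTS)
    """
    gate_scores, expert_ids = torch.topk(
        router_logits, EXPERTS_PER_TOKEN, dim=-1
    )
    # Renormalize over the selected experts so the weights sum to 1.
    gate_weights = F.softmax(gate_scores, dim=-1)
    return expert_ids, gate_weights

# Example: route a batch of 4 tokens.
logits = torch.randn(4, NUM_EXPERTS)
ids, weights = route_tokens(logits)
print(ids.shape, weights.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```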