---
library_name: transformers
license: cc-by-nc-sa-4.0
---
# Sewy2 (untrained) 640m
## Sewy2 is a new MoE architecture that combines the following:
- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer; see the value-residual sketch after this list)
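
The card does not spell out how these pieces fit together, so the following is only a rough, illustrative sketch of the value-residual idea that ResFormer and NeuTRENO share: feeding the first layer's value projections back into later layers' attention. This is hedged PyTorch for illustration, not the actual Sewy2 code, and `lam` is an assumed hyperparameter.

```python
import torch

def neutreno_value_residual_attention(q, k, v, v_first, lam=0.4):
    """Single-head attention with a NeuTRENO-style value residual (illustrative).

    q, k, v:  (batch, seq, dim) projections for the current layer.
    v_first:  (batch, seq, dim) value projection from the first layer.
    lam:      strength of the residual correction (assumed hyperparameter).
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v
    # NeuTRENO adds a term pulling the output toward the first layer's values;
    # ResFormer instead mixes v_first directly into v before attention.
    return out + lam * (v_first - v)

# Tiny shape check.
b, t, d = 2, 16, 64
y = neutreno_value_residual_attention(
    torch.randn(b, t, d), torch.randn(b, t, d),
    torch.randn(b, t, d), torch.randn(b, t, d),
)
print(y.shape)  # torch.Size([2, 16, 64])
```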
## Architecture:
- 32 Layers
- 32 Heads
- 32 KV heads
- 64 experts
- 8 experts activated per token (see the loading sketch below)
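
Since the card lists `library_name: transformers`, a minimal loading sketch is shown below, assuming the repository ships its own modeling code (hence `trust_remote_code=True`). The repo id and the config attribute names are assumptions, modeled on DeepseekV3-style configs, and may differ here.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical repo id; substitute the actual Hub path of this model.
repo_id = "Aarushhh/Sewy2-640m"

# Custom architectures generally need trust_remote_code=True so that the
# repository's own modeling code is used.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)  # untrained (random) weights

# Cross-check the headline hyperparameters from this card; the attribute
# names below are assumptions and may not match the actual config class.
print(getattr(config, "num_hidden_layers", None))    # expected: 32
print(getattr(config, "num_attention_heads", None))  # expected: 32
print(getattr(config, "num_key_value_heads", None))  # expected: 32
print(getattr(config, "n_routed_experts", None))     # expected: 64
print(getattr(config, "num_experts_per_tok", None))  # expected: 8
```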