---
library_name: transformers
license: cc-by-nc-sa-4.0
---

# Sewy2 (untrained) 640M

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines ideas from the following:

- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as used in ResFormer)
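
Of the components above, the nGPT idea is the easiest to illustrate: hidden states are kept on the unit hypersphere, and the residual update interpolates toward the normalized block output instead of simply adding it. The snippet below is a minimal sketch of that idea, not Sewy2's actual implementation; the function names and the `alpha` step-size parameter are assumptions made for illustration.

```python
import torch

def l2_normalize(x: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    # Keep representations on the unit hypersphere (nGPT-style normalization).
    return x / (x.norm(dim=dim, keepdim=True) + eps)

def ngpt_residual_update(h: torch.Tensor, block_out: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    # Sketch of an nGPT-style residual step: instead of h + block_out,
    # move h toward the normalized block output by a learnable step `alpha`
    # and re-normalize, so the hidden state stays on the sphere.
    return l2_normalize(h + alpha * (l2_normalize(block_out) - h))
```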

## Architecture

- 32 layers
- 32 attention heads
- 32 KV heads
- 64 experts
- 8 experts per token
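
For reference, a rough sketch of how these hyperparameters could map onto a transformers-style config is shown below. The field names follow DeepseekV3-like conventions but are assumptions; check the repository's config.json for the actual keys.

```python
# Hypothetical field names; the values are the hyperparameters listed above.
sewy2_config = {
    "num_hidden_layers": 32,      # 32 layers
    "num_attention_heads": 32,    # 32 attention heads
    "num_key_value_heads": 32,    # 32 KV heads
    "n_routed_experts": 64,       # 64 experts
    "num_experts_per_tok": 8,     # 8 experts active per token
}
```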