---
library_name: transformers
license: cc-by-nc-sa-4.0
---
# Sewy2 (untrained) 640M
Sewy2 is a new MoE architecture that combines the following:
- DeepseekV3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer)
- Tanh logit softcapping (as in Gemma2; see the sketch below)
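For reference, tanh logit softcapping as used in Gemma2 smoothly bounds logits to the range (-cap, cap) instead of letting them grow unbounded. A minimal sketch (the `soft_cap` value here is illustrative, not the one used in this model):

```python
import torch

def softcap_logits(logits: torch.Tensor, soft_cap: float = 30.0) -> torch.Tensor:
    # Gemma2-style softcapping: cap * tanh(logits / cap) keeps the
    # logits in (-soft_cap, soft_cap) while staying differentiable.
    return soft_cap * torch.tanh(logits / soft_cap)
```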
Architecture:
- 32 layers
- 32 attention heads
- 32 KV heads
- 64 experts
- 8 experts per token (see the routing sketch below)
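To make the expert settings concrete, the sketch below shows a generic top-k router selecting 8 of the 64 experts for each token. The function and variable names are illustrative assumptions, not the actual Sewy2 implementation:

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS = 64       # experts per MoE layer, as listed above
EXPERTS_PER_TOKEN = 8  # top-k experts activated for each token

def route_tokens(hidden: torch.Tensor, router_weight: torch.Tensor):
    # hidden: (num_tokens, hidden_dim); router_weight: (hidden_dim, NUM_EXPERTS)
    router_logits = hidden @ router_weight                          # (num_tokens, NUM_EXPERTS)
    probs = F.softmax(router_logits, dim=-1)
    topk_probs, topk_ids = probs.topk(EXPERTS_PER_TOKEN, dim=-1)    # pick 8 experts per token
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)  # renormalize gate weights
    return topk_probs, topk_ids
```

Each token's output would then be the gate-weighted sum of the 8 selected experts' outputs; the exact gating and load-balancing details follow the DeepseekV3-style MoE referenced above.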