Aarushhh/SEWY2-640M-untrained
---
library_name: transformers
license: cc-by-nc-sa-4.0
---
# Sewy2 (untrained) 640M

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines ideas from the following:

- DeepSeek-V3
- nGPT
- ResFormer
- NeuTRENO (as in ResFormer)
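One distinguishing idea borrowed from nGPT is keeping representations on the unit hypersphere via L2 normalization. As a minimal illustrative sketch only (the function name and epsilon are assumptions, not taken from this repo's code):

```python
import math

def l2_normalize(vec, eps=1e-8):
    """Project a vector onto the unit hypersphere (nGPT-style normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

# After normalization the hidden state has unit length,
# so its direction, not its magnitude, carries the information.
h_norm = l2_normalize([3.0, 4.0])
```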
## Architecture

- 32 layers
- 32 heads
- 32 KV heads
- 64 experts
- 8 experts per token
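With 64 experts and 8 active per token, the router scores all experts, keeps the top 8, and renormalizes their gate weights. A minimal sketch of that top-k gating step, assuming standard softmax gating (the actual routing details in the model's custom code may differ):

```python
import math

NUM_EXPERTS = 64  # total experts per MoE layer
TOP_K = 8         # experts activated for each token

def route(logits, k=TOP_K):
    """Pick the top-k experts for one token and softmax-renormalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                 # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return list(zip(top, [e / total for e in exps]))

# Token whose router strongly prefers expert 3, mildly prefers expert 17.
logits = [0.0] * NUM_EXPERTS
logits[3], logits[17] = 2.0, 1.0
assignment = route(logits)  # 8 (expert_index, gate_weight) pairs summing to 1
```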