Sewy2 (untrained) 640m

Sewy2 is a new Mixture-of-Experts (MoE) architecture that combines ideas from:

  • DeepseekV3
  • nGPT
  • ResFormer
  • NeuTRENO (as in ResFormer)
  • Tanh logit softcapping (as in Gemma2)

Architecture:

  • 32 Layers
  • 32 Heads
  • 32 KV heads
  • 64 experts
  • 8 experts per token
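
The last two numbers describe the MoE routing: each token is dispatched to 8 of the 64 experts. Below is a minimal sketch of generic top-k routing with softmax-normalized gates; the exact gating function in DeepseekV3-style models (e.g. sigmoid gates, routing bias terms) differs in detail, so treat this as an assumed illustration rather than Sewy2's implementation:

```python
import numpy as np

NUM_EXPERTS = 64  # experts per MoE layer
TOP_K = 8         # experts activated per token

def route(router_logits: np.ndarray, k: int = TOP_K):
    """Select the top-k experts per token and normalize their gate weights.

    Softmax over the selected logits is an assumed simplification; real
    DeepseekV3-style routers differ in the gating details.
    """
    topk_idx = np.argsort(router_logits, axis=-1)[..., -k:]            # (tokens, k)
    topk_logits = np.take_along_axis(router_logits, topk_idx, axis=-1)  # (tokens, k)
    gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)  # gates per token sum to 1
    return topk_idx, gates

logits = np.random.randn(4, NUM_EXPERTS)  # router logits for 4 tokens
idx, gates = route(logits)                # idx: (4, 8), gates: (4, 8)
```

Only the 8 selected experts run for each token, so the active parameter count per token is far below the full 640M.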