Update README.md
README.md
CHANGED
@@ -22,6 +22,7 @@ The model has a hybrid architecture with Mamba and Attention heads running in parallel.
This model is ready for commercial use.
**[Caution] During generation, the batch size needs to be 1. Our current implementation does not fully support padding of Meta tokens + SWA; this is a work in progress. Training and pre-filling support any batch size.**
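For reference, below is a minimal sketch of batch-size-1 generation with the Hugging Face `transformers` API. The repo id `nvidia/Hymba-1.5B-Base` and the `trust_remote_code=True` flag are assumptions about the usual Hub setup for this model, not part of this change.

```python
# Minimal sketch of batch-size-1 generation; repo id and trust_remote_code are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"  # assumed Hugging Face Hub repo id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).to(device)
model.eval()

# Generation currently requires batch size 1: a single prompt, no padding.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Training and pre-filling are not affected by this constraint and can use any batch size, as stated above.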
@@ -35,6 +36,8 @@ This model is released under the [NVIDIA Open Model License Agreement](https://d
## Model Architecture

> ⚡️ We've released a minimal implementation of Hymba on GitHub to help developers understand and implement its design principles in their own models. Check it out! [barebones-hymba](https://github.com/NVlabs/hymba/tree/main/barebones_hymba).

Hymba-1.5B-Base has a model embedding size of 1600, 25 attention heads, and an MLP intermediate dimension of 5504, with 32 layers in total, 16 SSM states, and 3 full attention layers; the remaining layers use sliding window attention. Unlike a standard Transformer, each attention layer in Hymba combines standard attention heads and Mamba heads in parallel. Additionally, it uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
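To make these numbers concrete, here is a small, hypothetical configuration sketch that simply records the stated hyperparameters; the field names are illustrative and are not taken from the Hymba codebase.

```python
# Hypothetical config sketch of the hyperparameters listed above;
# field names are illustrative, not the actual Hymba implementation.
from dataclasses import dataclass

@dataclass
class HymbaBaseConfigSketch:
    hidden_size: int = 1600             # model embedding size
    num_attention_heads: int = 25
    intermediate_size: int = 5504       # MLP intermediate dimension
    num_layers: int = 32
    ssm_state_size: int = 16            # SSM states
    num_full_attention_layers: int = 3  # the rest use sliding window attention

cfg = HymbaBaseConfigSketch()
num_swa_layers = cfg.num_layers - cfg.num_full_attention_layers  # 29 sliding-window layers
```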
Features of this architecture:
@@ -54,6 +57,7 @@ Features of this architecture:
</div>
## Performance Highlights
- Hymba-1.5B-Base outperforms all sub-2B public models.