Mamba Bit!

Mamba with vocab size 2 bites again! This time we bite at TinyStories. I didn't preprocess the stories at all: during training, the model takes a random character offset, converts the text to a bit string, and feeds it to Mamba. This time I didn't forget the residual connections or the norm. The model was trained in BF16.
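For illustration, here is a minimal sketch of the text-to-bits encoding described above. The byte order, bit order (MSB first), and function names are my assumptions for this sketch, not necessarily what mambabit.py does:

```python
# Hypothetical sketch of bit-level "tokenization" with vocab size 2.
# Bit order and helper names are assumptions; see mambabit.py for the real code.
import torch

def text_to_bits(text: str) -> torch.Tensor:
    """Encode text as a 1-D tensor of 0/1 token ids (vocab size 2)."""
    data = text.encode("utf-8")
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return torch.tensor(bits, dtype=torch.long)

def bits_to_text(bits: torch.Tensor) -> str:
    """Decode a 0/1 tensor back into text (drops any trailing partial byte)."""
    vals = bits.tolist()
    out = bytearray()
    for i in range(0, len(vals) - 7, 8):
        byte = 0
        for b in vals[i:i + 8]:
            byte = (byte << 1) | int(b)
        out.append(byte)
    return out.decode("utf-8", errors="replace")

# A slice of a story starting at a random character offset becomes
# a training sequence of bits fed to the Mamba model.
sample = text_to_bits("Run, kitten, run")
print(sample.shape, bits_to_text(sample))
```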

Training code included.

Example of running the model from the CLI:

```
$ python mambabit.py "Run, kitten, run"
```

Run, kitten, running and jumping. She saw a big tree and thought it would be fun to share the tree. So, she went to the tree and started to climb the tree. She saw a big tree and thought it would be fun to share the tree. So, she went to the tree and saw a big red ball.
