	---
	tags:
	- Llamba
	- recurrent-models
	- distillation
	- cartesia
	- edge
	license: apache-2.0
	library_name: cartesia-pytorch
	datasets:
	- ai2_arc
	- PIQA
	- Winogrande
	- HellaSwag
	- Lambada
	- MMLU
	- OpenBookQA
	inference:
	  precision: bf16
	  hardware: gpu
	---

	# Llamba Models

	The Llamba models are part of Cartesia's [Edge](https://github.com/cartesia-ai/edge) library, designed for efficient, high-performance machine learning applications.

	For more details, refer to the [paper](https://arxiv.org/abs/2502.14458).

	---
	## Usage

	### Llamba on PyTorch

	To use Llamba with PyTorch:

	1. Install the required package:
	```bash
	pip install --no-binary :all: cartesia-pytorch
	```
	2. Load and run the model:
	```python
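	from transformers import AutoTokenizer
	from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel

	# Load the pretrained weights and move the model to the GPU.
	# Note: this card is published under Llamba-8B-untied, while the ID below
	# points at cartesia-ai/Llamba-8B; adjust it if you want a different checkpoint.
	model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-8B", strict=True).to('cuda')
	# The tokenizer is loaded from the Llama 3.1 8B repository.
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

	# Tokenize the prompt and place the token IDs on the same device as the model.
	input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
	input_ids = input_ids.to('cuda')

	# Generate a continuation and decode the first (only) sequence back to text.
	output = model.generate(input_ids, max_length=100)[0]
	print(tokenizer.decode(output, skip_special_tokens=True))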
	```

	### Llamba on MLX

	To run Llamba with the Metal framework, see [cartesia-metal](https://github.com/cartesia-ai/edge/tree/main/cartesia-metal).

	---
	## Evaluations

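	The Llamba models have been evaluated on multiple standard benchmarks, with the goal of showing efficiency gains while maintaining strong performance. The tables below report zero-shot and few-shot scores for each model. ARC-C and ARC-E are the ARC Challenge and Easy sets, WG is Winogrande, HS is HellaSwag, LMB is Lambada, and OBQA is OpenBookQA.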

	| Model | ARC-C (0-shot) | ARC-C (25-shot) | ARC-E (0-shot) | ARC-E (25-shot) | PIQA (0-shot) | PIQA (10-shot) | WG (0-shot) | WG (5-shot) |
	|-----------|------|------|------|------|------|------|------|------|
	| Llamba-1B | 37.2 | 41.8 | 69.5 | 71.2 | 74.0 | 74.3 | 60.6 | 58.1 |
	| Llamba-3B | 48.5 | 53.0 | 79.0 | 81.1 | 78.6 | 79.5 | 70.4 | 72.4 |
	| Llamba-8B | 54.6 | 60.0 | 82.5 | 85.8 | 80.9 | 81.5 | 73.3 | 76.9 |

	| Model | HS (0-shot) | HS (10-shot) | LMB (0-shot) | LMB (10-shot) | MMLU (0-shot) | MMLU (5-shot) | OBQA (0-shot) | OBQA (10-shot) |
	|-----------|------|------|------|------|------|------|------|------|
	| Llamba-1B | 61.2 | 60.2 | 48.4 | 39.0 | 38.0 | 31.3 | 37.0 | 38.0 |
	| Llamba-3B | 73.8 | 74.3 | 65.8 | 60.0 | 52.7 | 50.3 | 42.8 | 42.8 |
	| Llamba-8B | 77.6 | 78.7 | 69.4 | 65.0 | 61.0 | 60.0 | 43.4 | 45.8 |
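
To compare the three models at a glance, the per-benchmark scores can be collapsed into a single unweighted mean. The sketch below is only an illustration: the numbers are transcribed from the two tables above, while the names `SCORES`, `mean_scores`, and `AVERAGES` are hypothetical helpers that are not part of `cartesia-pytorch`. The few-shot mean mixes 5-, 10-, and 25-shot settings, so treat it as a rough summary rather than a metric reported by the authors.

```python
# Scores transcribed from the tables above, stored as (zero-shot, few-shot).
# The few-shot setting differs per benchmark (5, 10, or 25 shots).
SCORES = {
    "Llamba-1B": {
        "ARC-C": (37.2, 41.8), "ARC-E": (69.5, 71.2),
        "PIQA": (74.0, 74.3), "WG": (60.6, 58.1),
        "HS": (61.2, 60.2), "LMB": (48.4, 39.0),
        "MMLU": (38.0, 31.3), "OBQA": (37.0, 38.0),
    },
    "Llamba-3B": {
        "ARC-C": (48.5, 53.0), "ARC-E": (79.0, 81.1),
        "PIQA": (78.6, 79.5), "WG": (70.4, 72.4),
        "HS": (73.8, 74.3), "LMB": (65.8, 60.0),
        "MMLU": (52.7, 50.3), "OBQA": (42.8, 42.8),
    },
    "Llamba-8B": {
        "ARC-C": (54.6, 60.0), "ARC-E": (82.5, 85.8),
        "PIQA": (80.9, 81.5), "WG": (73.3, 76.9),
        "HS": (77.6, 78.7), "LMB": (69.4, 65.0),
        "MMLU": (61.0, 60.0), "OBQA": (43.4, 45.8),
    },
}


def mean_scores(per_benchmark):
    """Return (zero-shot mean, few-shot mean), unweighted across benchmarks."""
    zero = [z for z, _ in per_benchmark.values()]
    few = [f for _, f in per_benchmark.values()]
    return sum(zero) / len(zero), sum(few) / len(few)


# Hypothetical summary structure: model name -> (zero-shot mean, few-shot mean).
AVERAGES = {name: mean_scores(scores) for name, scores in SCORES.items()}

for name, (zero_avg, few_avg) in AVERAGES.items():
    print(f"{name}: zero-shot mean {zero_avg:.2f}, few-shot mean {few_avg:.2f}")
```

An unweighted mean treats every benchmark equally regardless of its size or difficulty; it is a convenience for quick comparison, not a replacement for the per-task numbers.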

	More details on model performance, benchmarks, and evaluation metrics can be found in the [paper](https://arxiv.org/abs/2502.14458).