cartesia-ai/Llamba-8B-untied

cartesia-pytorch

recurrent-models


AvivBick committed 13 days ago

Commit 2b5a9d1 · verified · 1 Parent(s): 6cc1bd8

Update README.md

Files changed (1)

README.md +74 -3

@@ -1,3 +1,74 @@
----
-license: mit
----

+---
+tags:
+  - Llamba
+  - recurrent-models
+  - distillation
+  - cartesia
+  - edge
+license: apache-2.0
+library_name: cartesia-pytorch
+datasets:
+  - ai2_arc
+  - PIQA
+  - Winogrande
+  - HellaSwag
+  - Lambada
+  - MMLU
+  - OpenBookQA
+inference:
+  precision: bf16
+  hardware: gpu
+---
+# Llamba Models
+The Llamba models are part of Cartesia's [Edge](https://github.com/cartesia-ai/edge) library, designed for efficient, high-performance machine learning applications.
+For more details, refer to the [paper](https://arxiv.org/abs/2502.14458).
+
+---
+## Usage
+### Llamba on PyTorch
+To use Llamba with PyTorch:
+1. Install the required package:
+   ```bash
+   pip install --no-binary :all: cartesia-pytorch
+   ```
+2. Load and run the model:
+   ```python
+   from transformers import AutoTokenizer
+   from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel
+   # Load the pretrained weights and move the model to the GPU.
+   model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-8B", strict=True).to('cuda')
+   # The tokenizer is loaded from the Llama 3.1 8B checkpoint.
+   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
+   # Tokenize the prompt and place it on the same device as the model.
+   input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
+   input_ids = input_ids.to('cuda')
+   # Generate up to 100 tokens (prompt included) and decode the first sequence.
+   output = model.generate(input_ids, max_length=100)[0]
+   print(tokenizer.decode(output, skip_special_tokens=True))
+   ```
+### Llamba on MLX
+To run Llamba with the Metal framework, see [cartesia-metal](https://github.com/cartesia-ai/edge/tree/main/cartesia-metal).
+
+---
+## Evaluations
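+The Llamba models have been evaluated on multiple standard benchmarks, demonstrating efficiency gains while maintaining strong performance. The tables below report results for each model size in zero-shot and few-shot settings (WG = Winogrande, HS = HellaSwag, LMB = Lambada, OBQA = OpenBookQA):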
+| Model      | ARC-C (0-shot) | ARC-C (25-shot) | ARC-E (0-shot) | ARC-E (25-shot) | PIQA (0-shot) | PIQA (10-shot) | WG (0-shot) | WG (5-shot) |
+|------------|---------------|----------------|---------------|----------------|---------------|---------------|------------|------------|
+| Llamba-1B  | 37.2          | 41.8           | 69.5          | 71.2           | 74.0          | 74.3          | 60.6       | 58.1       |
+| Llamba-3B  | 48.5          | 53.0           | 79.0          | 81.1           | 78.6          | 79.5          | 70.4       | 72.4       |
+| Llamba-8B  | 54.6          | 60.0           | 82.5          | 85.8           | 80.9          | 81.5          | 73.3       | 76.9       |
+
+| Model      | HS (0-shot) | HS (10-shot) | LMB (0-shot) | LMB (10-shot) | MMLU (0-shot) | MMLU (5-shot) | OBQA (0-shot) | OBQA (10-shot) |
+|------------|------------|------------|------------|------------|------------|------------|------------|------------|
+| Llamba-1B  | 61.2       | 60.2       | 48.4       | 39.0       | 38.0       | 31.3       | 37.0       | 38.0       |
+| Llamba-3B  | 73.8       | 74.3       | 65.8       | 60.0       | 52.7       | 50.3       | 42.8       | 42.8       |
+| Llamba-8B  | 77.6       | 78.7       | 69.4       | 65.0       | 61.0       | 60.0       | 43.4       | 45.8       |
+
+More details on model performance, benchmarks, and evaluation metrics can be found in the [paper](https://arxiv.org/abs/2502.14458).
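
For a quick side-by-side comparison of the three model sizes, the zero-shot columns of the tables above can be collapsed into one number per model. The sketch below does this with an unweighted mean. That aggregate is an illustrative assumption, not a metric reported in the README or the paper, and the names `ZERO_SHOT` and `mean_score` are hypothetical.

```python
# Zero-shot scores copied from the evaluation tables, in the order:
# ARC-C, ARC-E, PIQA, WG, HS, LMB, MMLU, OBQA.
ZERO_SHOT = {
    "Llamba-1B": [37.2, 69.5, 74.0, 60.6, 61.2, 48.4, 38.0, 37.0],
    "Llamba-3B": [48.5, 79.0, 78.6, 70.4, 73.8, 65.8, 52.7, 42.8],
    "Llamba-8B": [54.6, 82.5, 80.9, 73.3, 77.6, 69.4, 61.0, 43.4],
}


def mean_score(scores):
    """Unweighted mean across benchmarks (an illustrative aggregate only)."""
    return sum(scores) / len(scores)


# One aggregate per model size.
averages = {name: mean_score(scores) for name, scores in ZERO_SHOT.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

An unweighted mean treats every benchmark equally, so it hides per-task differences; the per-benchmark tables remain the reference.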