---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- INTELLECT-1-Instruct
---

Quantizations of https://huggingface.co/PrimeIntellect/INTELLECT-1-Instruct

### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [jan](https://github.com/janhq/jan)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [GPT4All](https://github.com/nomic-ai/gpt4all)

---

# From original readme

**INTELLECT-1** is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code. This is an instruct model; the associated base model is [INTELLECT-1](https://huggingface.co/PrimeIntellect/INTELLECT-1).

**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with 30 independent community contributors providing compute. The training code uses the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers. The key abstraction that enables dynamic scaling is the `ElasticDeviceMesh`, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.

The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithm with 100 inner steps. The global all-reduce was performed with custom int8 all-reduce kernels to shrink the communication payload, reducing communication overhead by a factor of 400x. For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).

**Note: You must add a BOS token at the beginning of each sample. Performance may be impacted otherwise.**

## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")
tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")

input_text = "What is the Metamorphosis of Prime Intellect about?"
# encode() adds special tokens (including BOS for Llama-style tokenizers) by default;
# see the note above about starting every sample with a BOS token.
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
```

### Example text generation pipeline
```python
import torch
from transformers import pipeline

torch.set_default_device("cuda")
pipe = pipeline("text-generation", model="PrimeIntellect/INTELLECT-1")

print(pipe("What is prime intellect ?"))
```
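The original readme stresses that every sample must begin with a BOS token. One minimal way to check this with the `transformers` tokenizer is sketched below; it assumes a Llama-style tokenizer that exposes `bos_token_id` and is not part of the original card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")

sample = "What is prime intellect?"
ids = tokenizer.encode(sample, add_special_tokens=True)

# If the tokenizer does not prepend BOS on its own, add it manually.
if tokenizer.bos_token_id is not None and ids[0] != tokenizer.bos_token_id:
    ids = [tokenizer.bos_token_id] + ids
```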
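The GGUF files in this repository can be loaded by any of the clients listed at the top of this card. Below is a minimal sketch using llama-cpp-python; the model path is a placeholder for whichever quantization file you download, and this snippet is not part of the original readme:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Replace the placeholder path with the quantization file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="INTELLECT-1-Instruct-Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

# llama.cpp typically handles the BOS token based on the GGUF metadata.
out = llm("What is the Metamorphosis of Prime Intellect about?", max_tokens=128)
print(out["choices"][0]["text"])
```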