---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- INTELLECT-1-Instruct
---

Quantizations of https://huggingface.co/PrimeIntellect/INTELLECT-1-Instruct

### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [jan](https://github.com/janhq/jan)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [GPT4All](https://github.com/nomic-ai/gpt4all)

---

# From original readme

**INTELLECT-1** is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code. This is an instruct model; the associated base model is [INTELLECT-1](https://huggingface.co/PrimeIntellect/INTELLECT-1).

**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with 30 independent community contributors providing compute. The training code uses the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers. The key abstraction that enables dynamic scaling is the `ElasticDeviceMesh`, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.

The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithm with 100 inner steps. The global all-reduce was performed with custom int8 all-reduce kernels to shrink the communication payload, reducing communication overhead by a factor of 400x. For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).

**Note: You must add a BOS token at the beginning of each sample. Performance may be impacted otherwise.**

## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")
tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")

input_text = "What is the Metamorphosis of Prime Intellect about?"
# encode() adds special tokens (including BOS for Llama-style tokenizers) by default;
# see the note above about starting every sample with a BOS token.
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output_ids = model.generate(input_ids, max_length=50, num_return_sequences=1)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
```

### Example text generation pipeline
```python
import torch
from transformers import pipeline

torch.set_default_device("cuda")
pipe = pipeline("text-generation", model="PrimeIntellect/INTELLECT-1")

print(pipe("What is prime intellect ?"))
```
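The original readme stresses that every sample must begin with a BOS token. One minimal way to check this with the `transformers` tokenizer is sketched below; it assumes a Llama-style tokenizer that exposes `bos_token_id` and is not part of the original card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1-Instruct")

sample = "What is prime intellect?"
ids = tokenizer.encode(sample, add_special_tokens=True)

# If the tokenizer does not prepend BOS on its own, add it manually.
if tokenizer.bos_token_id is not None and ids[0] != tokenizer.bos_token_id:
    ids = [tokenizer.bos_token_id] + ids
```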
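The GGUF files in this repository can be loaded by any of the clients listed at the top of this card. Below is a minimal sketch using llama-cpp-python; the model path is a placeholder for whichever quantization file you download, and this snippet is not part of the original readme:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Replace the placeholder path with the quantization file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="INTELLECT-1-Instruct-Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

# llama.cpp typically handles the BOS token based on the GGUF metadata.
out = llm("What is the Metamorphosis of Prime Intellect about?", max_tokens=128)
print(out["choices"][0]["text"])
```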