---
library_name: transformers
tags:
  - gpt
  - byte-tokenization
  - mobile
  - embedded
  - onnx
license: cc-by-nc-4.0
datasets:
  - custom
  - web
language: en
widget:
  - text: "In order to make pancakes, you need to"
  - text: "Once upon a time"
---

<p align="center">
  <img src="logo.png" alt="IJK Technology" width="150">
</p>

<h1 align="center">IJK Technology – ByteGPT-small</h1>


**ByteGPT-small** is a small GPT-style language model trained using byte tokenization inspired by the ByT5 paper. It is designed for use on compute- and memory-constrained devices, such as mobile phones and embedded systems.

## 🚀 Overview
- **Model Type:** GPT-style causal language model  
- **Tokenizer:** Byte-level tokenization (from ByT5)  
- **Intended Use:** Edge devices, mobile phones, embedded systems  
- **Size:** Small (initial prototype)  
- **Training:** Custom-trained from scratch  

## 🧠 Why Byte Tokenization?
Byte tokenization offers several advantages for small-scale, efficient models:

1. **Reduced Memory Footprint:**  
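   A byte-level vocabulary needs only 256 byte values plus a handful of special tokens, compared with tens of thousands of entries for a typical subword vocabulary. This shrinks the embedding layer accordingly, making the model suitable for devices with limited RAM.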

2. **No External Dependencies:**  
   Unlike subword tokenizers (e.g., SentencePiece, BPE), byte tokenization requires no external libraries or vocabulary files. A simple Python script can handle it.

3. **Robustness to Noise:**  
   Any string can be encoded as bytes, so no input is ever out-of-vocabulary. This makes byte-level models more robust to misspellings, typos, and rare words.
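
To make points 1 and 2 concrete, here is a minimal sketch of a byte tokenizer in plain Python. It assumes the ByT5 ID layout (0 = pad, 1 = eos, 2 = unk, byte values shifted up by 3); the tokenizer shipped with ByteGPT-small may use a different offset or different special tokens. The names `encode`, `decode`, and `embedding_params`, and the hidden size of 512, are hypothetical and chosen only for illustration.

```python
# Assumption: ByT5-style layout, where IDs 0-2 are special tokens
# and byte value b maps to ID b + 3. ByteGPT-small may differ.
BYTE_OFFSET = 3
EOS_ID = 1


def encode(text: str, add_eos: bool = False) -> list:
    """Turn a string into token IDs, one per UTF-8 byte."""
    ids = [b + BYTE_OFFSET for b in text.encode("utf-8")]
    return ids + [EOS_ID] if add_eos else ids


def decode(ids: list) -> str:
    """Turn token IDs back into a string, skipping special tokens."""
    raw = bytes(
        i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Number of weights in a vocab_size x d_model embedding table."""
    return vocab_size * d_model


ids = encode("héllo")
print(ids)          # [107, 198, 172, 111, 111, 114] ("é" is two bytes)
print(decode(ids))  # héllo

# Embedding table size at a hypothetical hidden size of 512
print(embedding_params(259, 512))     # 132608 (256 bytes + 3 special tokens)
print(embedding_params(32_000, 512))  # 16384000 (a typical subword vocabulary)
```

The trade-off is sequence length: each character costs at least one token, and non-ASCII characters cost several, so the same text takes up more of the model's context window than it would with a subword tokenizer.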

## 💡 Future Plans
This is the **first** in a series of models. While this model is not yet highly useful due to its small size, it represents the foundation for future versions. Upcoming releases will include:

- **Larger Models:** Scaled-up versions with better performance  
- **Distilled Models:** Using GRPO distillation to create highly efficient small models  
- **Benchmark Results:** Comparative performance on mobile devices  

## 💻 Usage

### Quick Start (with `transformers`)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-small", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small")

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Tokenizer

The tokenizer is byte-level and loads through Hugging Face's `AutoTokenizer`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small")
```

### ONNX

The model is also available in ONNX format, and can be used with the ONNX Runtime:

```python
import onnxruntime as ort
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer; the PyTorch model is loaded only to read its context length (block_size)
tok = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small")
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-small", trust_remote_code=True)

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Helper function to generate text using the ONNX model
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Feed only the last block_size tokens if the sequence is too long
        context = input_ids[:, -model.block_size:]

        # Run inference
        ort_inputs = {'input': context.cpu().numpy()}
        logits = ort_session.run(None, ort_inputs)[0]

        # Get predictions for the next token
        logits = torch.from_numpy(logits)
        logits = logits[:, -1, :]  # Only take the last token's predictions

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample from the distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tok(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tok.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Example output (sampling is random, so results will vary):
# Generated text: Hello everyone!
# A dinner is only available for St. Loui
```
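
On targets where PyTorch is not installed, the context cropping and temperature sampling steps from the loop above can be done with the Python standard library alone. The following is a minimal sketch, not part of the released code: `crop_context` and `sample_next_token` are hypothetical helper names, and treating `temperature <= 0` as greedy decoding is a choice made for this example. The logits are assumed to be a plain list of floats for the last position, e.g. `logits[0, -1].tolist()` from the ONNX output.

```python
import math
import random


def crop_context(token_ids, block_size):
    """Keep only the most recent block_size token IDs."""
    return token_ids[-block_size:]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token ID from a list of logits.

    temperature <= 0 falls back to greedy decoding (argmax),
    a convention chosen for this sketch.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating to avoid overflow
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights, so this is softmax sampling
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Usage with made-up logits over a three-token vocabulary
print(crop_context([5, 6, 7, 8, 9], block_size=3))            # [7, 8, 9]
print(sample_next_token([0.0, 0.0, 50.0], temperature=0))     # 2
print(sample_next_token([1.0, 2.0, 3.0], temperature=0.8))    # varies per run
```

Lower temperatures concentrate probability on the highest-scoring bytes, which tends to produce more repetitive but more coherent text; higher temperatures do the opposite.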

### Android Usage

We've just released an Android SDK. You can find the SDK on our [GitHub](https://github.com/ijktech/ByteGPT-Android).

To include the SDK in your Android project, add the following to your `build.gradle.kts` file (the snippet uses the Gradle Kotlin DSL):

```kotlin
repositories {
    maven { 
        url = uri("https://raw.githubusercontent.com/ijktech/ByteGPT-Android/maven-repo") 
    }
}

dependencies {
    implementation("com.github.ijktech:ByteGPT-Android:1.0.9")
}
```


### iOS Usage

Coming Soon! 


## 📜 License
📍 **CC-BY-NC-4.0**: Free for non-commercial use.

💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [james@ijktech.com](mailto:james@ijktech.com).

## 🛠️ About IJK Technology Ltd
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms.