---
library_name: transformers
tags:
  - gpt
  - distillation
  - mobile
  - embedded
  - onnx
license: cc-by-nc-4.0
datasets:
  - custom
  - web
language: en
widget:
  - text: "In order to make pancakes, you need to"
  - text: "Once upon a time"
---

<p align="center">
  <img src="logo.png" alt="IJK Technology" width="150">
</p>

<h1 align="center">IJK Technology – ByteGPT-r1</h1>


**ByteGPT-r1** is a distilled version of DeepSeek's Qwen 1.5B model, optimized specifically for mobile and edge computing environments. It retains strong language capabilities while being designed for compute- and memory-constrained devices.

## 🚀 Overview
- **Model Type:** Distilled GPT-style causal language model  
- **Base Model:** DeepSeek's Qwen 1.5B
- **Intended Use:** Edge devices, mobile phones, embedded systems  
- **Size:** Compact footprint optimized for mobile deployment
- **Training:** Knowledge distillation from Qwen 1.5B

## 🧠 Why ByteGPT-r1?
ByteGPT-r1 offers several advantages for mobile and edge deployment:

1. **Efficient Knowledge Distillation:**  
   Carefully distilled from DeepSeek's Qwen 1.5B model to preserve capabilities while reducing computational requirements.

2. **Mobile-First Design:**  
   Architected specifically for the constraints of mobile devices, with optimizations for both inference speed and memory usage.

3. **Balanced Performance:**  
   Maintains a good balance between model size and language generation capabilities, making it practical for real-world mobile applications.

## 💡 Future Plans
This model is part of our ongoing effort to bring powerful language models to edge devices. Upcoming releases will include:

- **Specialized Variants:** Domain-specific versions optimized for particular use cases
- **Further Optimizations:** Continued improvements in efficiency and performance
- **Benchmark Results:** Comparative performance on various mobile devices
- **Integration Examples:** More code samples for popular mobile frameworks

## 💻 Usage

### **Quick Start (with `transformers`):**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
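
For less repetitive output, you can enable sampling. The following is a minimal sketch using standard `transformers` generation arguments; the specific values are illustrative, not tuned recommendations for this model:

```python
# Sampling-based generation; the values below are illustrative, not tuned for ByteGPT-r1.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,     # sample from the distribution instead of greedy argmax
    temperature=0.7,    # <1.0 sharpens the distribution
    top_p=0.9,          # nucleus sampling: restrict to the top 90% probability mass
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```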

### Tokenizer

The tokenizer is compatible with AutoTokenizer from Hugging Face:

```python
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
```
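
As a quick sanity check, you can round-trip text through the tokenizer; the example string here is arbitrary:

```python
# Encode text to token IDs and decode back; the input string is arbitrary.
text = "Hello, world!"
ids = tokenizer(text)["input_ids"]
print(ids)                     # token IDs
print(tokenizer.decode(ids))   # should reproduce the original text
```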

### ONNX

The model is also available in ONNX format, and can be used with the ONNX Runtime:

```python
import onnxruntime as ort
import numpy as np
import torch

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Helper function to generate text using the ONNX model.
# `block_size` is the model's maximum context length; here we reuse
# `model.block_size` from the PyTorch model loaded in the Quick Start above.
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0,
                       block_size=model.block_size):
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Keep only the last block_size tokens if the input is too long
        if input_ids.shape[1] > block_size:
            input_ids = input_ids[:, -block_size:]

        # Run inference
        ort_inputs = {
            'input': input_ids.cpu().numpy()
        }
        logits = ort_session.run(None, ort_inputs)[0]

        # Get predictions for the next token
        logits = torch.from_numpy(logits)
        logits = logits[:, -1, :]  # only take the last position's logits

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample from the distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids

# Test the generation (uses the tokenizer loaded above)
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Generated text: Hello there! How can I assist you today? I'm a helpful AI assistant trained to provide information and answer questions on a wide range of topics.
```
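
To verify that the ONNX export agrees with the PyTorch model, you can compare logits on the same input. This is a minimal sketch that assumes `model`, `tokenizer`, and `ort_session` from the snippets above, and that the PyTorch model returns a standard output object with a `.logits` attribute; the tolerance is illustrative:

```python
# Compare PyTorch and ONNX logits on one prompt. Assumes `model`, `tokenizer`,
# and `ort_session` are already defined as above; small numerical differences
# between backends are expected, so we use a loose tolerance.
import numpy as np
import torch

ids = tokenizer("Hello", return_tensors="pt")["input_ids"]

with torch.no_grad():
    pt_logits = model(ids).logits.cpu().numpy()

onnx_logits = ort_session.run(None, {"input": ids.numpy()})[0]

print(np.allclose(pt_logits, onnx_logits, atol=1e-4))
```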

### Android Usage

Coming Soon!


### iOS Usage

Coming Soon! 


## 📜 License
📍 **CC-BY-NC-4.0**: Free for non-commercial use.

💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [[email protected]](mailto:[email protected]).

## 🛠️ About IJK Technology Ltd
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms.