metadata

library_name: transformers
tags:
  - gpt
  - distillation
  - mobile
  - embedded
  - onnx
license: cc-by-nc-4.0
datasets:
  - custom
  - web
language: en
widget:
  - text: In order to make pancakes, you need to
  - text: Once upon a time

IJK Technology – ByteGPT-r1

ByteGPT-r1 is a distilled version of DeepSeek's QWEN 1.5B model, optimized for mobile and edge computing. It is designed to retain strong language capabilities while running on compute- and memory-constrained devices.

🚀 Overview

Model Type: Distilled GPT-style causal language model
Base Model: DeepSeek's QWEN 1.5B
Intended Use: Edge devices, mobile phones, embedded systems
Size: Optimized for mobile deployment
Training: Knowledge distillation from QWEN 1.5B
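Knowledge distillation trains a smaller student model to reproduce a larger teacher's output distribution. The exact objective used for ByteGPT-r1 is not documented here. The following is a minimal sketch that assumes the common soft-target loss from Hinton et al. (2015): the KL divergence between temperature-softened teacher and student distributions, scaled by T². The function names `softmax` and `distillation_loss` are illustrative and are not part of any ByteGPT-r1 API.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student).

    Illustrative only; not confirmed to be the objective used for ByteGPT-r1.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student) if p > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Logits over a toy 3-token vocabulary for a single position.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))  # positive: the distributions differ
print(distillation_loss(teacher, teacher))  # 0.0: the student matches the teacher
```

In full training this per-position loss is typically averaged over every token in a batch and often mixed with the ordinary next-token cross-entropy.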

🧠 Why ByteGPT-r1?

ByteGPT-r1 offers several advantages for mobile and edge deployment:

Efficient Knowledge Distillation:
Carefully distilled from DeepSeek's QWEN 1.5B model to preserve capabilities while reducing computational requirements.
Mobile-First Design:
Architected specifically for the constraints of mobile devices, with optimizations for both inference speed and memory usage.
Balanced Performance:
Maintains a good balance between model size and language generation capabilities, making it practical for real-world mobile applications.
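To see why model size matters on phones, it helps to estimate how much memory the weights alone occupy: parameters × bits per parameter ÷ 8 bytes. The sketch below applies this to the 1.5B-parameter base model as a reference point. The listed precisions are illustrative, since the numeric format ByteGPT-r1 ships in is not stated here, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate storage for model weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


# Reference point: the 1.5B-parameter base model. ByteGPT-r1's own parameter
# count is not stated here, so substitute it if known.
base_params = 1.5e9
for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gib(base_params, bits):.2f} GiB")
# fp32: 5.59 GiB
# fp16: 2.79 GiB
# int8: 1.40 GiB
# int4: 0.70 GiB
```

Halving the bits per weight halves the footprint. Distilling to fewer parameters reduces it proportionally as well.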

💡 Future Plans

This model is part of our ongoing effort to bring powerful language models to edge devices. Upcoming releases will include:

Specialized Variants: Domain-specific versions optimized for particular use cases
Further Optimizations: Continued improvements in efficiency and performance
Benchmark Results: Comparative performance on various mobile devices
Integration Examples: More code samples for popular mobile frameworks

💻 Usage

Quick Start (with `transformers`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
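When comparing devices, you can get a rough throughput figure by timing a single `generate` call and dividing the number of new tokens by the elapsed time. The helper below is a hypothetical sketch (`tokens_per_second` is not a `transformers` function). The measurement includes prompt processing, and the first call often carries one-off warm-up costs, so discard it and average several runs.

```python
import time


def tokens_per_second(generate_fn, prompt_length):
    """Time one call to generate_fn and report decoding throughput.

    generate_fn: zero-argument callable returning a single token-id sequence
    that includes the prompt (as outputs[0] from model.generate does for a
    causal LM). Returns (new_token_count, tokens_per_second).
    """
    start = time.perf_counter()
    sequence = generate_fn()
    elapsed = time.perf_counter() - start
    # Count from the actual output, since generation may stop early at EOS.
    new_tokens = len(sequence) - prompt_length
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, rate


# Usage with the Quick Start objects (not run here):
# n, tps = tokens_per_second(
#     lambda: model.generate(**inputs, max_new_tokens=100)[0],
#     prompt_length=inputs["input_ids"].shape[1],
# )
# print(f"{n} new tokens at {tps:.1f} tokens/s")
```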

Tokenizer

The tokenizer can be loaded with Hugging Face's `AutoTokenizer`:

```python
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
```

ONNX

The model is also available in ONNX format and can be run with ONNX Runtime. The following loop samples one token at a time from the exported graph:

```python
import onnxruntime as ort
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

# Create an ONNX Runtime session on the CPU
ort_session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])


def generate_with_onnx(prompt_ids, block_size, max_new_tokens=50, temperature=1.0):
    """Autoregressively sample max_new_tokens tokens from the ONNX model.

    prompt_ids: LongTensor of shape (1, seq_len).
    block_size: the model's maximum context length.
    """
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Feed only the last block_size tokens, but keep the full sequence
        context = input_ids[:, -block_size:]

        # Run inference; the exported graph takes a single tensor named 'input'
        ort_inputs = {"input": context.cpu().numpy()}
        logits = ort_session.run(None, ort_inputs)[0]

        # Keep only the predictions for the last position
        logits = torch.from_numpy(logits)[:, -1, :]

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token to the running sequence
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids


# Test the generation
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
# `model` is the transformers model from the Quick Start; this assumes it
# exposes its context length as `block_size`
generated_ids = generate_with_onnx(prompt_ids, block_size=model.block_size)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Example output (sampling is random, so results vary between runs):
# Generated text: Hello there! How can I assist you today? I'm a helpful AI assistant trained to provide information and answer questions on a wide range of topics.
```
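The temperature step in the ONNX loop rescales the logits before the softmax. Values below 1.0 sharpen the distribution toward the most likely token, and values above 1.0 flatten it. This standalone sketch uses plain Python lists instead of tensors to show that effect on toy logits. It then draws one token in proportion to its probability, as `torch.multinomial` does in the loop. The function names are illustrative only.

```python
import math
import random


def temperature_softmax(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # toy logits for a 3-token vocabulary
for t in (0.5, 1.0, 2.0):
    probs = temperature_softmax(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

rng = random.Random(0)  # seeded so the draw is reproducible
print(sample_next_token(logits, temperature=0.7, rng=rng))
```

A very low temperature approaches greedy decoding (always picking the top token). A high temperature produces more varied but less coherent text.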

Android Usage

Coming Soon!

iOS Usage

Coming Soon!

📜 License

📍 CC-BY-NC-4.0: Free for non-commercial use.

💼 Commercial Use: Contact IJK Technology Ltd for licensing at [email protected].

🛠️ About IJK Technology Ltd

IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms.