---
library_name: transformers
tags:
- gpt
- distillation
- mobile
- embedded
- onnx
license: cc-by-nc-4.0
datasets:
- custom
- web
language: en
widget:
- text: "In order to make pancakes, you need to"
- text: "Once upon a time"
---
<p align="center">
<img src="logo.png" alt="IJK Technology" width="150">
</p>
<h1 align="center">IJK Technology – ByteGPT-r1</h1>
**ByteGPT-r1** is a distilled version of DeepSeek's QWEN 1.5B model, optimized specifically for mobile and edge computing environments. It aims to retain as much of the base model's language capability as possible while running on compute- and memory-constrained devices.
## 🚀 Overview
- **Model Type:** Distilled GPT-style causal language model
- **Base Model:** DeepSeek's QWEN 1.5B
- **Intended Use:** Edge devices, mobile phones, embedded systems
- **Size:** Optimized for mobile deployment
- **Training:** Knowledge distillation from QWEN 1.5B
## 🧠 Why ByteGPT-r1?
ByteGPT-r1 offers several advantages for mobile and edge deployment:
1. **Efficient Knowledge Distillation:**
Carefully distilled from DeepSeek's QWEN 1.5B model to preserve capabilities while reducing computational requirements.
2. **Mobile-First Design:**
Architected specifically for the constraints of mobile devices, with optimizations for both inference speed and memory usage.
3. **Balanced Performance:**
Maintains a good balance between model size and language generation capabilities, making it practical for real-world mobile applications.
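The exact training recipe for ByteGPT-r1 is not documented in this README. As a general illustration of what knowledge distillation means, here is a minimal sketch of the commonly used soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. The function names, the NumPy implementation, and the default temperature are all assumptions made for illustration, not the method used to train this model.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, averaged over positions.

    Hypothetical helper: a generic soft-target loss, not ByteGPT-r1's recipe.
    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    eps = 1e-12  # avoids log(0)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
        axis=-1,
    )
    return float(kl.mean() * temperature ** 2)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice it is often combined with the ordinary next-token cross-entropy loss.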
## 💡 Future Plans
This model is part of our ongoing effort to bring powerful language models to edge devices. Upcoming releases will include:
- **Specialized Variants:** Domain-specific versions optimized for particular use cases
- **Further Optimizations:** Continued improvements in efficiency and performance
- **Benchmark Results:** Comparative performance on various mobile devices
- **Integration Examples:** More code samples for popular mobile frameworks
## 💻 Usage
### Quick Start (with `transformers`)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Tokenizer
The tokenizer can be loaded with `AutoTokenizer` from Hugging Face:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
```
### ONNX
The model is also available in ONNX format, and can be used with the ONNX Runtime:
```python
import onnxruntime as ort
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer, plus the PyTorch model from the Quick Start.
# The model is used here only to read the context length (model.block_size).
tok = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Helper function to generate text using the ONNX model
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        # Keep only the last block_size tokens if the input is too long
        if input_ids.shape[1] > model.block_size:
            input_ids = input_ids[:, -model.block_size:]

        # Run inference
        ort_inputs = {
            'input': input_ids.cpu().numpy()
        }
        logits = ort_session.run(None, ort_inputs)[0]

        # Get predictions for the next token
        logits = torch.from_numpy(logits)
        logits = logits[:, -1, :]  # Only take the last token's predictions

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample from the distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tok(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tok.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Example output (sampling is random, so yours will differ):
# Generated text: Hello there! How can I assist you today? I'm a helpful AI assistant trained to provide information and answer questions on a wide range of topics.
```
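The loop above uses PyTorch only for the softmax and the sampling step. On a device where ONNX Runtime is installed but PyTorch is not, the same two steps can be done in NumPy. The sketch below is illustrative: `crop_context` and `sample_next_token` are hypothetical helper names, and they assume the logits have already been sliced to the last position, giving an array of shape `(batch, vocab)`.

```python
import numpy as np

def crop_context(input_ids, block_size):
    """Keep only the last `block_size` token ids of each sequence."""
    return input_ids[:, -block_size:]

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample one token id per row from temperature-scaled logits.

    logits: float array of shape (batch, vocab) for the last position.
    Returns an int64 array of shape (batch, 1).
    """
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits.astype(np.float64) / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    next_ids = [rng.choice(probs.shape[-1], p=row) for row in probs]
    return np.array(next_ids, dtype=np.int64).reshape(-1, 1)
```

Inside a generation loop, the result would be appended with `np.concatenate([input_ids, next_token], axis=1)`. Lower temperatures concentrate probability on the highest-scoring tokens, and higher temperatures flatten the distribution.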
### Android Usage
Coming Soon!
### iOS Usage
Coming Soon!
## 📜 License
📍 **CC-BY-NC-4.0**: Free for non-commercial use.
💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [[email protected]](mailto:[email protected]).
## 🛠️ About IJK Technology Ltd
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms.