|
--- |
|
library_name: transformers |
|
tags: |
|
- gpt |
|
- distillation |
|
- mobile |
|
- embedded |
|
- onnx |
|
license: cc-by-nc-4.0 |
|
datasets: |
|
- custom |
|
- web |
|
language: en |
|
widget: |
|
- text: "In order to make pancakes, you need to" |
|
- text: "Once upon a time" |
|
--- |
|
|
|
<p align="center"> |
|
<img src="logo.png" alt="IJK Technology" width="150"> |
|
</p> |
|
|
|
<h1 align="center">IJK Technology – ByteGPT-r1</h1> |
|
|
|
|
|
**ByteGPT-r1** is a distilled version of DeepSeek's Qwen 1.5B model, optimized specifically for mobile and edge computing environments. It retains strong language capabilities while being designed for compute- and memory-constrained devices.
|
|
|
## 🚀 Overview |
|
- **Model Type:** Distilled GPT-style causal language model |
|
- **Base Model:** DeepSeek's Qwen 1.5B
|
- **Intended Use:** Edge devices, mobile phones, embedded systems |
|
- **Footprint:** Optimized for mobile deployment
|
- **Training:** Knowledge distillation from Qwen 1.5B
|
|
|
## 🧠 Why ByteGPT-r1? |
|
ByteGPT-r1 offers several advantages for mobile and edge deployment: |
|
|
|
1. **Efficient Knowledge Distillation:** |
|
Carefully distilled from DeepSeek's Qwen 1.5B model to preserve capabilities while reducing computational requirements (a generic sketch of the distillation objective appears after this list).
|
|
|
2. **Mobile-First Design:** |
|
Architected specifically for the constraints of mobile devices, with optimizations for both inference speed and memory usage. |
|
|
|
3. **Balanced Performance:** |
|
Maintains a good balance between model size and language generation capabilities, making it practical for real-world mobile applications. |
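
For readers unfamiliar with knowledge distillation, the sketch below shows the standard objective: the student is trained toward the teacher's temperature-softened output distribution via KL divergence, blended with the usual hard-label cross-entropy. This is a generic illustration; the hyperparameters (`T`, `alpha`) are placeholders, and the exact recipe used for ByteGPT-r1 is not published here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL toward the teacher, blended with hard-target cross-entropy."""
    # Temperature-scaled KL divergence between teacher and student distributions
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures

    # Standard next-token cross-entropy against the ground-truth labels
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kl + (1.0 - alpha) * ce
```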
|
|
|
## 💡 Future Plans |
|
This model is part of our ongoing effort to bring powerful language models to edge devices. Upcoming releases will include: |
|
|
|
- **Specialized Variants:** Domain-specific versions optimized for particular use cases |
|
- **Further Optimizations:** Continued improvements in efficiency and performance |
|
- **Benchmark Results:** Comparative performance on various mobile devices |
|
- **Integration Examples:** More code samples for popular mobile frameworks |
|
|
|
## 💻 Usage |
|
|
|
### Quick Start (with `transformers`)
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True) |
|
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1") |
|
|
|
input_text = "What is the capital of France?" |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
outputs = model.generate(**inputs, max_new_tokens=100) |
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
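
`generate` also accepts the standard Hugging Face sampling arguments if you prefer sampled over greedy output. The values below are illustrative starting points, not tuned recommendations for this model:

```python
# Sampling often produces more varied text than greedy decoding;
# temperature and top_k here are illustrative defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```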
|
|
|
### Tokenizer |
|
|
|
The tokenizer is compatible with Hugging Face's `AutoTokenizer`:
|
|
|
```python |
|
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1") |
|
``` |
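
A quick encode/decode round trip is an easy sanity check that the tokenizer loads and behaves as expected:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

ids = tokenizer("Hello, world!", return_tensors="pt")["input_ids"]
print(ids.shape)                 # (1, sequence_length)
print(tokenizer.decode(ids[0]))  # should reproduce the input text
```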
|
|
|
### ONNX |
|
|
|
The model is also available in ONNX format, and can be used with the ONNX Runtime: |
|
|
|
```python
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Maximum context length; set this to the model's actual block size
# (available as model.block_size on the PyTorch model).
BLOCK_SIZE = 1024

# Helper function to generate text using the ONNX model
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Keep only the last BLOCK_SIZE tokens if the input is too long
        if input_ids.shape[1] > BLOCK_SIZE:
            input_ids = input_ids[:, -BLOCK_SIZE:]

        # Run inference
        ort_inputs = {"input": input_ids.cpu().numpy()}
        logits = ort_session.run(None, ort_inputs)[0]

        # Only take the last position's predictions for the next token
        logits = torch.from_numpy(logits)[:, -1, :]

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Generated text: Hello there! How can I assist you today? I'm a helpful AI assistant trained to provide information and answer questions on a wide range of topics.
```
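
For reference, one way to produce such an ONNX file is `torch.onnx.export` with a small wrapper that returns plain logits. This is an illustrative sketch under assumed export settings (input name `input`, dynamic batch and sequence axes); it is not necessarily how the published `model.onnx` was created:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

class LogitsOnly(torch.nn.Module):
    """Wrap the model so the traced graph returns a plain logits tensor."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        out = self.model(input_ids)
        # Custom models may return a tensor directly; HF models return .logits
        return out.logits if hasattr(out, "logits") else out

dummy_input = torch.randint(0, tokenizer.vocab_size, (1, 16))
torch.onnx.export(
    LogitsOnly(model).eval(),
    (dummy_input,),
    "model.onnx",
    input_names=["input"],   # matches the input name used by the session above
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch", 1: "sequence"}},
)
```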
|
|
|
### Android Usage |
|
|
|
Coming Soon! |
|
|
|
|
|
### iOS Usage |
|
|
|
Coming Soon! |
|
|
|
|
|
## 📜 License |
|
📍 **CC-BY-NC-4.0**: Free for non-commercial use. |
|
|
|
💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [[email protected]](mailto:[email protected]). |
|
|
|
## 🛠️ About IJK Technology Ltd |
|
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms. |