---
library_name: transformers
tags:
- gpt
- distillation
- mobile
- embedded
- onnx
license: cc-by-nc-4.0
datasets:
- custom
- web
language: en
widget:
- text: "In order to make pancakes, you need to"
- text: "Once upon a time"
---
<p align="center">
<img src="logo.png" alt="IJK Technology" width="150">
</p>
<h1 align="center">IJK Technology – ByteGPT-r1</h1>
**ByteGPT-r1** is a distilled version of DeepSeek's Qwen 1.5B model, optimized specifically for mobile and edge computing environments. It retains strong language capabilities while being designed for compute- and memory-constrained devices.
## 🚀 Overview
- **Model Type:** Distilled GPT-style causal language model
- **Base Model:** DeepSeek's Qwen 1.5B
- **Intended Use:** Edge devices, mobile phones, embedded systems
- **Size:** Optimized for mobile deployment
- **Training:** Knowledge distillation from Qwen 1.5B
## 🧠 Why ByteGPT-r1?
ByteGPT-r1 offers several advantages for mobile and edge deployment:
1. **Efficient Knowledge Distillation:**
Carefully distilled from DeepSeek's Qwen 1.5B model to preserve capabilities while reducing computational requirements.
2. **Mobile-First Design:**
Architected specifically for the constraints of mobile devices, with optimizations for both inference speed and memory usage.
3. **Balanced Performance:**
Maintains a good balance between model size and language generation capabilities, making it practical for real-world mobile applications.
## 💡 Future Plans
This model is part of our ongoing effort to bring powerful language models to edge devices. Upcoming releases will include:
- **Specialized Variants:** Domain-specific versions optimized for particular use cases
- **Further Optimizations:** Continued improvements in efficiency and performance
- **Benchmark Results:** Comparative performance on various mobile devices
- **Integration Examples:** More code samples for popular mobile frameworks
## 💻 Usage
### Quick Start (with `transformers`)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-r1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
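The call above decodes greedily by default. For more varied output you can enable sampling; the snippet below is a minimal sketch using standard `transformers` generation arguments (the specific values are illustrative, not settings tuned for this model):

```python
# Sampling-based generation (illustrative settings, not tuned for this model)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.7,  # soften the next-token distribution
    top_k=50,         # restrict sampling to the 50 most likely tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```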
### Tokenizer
The tokenizer is compatible with `AutoTokenizer` from Hugging Face:
```python
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")
```
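As a quick sanity check, encoding and decoding should round-trip a prompt; this sketch assumes only the standard `AutoTokenizer` interface:

```python
# Encode a prompt to token ids and decode back (standard tokenizer round-trip)
ids = tokenizer("In order to make pancakes, you need to", return_tensors="pt")["input_ids"]
print(ids.shape)                 # (1, sequence_length)
print(tokenizer.decode(ids[0]))  # should reproduce the input text
```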
### ONNX
The model is also available in ONNX format and can be used with ONNX Runtime:
```python
import onnxruntime as ort
import torch
from transformers import AutoTokenizer

# Load the tokenizer (the same one used in the examples above)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-r1")

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Maximum context length the exported model accepts (the model's block size;
# adjust this value to match your export)
BLOCK_SIZE = 1024

# Helper function to generate text using the ONNX model
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        # Keep only the last BLOCK_SIZE tokens if the input is too long
        if input_ids.shape[1] > BLOCK_SIZE:
            input_ids = input_ids[:, -BLOCK_SIZE:]
        # Run inference
        ort_inputs = {"input": input_ids.cpu().numpy()}
        logits = ort_session.run(None, ort_inputs)[0]
        # Only take the last position's logits to predict the next token
        logits = torch.from_numpy(logits)[:, -1, :]
        # Apply temperature scaling
        if temperature != 1.0:
            logits = logits / temperature
        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        # Append the new token and continue
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Generated text: Hello there! How can I assist you today? I'm a helpful AI assistant trained to provide information and answer questions on a wide range of topics.
```
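On constrained devices it can also help to configure the ONNX Runtime session explicitly. The sketch below uses standard `onnxruntime` session options; the thread counts and optimization level are illustrative assumptions, not measured recommendations for this model:

```python
import onnxruntime as ort

# Session options for a small CPU-only device (illustrative values)
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.intra_op_num_threads = 2  # cap per-op CPU threads on a small device
sess_options.inter_op_num_threads = 1

ort_session = ort.InferenceSession(
    "model.onnx",
    sess_options=sess_options,
    # CPU-only here; swap in a mobile execution provider (e.g. NNAPI or
    # CoreML) if your ONNX Runtime build supports it
    providers=["CPUExecutionProvider"],
)
```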
### Android Usage
Coming Soon!
### iOS Usage
Coming Soon!
## 📜 License
📍 **CC-BY-NC-4.0**: Free for non-commercial use.
💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [[email protected]](mailto:[email protected]).
## 🛠️ About IJK Technology Ltd
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms.