Update README.md
README.md CHANGED
@@ -7,11 +7,11 @@ library_name: transformers
 base_model:
 - Qwen/Qwen2.5-1.5B-Instruct
 ---
-#
+# SmolSwallow-1.5B
 
 🤗 [Models](https://huggingface.co/SakanaAI) | 📚 [Paper](https://arxiv.org/abs/TODO) | 📝 [Blog](https://sakana.ai/taid/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
 
-**
+**SmolSwallow-1.5B** is a Japanese compact language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method.
 We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model.
 The model has been further pre-trained on Japanese text data to enhance its Japanese language capabilities.
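Because the card metadata declares `library_name: transformers`, the model should load through the standard 🤗 Transformers API. The snippet below is a minimal usage sketch, not part of the committed card; the repository id `SakanaAI/SmolSwallow-1.5B` is assumed from the model name and may differ from the actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/SmolSwallow-1.5B"  # assumed repository id; check the SakanaAI org for the exact path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# SmolSwallow-1.5B is a pre-trained (non-instruct) model, so we prompt it with plain text.
prompt = "日本の首都は"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```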
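For context on the distillation step mentioned in the card: as we understand TAID from the linked blog and paper, the student is trained against a time-dependent target distribution that interpolates between the student's own (detached) distribution and the teacher's, shifting toward the teacher as training progresses. The PyTorch fragment below is an illustrative sketch of that idea under those assumptions, not the released training code; the function name, tensor shapes, and fixed interpolation value are invented for clarity.

```python
import torch
import torch.nn.functional as F

def taid_style_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    t: float) -> torch.Tensor:
    """Sketch of an interpolated distillation loss: KL(p_t || p_student),
    where p_t = (1 - t) * sg(p_student) + t * p_teacher."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_student = log_p_student.exp().detach()          # stop-gradient on the student side
    p_teacher = F.softmax(teacher_logits, dim=-1)
    p_t = (1.0 - t) * p_student + t * p_teacher       # intermediate "interpolated teacher"
    # F.kl_div(input=log q, target=p) computes KL(p || q); sum over vocab, mean over tokens
    kl = F.kl_div(log_p_student, p_t, reduction="none").sum(-1)
    return kl.mean()

# In TAID, t moves from near 0 toward 1 over training (adaptively);
# a fixed value here only demonstrates the call.
student_logits = torch.randn(2, 8, 32000, requires_grad=True)
teacher_logits = torch.randn(2, 8, 32000)
loss = taid_style_loss(student_logits, teacher_logits, t=0.3)
loss.backward()
```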