mkshing committed
Commit e250fea · verified · 1 Parent(s): c762803

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -7,11 +7,11 @@ library_name: transformers
 base_model:
 - Qwen/Qwen2.5-1.5B-Instruct
 ---
-# Smol-Swallow-1.5B
+# SmolSwallow-1.5B
 
 🤗 [Models](https://huggingface.co/SakanaAI) | 📚 [Paper](https://arxiv.org/abs/TODO) | 📝 [Blog](https://sakana.ai/taid/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
 
-**Smol-Swallow-1.5B** is a Japanese compact language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method.
+**SmolSwallow-1.5B** is a Japanese compact language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method.
 We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model.
 The model has been further pre-trained on Japanese text data to enhance its Japanese language capabilities.
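The card text names TAID's central idea: a distillation target that is interpolated between the student's and the teacher's distributions as training progresses. As a toy sketch only — the function names and the simple linear mixing here are illustrative assumptions, not the paper's actual schedule or loss:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def interpolated_target(student_logits, teacher_logits, t):
    """Toy TAID-style target: a mixture that moves from the student's own
    distribution (t=0) toward the teacher's distribution (t=1) over training.
    The linear schedule in t is an assumption for illustration."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    return (1.0 - t) * p_student + t * p_teacher

# Early in training (small t) the target stays close to the student,
# easing the capacity gap to the 32B teacher; later (t near 1) it
# approaches the teacher's distribution.
student = np.array([1.0, 2.0, 0.5])
teacher = np.array([0.1, 3.0, 1.0])
early = interpolated_target(student, teacher, 0.1)
late = interpolated_target(student, teacher, 0.9)
```

Both `early` and `late` remain valid probability distributions (non-negative, summing to 1), which is what lets a standard KL-divergence distillation loss be applied against them at every stage.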