Commit bf2a0e5
Parent(s): 6a290a7

Update README.md

README.md CHANGED
@@ -30,16 +30,16 @@ pipeline_tag: text-generation
 
 
 ## Overview
-Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for
+Taiwan-LLaMa is a full parameter fine-tuned model based on LLaMa 2 for Traditional Mandarin applications.
 
-**Taiwan-LLaMa v1.0** pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations both in traditional
+**Taiwan-LLaMa v1.0** is pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations, both in Traditional Mandarin.
 
 ## Demo
 A live demonstration of the model can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2).
 
 ## Key Features
 
-1. **Traditional
+1. **Traditional Mandarin Support**: The model is fine-tuned to understand and generate text in Traditional Mandarin, making it suitable for Taiwanese culture and related applications.
 
 2. **Instruction-Tuned**: Further fine-tuned on conversational data to offer context-aware and instruction-following responses.
 
@@ -49,8 +49,8 @@ A live demonstration of the model can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/yentinglin/Taiwan-LLaMa2).
 
 
 ## Work in progress
-- [ ] **Improved
-- [ ] **
+- [ ] **Improved pretraining**: A refined pretraining process (e.g. more data from Taiwan, improved training strategies) is under development, aiming to improve the model's performance on Taiwanese culture.
+- [ ] **Extend max length**: Using the RoPE mechanism described in [the paper](https://arxiv.org/abs/2104.09864), the model's context length will be extended from 4k to 8k tokens (a brief sketch of the mechanism follows below).
 
 
 ## Taiwanese Culture Examples
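The "Extend max length" item in the work-in-progress hunk above refers to rotary position embeddings (RoPE, [Su et al., 2021](https://arxiv.org/abs/2104.09864)). The sketch below is an illustrative NumPy implementation of the basic mechanism, not the project's actual code; the head dimension, sequence length, and base value are assumptions chosen for the example.

```python
import numpy as np

def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings (RoPE) to x of shape (seq_len, dim).

    Each channel pair (2i, 2i+1) is rotated by theta_i = position * base**(-2i/dim),
    so relative position is encoded in the dot products of rotated queries and keys.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE expects an even embedding dimension"
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage with illustrative sizes: 8 positions, 16 channels per head.
q = np.random.randn(8, 16)
q_rot = rotary_embed(q, positions=np.arange(8))
print(q_rot.shape)  # (8, 16)
```

Under this formulation, extending the usable context (e.g. from 4k to 8k positions) amounts to evaluating these angles over a longer position range, optionally rescaling positions so they stay within the range seen during pretraining.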
@@ -72,7 +72,7 @@ We provide a number of model checkpoints that we trained. Please find them on Hugging Face
 |--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
 | **Taiwan-LLaMa v1.0** (_better for Taiwanese Culture_) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0" target="_blank">yentinglin/Taiwan-LLaMa-v1.0</a> |
 | Taiwan-LLaMa v0.9 (partial instruction set) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.9" target="_blank">yentinglin/Taiwan-LLaMa-v0.9</a> |
-| Taiwan-LLaMa v0.0 (no Traditional
+| Taiwan-LLaMa v0.0 (no Traditional Mandarin pretraining) | 🤗 <a href="https://huggingface.co/yentinglin/Taiwan-LLaMa-v0.0" target="_blank">yentinglin/Taiwan-LLaMa-v0.0</a> |
 
 ## Data
 
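To try one of the checkpoints listed in the hunk above, the snippet below is a minimal, hedged sketch using the Hugging Face `transformers` library; the dtype, device placement, prompt, and generation settings are illustrative assumptions rather than the model card's documented usage.

```python
# Illustrative sketch only; assumes a recent transformers release, the accelerate
# package (for device_map), and enough GPU memory for a LLaMa-2-scale model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yentinglin/Taiwan-LLaMa-v1.0"  # checkpoint from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # spread layers across available devices
)

prompt = "台灣最高的山是哪一座？"  # "Which is the highest mountain in Taiwan?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```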
@@ -80,8 +80,8 @@ Here are some quick links to the datasets that we used to train the models:
 
 | **Dataset** | **Link** |
 |---------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
-| **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/
-| Traditional
+| **Instruction-tuning** | 🤗 <a href="https://huggingface.co/datasets/yentinglin/traditional_mandarin_instructions" target="_blank">yentinglin/traditional_mandarin_instructions</a> |
+| Traditional Mandarin Pretraining | 🤗 <a href="https://huggingface.co/datasets/yentinglin/zh_TW_c4" target="_blank">yentinglin/zh_TW_c4</a> |
 
 
 ## Architecture
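For readers who want to inspect the training data linked above, the following is a rough sketch using the `datasets` library; streaming mode and the `train` split name are assumptions, not details taken from the dataset cards.

```python
# Illustrative sketch only; assumes the `datasets` library is installed and that
# both datasets expose a "train" split (an assumption, not stated in the card).
from datasets import load_dataset

# Stream the Traditional Mandarin pretraining corpus instead of downloading it all.
pretrain = load_dataset("yentinglin/zh_TW_c4", split="train", streaming=True)
for i, example in enumerate(pretrain):
    print(example)  # each record is a dict keyed by the dataset's columns
    if i == 2:
        break

# The instruction-tuning conversations can be loaded the same way.
sft = load_dataset("yentinglin/traditional_mandarin_instructions", split="train", streaming=True)
print(next(iter(sft)))
```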
@@ -89,12 +89,12 @@ Taiwan-LLaMa is based on LLaMa 2, leveraging transformer architecture, <a href="
 
 It includes:
 
-* Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from common crawl in Traditional
+* Pretraining Phase: Pretrained on a vast corpus of over 5 billion tokens, extracted from Common Crawl in Traditional Mandarin.
 * Fine-tuning Phase: Further instruction-tuned on over 490k multi-turn conversational data to enable more instruction-following and context-aware responses.
 
 ## Generic Capabilities on Vicuna Benchmark
 
-The data is translated into traditional
+The data is translated into Traditional Mandarin to evaluate the model's general capabilities.
 
 
 <img src="./images/zhtw_vicuna_bench_chatgptbaseline.png" width="700">
@@ -157,7 +157,7 @@ If you use our code, data, or models in your research, please cite this repository
 ```
 
 ## Collaborate With Us
-If you are interested in contributing to the development of Traditional
+If you are interested in contributing to the development of Traditional Mandarin language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.
 
 ## License
 The code in this project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.