---
library_name: transformers
license: apache-2.0
datasets:
- liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
- zh
---
# Model Card for Chinese-OpenELM-270M
Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):
* Extended the vocabulary from 32000 to 61758 tokens by adding Traditional Chinese characters.
* The tokenizer was trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset.
* Additional token embeddings are initialized with the mean vector of existing embeddings.
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation dataset.
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for efficient training with the following hyperparameters:
  * Rank: 1024
  * Scale: 4.0
  * Update interval: 200
  * Layer-wise training: False
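
The mean-vector initialization of the additional token embeddings can be illustrated with a small NumPy sketch. This is not the training code used for this model: the function name `extend_embeddings_with_mean`, the toy hidden size of 8, and the random stand-in matrix are all assumptions made for illustration, and the sketch further assumes that new tokens are appended after the original 32000 rows.

```python
import numpy as np


def extend_embeddings_with_mean(embeddings: np.ndarray, new_vocab_size: int) -> np.ndarray:
    """Append rows to an embedding matrix, each set to the mean of the existing rows.

    Hypothetical helper for illustration only; assumes new tokens are
    appended after the original vocabulary.
    """
    old_vocab_size, _ = embeddings.shape
    if new_vocab_size < old_vocab_size:
        raise ValueError("new_vocab_size must be >= the current vocabulary size")

    # Mean over the vocabulary axis gives one vector of shape (1, hidden_dim).
    mean_vector = embeddings.mean(axis=0, keepdims=True)

    # Every added token starts from that same mean vector.
    num_new_tokens = new_vocab_size - old_vocab_size
    new_rows = np.repeat(mean_vector, num_new_tokens, axis=0)

    return np.concatenate([embeddings, new_rows], axis=0)


# Toy stand-in for the original embedding table: 32000 tokens, hidden size 8
# (the real hidden size is larger; 8 is an arbitrary illustrative value).
rng = np.random.default_rng(0)
old_embeddings = rng.normal(size=(32000, 8)).astype(np.float32)

# Vocabulary sizes taken from the list above.
extended = extend_embeddings_with_mean(old_embeddings, 61758)
```

Starting new rows at the mean keeps them inside the distribution of the existing embeddings, so the new tokens do not produce out-of-scale activations at the beginning of continual pre-training.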