LLaMA Chinese 81M

A small bilingual (Chinese and English) pretrained language model.
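For readers who want to try the model, the following is a minimal sketch of loading it with the Hugging Face `transformers` library. It assumes `transformers` and `torch` are installed and that the checkpoint loads through the generic `Auto*` classes. The helper name `generate_text` and the generation settings are illustrative choices, not something documented by the model author.

```python
MODEL_ID = "p208p2002/llama-chinese-81M"


def generate_text(prompt, max_new_tokens=32):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that merely defining this helper needs no heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    # Pass only the tensors the model needs; some tokenizers also emit
    # token_type_ids, which causal LMs usually do not accept.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("維基百科是")` downloads the weights on first use. Since this is a small pretrained (not instruction-tuned) model, the output should be read as a plain text continuation rather than an answer.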

Training Dataset

  • Chinese Wikipedia (20230601)
  • English Wikipedia (20230601)

Tokenizer

Uses a BPE tokenizer retrained on Chinese and English corpora, which gives better tokenization quality and better encoding/decoding efficiency.

https://github.com/p208p2002/BPE-tokenizer-from-zh-wiki
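To illustrate why retraining BPE on the target corpora helps, here is a toy sketch of BPE merge learning on a tiny mixed Chinese and English sample. It is not the code from the repository above: the function names, the whitespace pre-splitting, and the sample corpus are all assumptions made for illustration, and real tokenizers handle unsegmented Chinese text and byte-level fallback differently.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of strings."""
    # Each whitespace-separated word starts as a tuple of single characters.
    words = Counter(tuple(w) for text in corpus for w in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word.
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (assumption): frequent strings in both languages become single tokens.
corpus = ["維基 百科 維基 百科 wiki wiki wiki"]
merges = train_bpe(corpus, num_merges=5)
print(encode("wiki", merges))
print(encode("百科全書", merges))
```

Strings that are frequent in the training corpus collapse into fewer tokens, while unseen characters stay as single symbols. A vocabulary learned on Chinese and English text therefore tends to produce shorter sequences for those languages than one learned on mostly English data.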

Safetensors · Model size: 81M params · Tensor type: F32