---
license: apache-2.0
datasets:
- sudy-super/JetCopper-10B
language:
- ja
- en
tags:
- japanese
- causal-lm
inference: false
---

<img src="./contrail_logo.svg" width="100%" height="10%" alt="">

*Logo designed by [Rotejin](https://x.com/rotejin).*

# Contrail-200m-64k

## Description

Contrail is a Mistral-architecture language model pre-trained on the 10B tokens of [JetCopper-10B](https://huggingface.co/datasets/sudy-super/JetCopper-10B).

It reached a final validation perplexity of 27.88.
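
For readers who prefer to think in terms of loss, a perplexity can be converted to an average cross-entropy. The sketch below assumes the reported 27.88 is a per-token perplexity computed with the natural logarithm, which is the usual convention but is not stated in this card; the function name is illustrative.

```python
import math


def perplexity_to_loss(perplexity: float) -> float:
    """Return the mean per-token cross-entropy (in nats) implied by a perplexity."""
    return math.log(perplexity)


# 27.88 is the validation perplexity reported above.
# ASSUMPTION: it is a per-token perplexity, i.e. exp(mean cross-entropy in nats).
loss_nats = perplexity_to_loss(27.88)
loss_bits = loss_nats / math.log(2)  # the same quantity expressed in bits per token

print(f"cross-entropy: {loss_nats:.3f} nats/token ({loss_bits:.3f} bits/token)")
```

Under that assumption, the reported perplexity corresponds to a validation loss of roughly 3.33 nats per token.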

## Model Details

- **Architecture**: Mistral (LLaMA-compatible)
- **Model size**: 200M parameters
- **Training tokens**: 10B
- **Context length**: 65,536 tokens
- **Languages**: Japanese, English
- **License**: Apache-2.0
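
The figures in this list relate to each other through simple arithmetic. A small sketch using only the numbers given in this card; the variable names are illustrative, and "200M" and "10B" are treated as exact round numbers, which is an approximation.

```python
# Figures taken from the Model Details list.
# ASSUMPTION: "200M" and "10B" are treated as exact round numbers.
n_params = 200_000_000
n_train_tokens = 10_000_000_000
context_length = 65_536

tokens_per_param = n_train_tokens / n_params
# The "64k" in the model name: 65,536 = 64 * 1,024 = 2**16.
context_in_k = context_length // 1024

print(f"training tokens per parameter: {tokens_per_param:.0f}")
print(f"context length: {context_in_k}k tokens (2**{context_length.bit_length() - 1})")
```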

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_name = "sudy-super/Contrail-200m-64k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
# Print generated text to stdout as it is produced, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Make sure the model is on the GPU when one is available.
if torch.cuda.is_available():
    model = model.to("cuda")

# Japanese prompt, roughly: "With AI, our daily lives are..."
prompt = "AIによって私達の暮らしは、"

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    token_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=token_ids.to(model.device),
        min_new_tokens=10,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        streamer=streamer,
    )
```
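
For long prompts, the prompt and the newly generated tokens together need to fit in the 65,536-token context window. The helper below is a minimal sketch of one way to enforce that by keeping only the most recent prompt tokens. `fit_prompt_to_context` is a hypothetical name, not part of `transformers`, and truncating from the left is an assumption about which part of the prompt matters most.

```python
from typing import List

CONTEXT_LENGTH = 65_536  # context length listed in Model Details


def fit_prompt_to_context(
    token_ids: List[int],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[int]:
    """Drop the oldest prompt tokens so that prompt + generation fits the context.

    Hypothetical helper (not part of transformers). It works on a plain list
    of token ids, e.g. tokenizer.encode(prompt) without return_tensors.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_length")
    # Keep the last `budget` tokens; a prompt that already fits is returned unchanged.
    return token_ids[-budget:]
```

The short prompt used in the usage example would pass through unchanged. When truncation does happen, any special token at the start of the sequence is dropped along with the oldest text, so this is only a starting point. To pass the result to `model.generate`, wrap it back into a tensor, for example `torch.tensor([ids])`.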

## Author

[Rakuto Suda](https://huggingface.co/sudy-super)

## Citations

```
@article{jiang2023mistral,
  title={Mistral 7B},
  author={Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and L{\'e}lio Renard Lavaud and Marie-Anne Lachaux and Pierre Stock and Teven Le Scao and Thibaut Lavril and Thomas Wang and Timoth{\'e}e Lacroix and William El Sayed},
  journal={arXiv preprint arXiv:2310.06825},
  year={2023}
}
```