|
--- |
|
library_name: transformers |
|
license: llama3 |
|
datasets: |
|
- VTSNLP/vietnamese_curated_dataset |
|
language: |
|
- vi |
|
- en |
|
base_model: |
|
- meta-llama/Meta-Llama-3-8B |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Model Information |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
Llama3-ViettelSolutions-8B is a variant of the Meta Llama-3-8B model, created by continued pre-training on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) followed by supervised fine-tuning on 5 million samples of Vietnamese instruction data.
|
- **Developed by:** Viettel Solutions |
|
- **Funded by:** NVIDIA |
|
- **Model type:** Autoregressive transformer model |
|
- **Language(s) (NLP):** Vietnamese, English |
|
- **License:** Llama 3 Community License |
|
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B |
|
|
|
## Uses |
|
|
|
Example usage with the Transformers `pipeline` API:
|
|
|
```python
import transformers
import torch

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Build a text-generation pipeline in bfloat16, sharding the model across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

outputs = pipeline("Xin chào!")
print(outputs[0]["generated_text"])
```
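
The same checkpoint can also be loaded directly with `AutoTokenizer` and `AutoModelForCausalLM` if you want more control over generation. The sketch below is illustrative only; the prompt and the `max_new_tokens` cap are arbitrary choices, not recommended settings from the model authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the tokenizer and the model in bfloat16, sharding across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Tokenize a prompt and generate a continuation.
inputs = tokenizer("Xin chào!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)  # illustrative length cap
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```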
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
- Dataset for continued pre-training: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
|
|
|
- Dataset for supervised fine-tuning: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset) (a loading sketch for both datasets follows below)
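
Both datasets are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. A minimal loading sketch; the `train` split name is an assumption, not something stated in this card:

```python
from datasets import load_dataset

# Corpus used for continued pre-training (split name assumed to be "train").
pretrain_ds = load_dataset("VTSNLP/vietnamese_curated_dataset", split="train")

# Instruction data used for supervised fine-tuning (split name assumed to be "train").
sft_ds = load_dataset("VTSNLP/instruct_general_dataset", split="train")

print(pretrain_ds)
print(sft_ds[0])
```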
|
|
|
|
|
### Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
#### Preprocessing |
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** bf16 mixed precision
|
- **Data sequence length:** 8192 |
|
- **Tensor model parallel size:** 4 |
|
- **Pipeline model parallel size:** 1
|
- **Context parallel size:** 1 |
|
- **Micro batch size:** 1 |
|
- **Global batch size:** 512 (see the arithmetic sketch after this list)
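
For reference, these settings can be cross-checked against the 4 x A100 80GB setup listed under Technical Specifications: with tensor parallel size 4 and pipeline parallel size 1, a single data-parallel replica remains, so the global batch of 512 is reached through gradient accumulation. The snippet below only reproduces that arithmetic; it is not training code.

```python
# Hyperparameters listed above, plus the GPU count from Technical Specifications.
num_gpus = 4
tensor_parallel_size = 4
pipeline_parallel_size = 1
micro_batch_size = 1
global_batch_size = 512
sequence_length = 8192

# Data-parallel replicas left after tensor/pipeline parallelism.
data_parallel_size = num_gpus // (tensor_parallel_size * pipeline_parallel_size)  # -> 1

# Micro-batches accumulated per optimizer step to reach the global batch.
grad_accum_steps = global_batch_size // (micro_batch_size * data_parallel_size)   # -> 512

# Tokens consumed per optimizer step.
tokens_per_step = global_batch_size * sequence_length                             # -> 4,194,304

print(data_parallel_size, grad_accum_steps, tokens_per_step)
```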
|
|
|
## Evaluation |
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Factors |
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
#### Summary |
|
|
|
[More Information Needed] |
|
|
|
## Technical Specifications |
|
|
|
- Compute Infrastructure: NVIDIA DGX |
|
|
|
- Hardware: 4 x A100 80GB |
|
|
|
- Software: [NeMo Framework](https://github.com/NVIDIA/NeMo) |
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |
|
|
|
**APA:** |
|
|
|
[More Information Needed] |
|
|
|
## More Information |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Authors |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Contact |
|
|
|
[More Information Needed] |