---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---
# Model Information
## Model Details
### Model Description
Llama3-ViettelSolutions-8B is a variant of Meta's Llama-3-8B, further pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million Vietnamese instruction samples.
- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B
## Uses
Example usage with the Transformers `pipeline` API:
```python
import transformers
import torch

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the model in bfloat16 and shard it across available GPUs
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# "Xin chào!" is Vietnamese for "Hello!"
print(pipeline("Xin chào!"))
```
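Because the model is supervised fine-tuned on instruction data, responses are often better with explicit generation settings. A minimal sketch using standard Transformers sampling parameters; the prompt and the specific values are illustrative, not the model card's recommendations:
```python
# Illustrative generation settings; tune these for your use case
outputs = pipeline(
    "Hãy giới thiệu về Việt Nam.",  # "Please introduce Vietnam."
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```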
## Training Details
### Training Data
- Continued pre-training dataset: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
- Supervised fine-tuning dataset: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)
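Both datasets are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. A minimal sketch; the `"train"` split name is an assumption:
```python
from datasets import load_dataset

# Stream the continued pre-training corpus without downloading it in full
# (the "train" split name is an assumption)
corpus = load_dataset(
    "VTSNLP/vietnamese_curated_dataset", split="train", streaming=True
)
print(next(iter(corpus)))
```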
### Training Procedure
#### Preprocessing
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** bf16 mixed precision
- **Sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512
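With 4 GPUs and tensor parallel size 4, the data-parallel size is 1, so the global batch of 512 is reached through gradient accumulation. A hypothetical mapping of these hyperparameters onto NeMo-style config keys, written as a Python dict for illustration; the key names follow NeMo's Megatron GPT configs but are not copied from the actual training config:
```python
# Illustrative only: the hyperparameters above expressed as NeMo-style config keys
training_config = {
    "precision": "bf16-mixed",           # bf16 mixed precision
    "encoder_seq_length": 8192,          # sequence length
    "tensor_model_parallel_size": 4,
    "pipeline_model_parallel_size": 1,
    "context_parallel_size": 1,
    "micro_batch_size": 1,
    "global_batch_size": 512,            # reached via gradient accumulation at DP=1
}
```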
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[More Information Needed]
#### Factors
[More Information Needed]
#### Metrics
[More Information Needed]
### Results
[More Information Needed]
#### Summary
[More Information Needed]
## Technical Specifications
- **Compute infrastructure:** NVIDIA DGX
- **Hardware:** 4 × NVIDIA A100 80 GB GPUs
- **Software:** [NeMo Framework](https://github.com/NVIDIA/NeMo)
## Citation
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## More Information
[More Information Needed]
## Model Card Authors
[More Information Needed]
## Model Card Contact
[More Information Needed]