---
library_name: transformers
license: llama3
datasets:
- VTSNLP/vietnamese_curated_dataset
language:
- vi
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
---
# Model Information
## Model Details
### Model Description
Llama3-ViettelSolutions-8B is a variant of Meta's Llama-3-8B, further pre-trained on the [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset) and then supervised fine-tuned on 5 million Vietnamese instruction samples.
- **Developed by:** Viettel Solutions
- **Funded by:** NVIDIA
- **Model type:** Autoregressive transformer model
- **Language(s) (NLP):** Vietnamese, English
- **License:** Llama 3 Community License
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B
## Uses
Example usage with the Transformers `pipeline` API:
```python
import transformers
import torch

model_id = "VTSNLP/Llama3-ViettelSolutions-8B"

# Load the model in bfloat16 and shard it across available GPUs
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# "Xin chào!" is Vietnamese for "Hello!"
print(pipeline("Xin chào!"))
```
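Because the model is supervised fine-tuned on instruction data, responses are often better with explicit generation settings. A minimal sketch using standard Transformers sampling parameters; the prompt and the specific values are illustrative, not the model card's recommendations:
```python
# Illustrative generation settings; tune these for your use case
outputs = pipeline(
    "Hãy giới thiệu về Việt Nam.",  # "Please introduce Vietnam."
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(outputs[0]["generated_text"])
```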
## Training Details
### Training Data
- Continued pre-training dataset: [Vietnamese curated dataset](https://huggingface.co/datasets/VTSNLP/vietnamese_curated_dataset)
- Supervised fine-tuning dataset: [Instruct general dataset](https://huggingface.co/datasets/VTSNLP/instruct_general_dataset)
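Both datasets are hosted on the Hugging Face Hub and can be inspected with the `datasets` library. A minimal sketch; the `"train"` split name is an assumption:
```python
from datasets import load_dataset

# Stream the continued pre-training corpus without downloading it in full
# (the "train" split name is an assumption)
corpus = load_dataset(
    "VTSNLP/vietnamese_curated_dataset", split="train", streaming=True
)
print(next(iter(corpus)))
```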
### Training Procedure
#### Preprocessing
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** bf16 mixed precision
- **Sequence length:** 8192
- **Tensor model parallel size:** 4
- **Pipeline model parallel size:** 1
- **Context parallel size:** 1
- **Micro batch size:** 1
- **Global batch size:** 512
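With 4 GPUs and tensor parallel size 4, the data-parallel size is 1, so the global batch of 512 is reached through gradient accumulation. A hypothetical mapping of these hyperparameters onto NeMo-style config keys, written as a Python dict for illustration; the key names follow NeMo's Megatron GPT configs but are not copied from the actual training config:
```python
# Illustrative only: the hyperparameters above expressed as NeMo-style config keys
training_config = {
    "precision": "bf16-mixed",           # bf16 mixed precision
    "encoder_seq_length": 8192,          # sequence length
    "tensor_model_parallel_size": 4,
    "pipeline_model_parallel_size": 1,
    "context_parallel_size": 1,
    "micro_batch_size": 1,
    "global_batch_size": 512,            # reached via gradient accumulation at DP=1
}
```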
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
[More Information Needed]
#### Factors
[More Information Needed]
#### Metrics
[More Information Needed]
### Results
[More Information Needed]
#### Summary
[More Information Needed]
## Technical Specifications
- **Compute infrastructure:** NVIDIA DGX
- **Hardware:** 4 × NVIDIA A100 80 GB GPUs
- **Software:** [NeMo Framework](https://github.com/NVIDIA/NeMo)
## Citation
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## More Information
[More Information Needed]
## Model Card Authors
[More Information Needed]
## Model Card Contact
[More Information Needed]