---
license: llama3.1
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- mergekit
---

This model is "Built with Llama".

It is based on [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
and was created with the help of [mergekit](https://github.com/arcee-ai/mergekit).
This is the mergekit configuration we used: [mergekit_moe_config.yml](https://huggingface.co/deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw/blob/main/mergekit_moe_config.yml)
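
For orientation, a mergekit MoE configuration of this kind typically looks like the sketch below. This is only an illustration: the expert entries are placeholders, and the authoritative file is the linked `mergekit_moe_config.yml`.

```yaml
# Illustrative sketch only -- NOT the actual configuration.
# See the linked mergekit_moe_config.yml for the file we used;
# the expert models below are placeholders.
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
gate_mode: random        # leaves the router networks randomly initialized
dtype: bfloat16
experts:
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  - source_model: example-org/llama-3.1-8b-expert-2   # placeholder
  # ... further 8B experts, for 8 experts in total
```

With `gate_mode: random` the routers are not derived from prompt embeddings, which is consistent with the note below that the raw merge still needs router training.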

Note that this is the raw model directly after merging.
Its router networks are still randomly initialized, so it will not perform better than any single one of its expert models.
This model requires further training before use.
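
If you want to inspect the raw merge anyway, for example as a starting point for router training, it can presumably be loaded like any other transformers causal LM. A minimal sketch, not a tested recipe:

```python
# Minimal sketch: load the raw merge with transformers.
# The routers are untrained, so generations will be poor
# until the model has been fine-tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~47.5B params -> roughly 95 GB in bf16
    device_map="auto",           # shard across available devices
)
```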

This model has a total of 47.5B parameters, slightly more than [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) with its 46.7B parameters.
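
The figure of 47.5B (rather than 8 × 8B = 64B) follows from the Mixtral-style layout, where embeddings and attention layers are shared and only the MLP blocks are replicated per expert. A back-of-the-envelope check, assuming the public Llama 3.1 8B architecture numbers:

```python
# Rough parameter estimate for 8 Llama 3.1 8B experts in a
# Mixtral-style MoE (shared attention/embeddings, per-expert MLPs).
# Ignores the tiny layer-norm weights.
hidden, intermediate, layers = 4096, 14336, 32
vocab, heads, kv_heads = 128_256, 32, 8
head_dim = hidden // heads
n_experts = 8

embeddings = 2 * vocab * hidden                    # input + output embeddings
attn = (2 * hidden * hidden                        # q_proj, o_proj
        + 2 * hidden * kv_heads * head_dim)        # k_proj, v_proj (GQA)
mlp = 3 * hidden * intermediate                    # gate, up, down projections
router = hidden * n_experts                        # MoE gate per layer

total = embeddings + layers * (attn + router + n_experts * mlp)
print(f"{total / 1e9:.1f}B")  # -> 47.5B
```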

## Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 [Philip May](https://philipmay.org), [Deutsche Telekom AG](https://www.telekom.de/)\
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.