---
license: llama3.1
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- mergekit
---

This model is "Built with Llama".

It is based on [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
and was created with the help of [mergekit](https://github.com/arcee-ai/mergekit).
This is the mergekit configuration we used: [mergekit_moe_config.yml](https://huggingface.co/deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw/blob/main/mergekit_moe_config.yml)
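
For orientation, a mergekit MoE configuration of this kind typically looks like the sketch below. This is only an illustration: the expert entries are placeholders, and the authoritative file is the linked `mergekit_moe_config.yml`.

```yaml
# Illustrative sketch only -- NOT the actual configuration.
# See the linked mergekit_moe_config.yml for the file we used;
# the expert models below are placeholders.
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
gate_mode: random        # leaves the router networks randomly initialized
dtype: bfloat16
experts:
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  - source_model: example-org/llama-3.1-8b-expert-2   # placeholder
  # ... further 8B experts, for 8 experts in total
```

With `gate_mode: random` the routers are not derived from prompt embeddings, which is consistent with the note below that the raw merge still needs router training.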

Note that this is the raw model directly after merging.
Its router networks are still randomly initialized, so it will not perform better than any single one of its expert models.
This model requires further training before use.
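
If you want to inspect the raw merge anyway, for example as a starting point for router training, it can presumably be loaded like any other transformers causal LM. A minimal sketch, not a tested recipe:

```python
# Minimal sketch: load the raw merge with transformers.
# The routers are untrained, so generations will be poor
# until the model has been fine-tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~47.5B params -> roughly 95 GB in bf16
    device_map="auto",           # shard across available devices
)
```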

This model has a total of 47.5B parameters, slightly more than [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) with its 46.7B parameters.
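
The figure of 47.5B (rather than 8 × 8B = 64B) follows from the Mixtral-style layout, where embeddings and attention layers are shared and only the MLP blocks are replicated per expert. A back-of-the-envelope check, assuming the public Llama 3.1 8B architecture numbers:

```python
# Rough parameter estimate for 8 Llama 3.1 8B experts in a
# Mixtral-style MoE (shared attention/embeddings, per-expert MLPs).
# Ignores the tiny layer-norm weights.
hidden, intermediate, layers = 4096, 14336, 32
vocab, heads, kv_heads = 128_256, 32, 8
head_dim = hidden // heads
n_experts = 8

embeddings = 2 * vocab * hidden                    # input + output embeddings
attn = (2 * hidden * hidden                        # q_proj, o_proj
        + 2 * hidden * kv_heads * head_dim)        # k_proj, v_proj (GQA)
mlp = 3 * hidden * intermediate                    # gate, up, down projections
router = hidden * n_experts                        # MoE gate per layer

total = embeddings + layers * (attn + router + n_experts * mlp)
print(f"{total / 1e9:.1f}B")  # -> 47.5B
```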

## Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 [Philip May](https://philipmay.org), [Deutsche Telekom AG](https://www.telekom.de/)\
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.