
Divyasreepat committed on Jun 17

Commit 8bc318a (verified) · 1 parent: 9b0ce6c

Update README.md with new model card content


Files changed (1)

README.md +206 -0 (added)

	@@ -0,0 +1,206 @@

+---
+library_name: keras-hub
+---
+# Model Overview
+## Model Summary
+Mistral is a set of large language models published by the Mistral AI team. Mixtral-8x7B is a pretrained generative Sparse Mixture of Experts (SMoE) large language model (LLM): each of its 32 layers contains 8 feed-forward experts, and a router picks 2 of them to process each token. As a result, the model has 46.7 billion parameters in total but uses only about 12.9 billion active parameters per token. Both pretrained and instruction-tuned versions are available.
+Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
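In a sparse Mixture of Experts layer, a small router network scores every expert for each token, and only the highest-scoring experts actually run. The NumPy sketch below illustrates that idea under stated assumptions: it follows Mixtral's published design (a linear router, a softmax over the top-2 router logits, SwiGLU experts), but the function names `silu` and `moe_layer`, the toy sizes, and the random weights are hypothetical and are not KerasHub's implementation.

```python
import numpy as np


def silu(x):
    """SiLU (swish) activation used inside each SwiGLU expert."""
    return x / (1.0 + np.exp(-x))


def moe_layer(x, router_w, experts, top_k=2):
    """Sparse MoE feed-forward for token vectors x of shape (tokens, hidden).

    router_w: (hidden, num_experts) router weights (hypothetical toy values).
    experts:  list of (w1, w3, w2) SwiGLU weight triples, one per expert.
    """
    logits = x @ router_w                              # (tokens, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # top_k expert ids per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over the selected experts' logits only.
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            w1, w3, w2 = experts[top_idx[t, slot]]
            h = silu(x[t] @ w1) * (x[t] @ w3)          # SwiGLU hidden activation
            out[t] += gates[t, slot] * (h @ w2)        # only top_k experts are evaluated
    return out, top_idx, gates


# Toy usage: 4 tokens, 8 experts, 2 active per token.
rng = np.random.default_rng(0)
hidden, ffn, num_experts = 16, 32, 8
experts = [
    (rng.normal(size=(hidden, ffn)) * 0.1,
     rng.normal(size=(hidden, ffn)) * 0.1,
     rng.normal(size=(ffn, hidden)) * 0.1)
    for _ in range(num_experts)
]
router_w = rng.normal(size=(hidden, num_experts))
x = rng.normal(size=(4, hidden))
moe_out, top_idx, gates = moe_layer(x, router_w, experts)
print(moe_out.shape, top_idx.tolist())
```

Because unselected experts are never evaluated, per-token compute scales with the 2 active experts rather than all 8, while every expert's weights still have to be held in memory.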
+## Links
+* [Mixtral Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/mixtral-quickstart-notebook)
+* [Mixtral API Documentation](https://keras.io/keras_hub/api/models/mixtral/)
+* [Mixtral Announcement (Mistral AI blog)](https://mistral.ai/news/mixtral-of-experts)
+* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+## Installation
+Keras and KerasHub can be installed with:
+```
+pip install -U -q keras-hub
+pip install -U -q keras
+```
+JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+## Presets
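+The following model checkpoints are provided by the Keras team. The code examples below use `mixtral_8_instruct_7b_en`; pass `mixtral_8_7b_en` to `from_preset()` instead to load the base (not instruction-tuned) model.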
+| Preset name              | Total parameters | Description |
+|--------------------------|------------------|-------------|
+| mixtral_8_7b_en          | 46.7B            | 32-layer Mixtral MoE model with 8 experts per MoE layer, 2 of which are active per token (about 12.9B active parameters). |
+| mixtral_8_instruct_7b_en | 46.7B            | Instruction fine-tuned 32-layer Mixtral MoE model with 8 experts per MoE layer, 2 of which are active per token (about 12.9B active parameters). |
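Mixtral's total and per-token active parameter counts differ because only 2 of the 8 expert feed-forward blocks run for each token. The arithmetic can be checked with the sketch below, which assumes the published Mixtral-8x7B configuration (hidden size 4096, expert feed-forward size 14336, 32 layers, 32 query heads and 8 key/value heads of size 128, vocabulary 32000, 8 experts with 2 active per token, untied input/output embeddings, no biases). These values come from Mistral's released config rather than from this card, and the function name is hypothetical.

```python
# Assumed Mixtral-8x7B configuration (from Mistral's published config, not this card).
CONFIG = dict(
    hidden=4096, ffn=14336, layers=32, q_heads=32, kv_heads=8,
    head_dim=128, vocab=32000, num_experts=8, top_k=2,
)


def mixtral_param_counts(c):
    """Return (total, active_per_token) parameter counts; linear layers have no biases."""
    h = c["hidden"]
    expert = 3 * h * c["ffn"]                          # SwiGLU gate, up, and down projections
    attn = (2 * h * c["q_heads"] * c["head_dim"]       # query and output projections
            + 2 * h * c["kv_heads"] * c["head_dim"])   # grouped-query key and value projections
    router = h * c["num_experts"]                      # one router logit per expert
    norms = 2 * h                                      # two RMSNorm scale vectors per layer
    # Parameters every token uses: attention, routers, norms, final norm, embeddings, LM head.
    shared = c["layers"] * (attn + router + norms) + h + 2 * c["vocab"] * h
    total = shared + c["layers"] * c["num_experts"] * expert
    active = shared + c["layers"] * c["top_k"] * expert
    return total, active


total, active = mixtral_param_counts(CONFIG)
print(f"total: {total:,}  active per token: {active:,}")
# total: 46,702,792,704  active per token: 12,879,925,248
```

The result (about 46.7B total and 12.9B active) agrees with Mistral's announcement. The active count drives per-token compute, while the total count sets the memory needed for the weights: at 2 bytes per parameter in bfloat16, roughly 93 GB.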
+## Example Usage
+```Python
+import keras
+import keras_hub
+import numpy as np
+# Basic text generation
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("mixtral_8_instruct_7b_en")
+mixtral_lm.generate("[INST] What is Keras? [/INST]", max_length=500)
+# Generate with batched prompts
+mixtral_lm.generate([
+    "[INST] What is Keras? [/INST]",
+    "[INST] Give me your best brownie recipe. [/INST]"
+], max_length=500)
+# Using different sampling strategies (reuses the model loaded above)
+# Greedy sampling
+mixtral_lm.compile(sampler="greedy")
+mixtral_lm.generate("I want to say", max_length=30)
+# Beam search
+mixtral_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
+mixtral_lm.generate("I want to say", max_length=30)
+# Generate without preprocessing
+prompt = {
+    "token_ids": np.array([[1, 315, 947, 298, 1315, 0, 0, 0, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]] * 2),
+}
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "mixtral_8_instruct_7b_en",
+    preprocessor=None,
+    dtype="bfloat16"
+)
+# The expert count (8) and experts per token (2) are fixed by the preset's
+# architecture, so they are not arguments to `generate()`.
+mixtral_lm.generate(prompt)
+# Training on a single batch
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "mixtral_8_instruct_7b_en",
+    dtype="bfloat16"
+)
+mixtral_lm.fit(x=features, batch_size=2)
+# Training without preprocessing
+x = {
+    "token_ids": np.array([[1, 315, 947, 298, 1315, 369, 315, 837, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+y = np.array([[315, 947, 298, 1315, 369, 315, 837, 0, 0, 0]] * 2)
+sw = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]] * 2)
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "mixtral_8_instruct_7b_en",
+    preprocessor=None,
+    dtype="bfloat16"
+)
+mixtral_lm.fit(
+    x=x,
+    y=y,
+    sample_weight=sw,
+    batch_size=2,
+)
+```
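The instruction-tuned preset expects each user turn to be wrapped in `[INST] ... [/INST]` markers, as in the prompts above. The helper below is a hypothetical sketch (not part of KerasHub) that builds such strings. For multi-turn chats it follows Mistral's published template, in which each model reply is followed by the `</s>` end-of-sequence marker. It assumes the start token `<s>` is added by the preprocessor, and whether a literal `</s>` in the text is mapped to the special token depends on the tokenizer, so treat the multi-turn format as an assumption to verify.

```python
def format_mixtral_prompt(user_turns, assistant_turns=()):
    """Build an instruct prompt from alternating user/assistant turns.

    Hypothetical helper: user_turns must have exactly one more entry than
    assistant_turns; the final user turn is left open for the model to answer.
    """
    if len(user_turns) != len(assistant_turns) + 1:
        raise ValueError("need exactly one more user turn than assistant turns")
    parts = []
    for user, reply in zip(user_turns, assistant_turns):
        # Completed exchange: instruction, model reply, then end-of-sequence marker.
        parts.append(f"[INST] {user} [/INST] {reply}</s>")
    # Open turn the model should complete.
    parts.append(f"[INST] {user_turns[-1]} [/INST]")
    return "".join(parts)


print(format_mixtral_prompt(["What is Keras?"]))
# [INST] What is Keras? [/INST]
print(format_mixtral_prompt(["Hi", "And?"], ["Hello."]))
# [INST] Hi [/INST] Hello.</s>[INST] And? [/INST]
```

The single-turn output matches the prompt strings used in the examples, so the result can be passed directly to `mixtral_lm.generate()`.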
+## Example Usage with Hugging Face URI
+```Python
+import keras
+import keras_hub
+import numpy as np
+# Basic text generation
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("hf://keras/mixtral_8_instruct_7b_en")
+mixtral_lm.generate("[INST] What is Keras? [/INST]", max_length=500)
+# Generate with batched prompts
+mixtral_lm.generate([
+    "[INST] What is Keras? [/INST]",
+    "[INST] Give me your best brownie recipe. [/INST]"
+], max_length=500)
+# Using different sampling strategies (reuses the model loaded above)
+# Greedy sampling
+mixtral_lm.compile(sampler="greedy")
+mixtral_lm.generate("I want to say", max_length=30)
+# Beam search
+mixtral_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
+mixtral_lm.generate("I want to say", max_length=30)
+# Generate without preprocessing
+prompt = {
+    "token_ids": np.array([[1, 315, 947, 298, 1315, 0, 0, 0, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]] * 2),
+}
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "hf://keras/mixtral_8_instruct_7b_en",
+    preprocessor=None,
+    dtype="bfloat16"
+)
+# The expert count (8) and experts per token (2) are fixed by the preset's
+# architecture, so they are not arguments to `generate()`.
+mixtral_lm.generate(prompt)
+# Training on a single batch
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "hf://keras/mixtral_8_instruct_7b_en",
+    dtype="bfloat16"
+)
+mixtral_lm.fit(x=features, batch_size=2)
+# Training without preprocessing
+x = {
+    "token_ids": np.array([[1, 315, 947, 298, 1315, 369, 315, 837, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+y = np.array([[315, 947, 298, 1315, 369, 315, 837, 0, 0, 0]] * 2)
+sw = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]] * 2)
+mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
+    "hf://keras/mixtral_8_instruct_7b_en",
+    preprocessor=None,
+    dtype="bfloat16"
+)
+mixtral_lm.fit(
+    x=x,
+    y=y,
+    sample_weight=sw,
+    batch_size=2,
+)
+```
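In the "Training without preprocessing" examples, `y` is `token_ids` shifted one position to the left (each position's label is the next token), and `sample_weight` is `padding_mask` shifted the same way. As a result, padded positions and the last real token, which has no next token to predict, contribute nothing to the loss. The NumPy sketch below reproduces those arrays; the function name is hypothetical, and filling the final slot with token ID 0 simply mirrors the arrays above.

```python
import numpy as np


def shift_for_next_token(token_ids, padding_mask, pad_id=0):
    """Return (y, sample_weight) for next-token prediction from preprocessed inputs."""
    batch = token_ids.shape[0]
    # Label at position i is the token at position i + 1; the last slot gets pad_id.
    y = np.concatenate(
        [token_ids[:, 1:], np.full((batch, 1), pad_id, dtype=token_ids.dtype)], axis=1
    )
    # Weight at position i is 1 only if position i + 1 holds a real token.
    sample_weight = np.concatenate(
        [padding_mask[:, 1:], np.zeros((batch, 1), dtype=padding_mask.dtype)], axis=1
    )
    return y, sample_weight


token_ids = np.array([[1, 315, 947, 298, 1315, 369, 315, 837, 0, 0]] * 2)
padding_mask = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2)
y, sw = shift_for_next_token(token_ids, padding_mask)
print(y[0].tolist())
# [315, 947, 298, 1315, 369, 315, 837, 0, 0, 0]
print(sw[0].tolist())
# [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

The resulting arrays can be passed as `y=` and `sample_weight=` to `fit()` alongside the unshifted `{"token_ids", "padding_mask"}` dictionary.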