Upload HymbaForCausalLM


Files changed (8)

README.md +199 -0
config.json +191 -0
configuration_hymba.py +116 -0
generation_config.json +8 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +618 -0
modeling_hymba.py +0 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,191 @@

+{
+  "architectures": [
+    "HymbaForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "attn_hidden_size": -1,
+  "attn_implementation": "flex",
+  "attn_implementation_new": "flex",
+  "auto_map": {
+    "AutoConfig": "configuration_hymba.HymbaConfig",
+    "AutoModelForCausalLM": "modeling_hymba.HymbaForCausalLM"
+  },
+  "bos_token_id": 1,
+  "calc_logits_for_entire_prompt": false,
+  "conv_dim": {
+    "0": 3200,
+    "1": 3200,
+    "2": 3200,
+    "3": 3200,
+    "4": 3200,
+    "5": 3200,
+    "6": 3200,
+    "7": 3200,
+    "8": 3200,
+    "9": 3200,
+    "10": 3200,
+    "11": 3200,
+    "12": 3200,
+    "13": 3200,
+    "14": 3200,
+    "15": 3200,
+    "16": 3200,
+    "17": 3200,
+    "18": 3200,
+    "19": 3200,
+    "20": 3200,
+    "21": 3200,
+    "22": 3200,
+    "23": 3200,
+    "24": 3200,
+    "25": 3200,
+    "26": 3200,
+    "27": 3200,
+    "28": 3200,
+    "29": 3200,
+    "30": 3200,
+    "31": 3200
+  },
+  "eos_token_id": 2,
+  "global_attn_idx": [
+    0,
+    15,
+    31
+  ],
+  "hidden_act": "silu",
+  "hidden_size": 1600,
+  "initializer_range": 0.02,
+  "intermediate_size": 5504,
+  "kq_head_dim": -1,
+  "kq_norm": "none",
+  "kv_reuse_every_i_layer": -1,
+  "kv_reuse_group": [
+    [
+      1,
+      2
+    ],
+    [
+      3,
+      4
+    ],
+    [
+      5,
+      6
+    ],
+    [
+      7,
+      8
+    ],
+    [
+      9,
+      10
+    ],
+    [
+      11,
+      12
+    ],
+    [
+      13,
+      14
+    ],
+    [
+      16,
+      17,
+      18
+    ],
+    [
+      19,
+      20
+    ],
+    [
+      21,
+      22
+    ],
+    [
+      23,
+      24
+    ],
+    [
+      25,
+      26
+    ],
+    [
+      27,
+      28
+    ],
+    [
+      29,
+      30
+    ]
+  ],
+  "kv_weight_reuse": false,
+  "layer_type": [
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h",
+    "h"
+  ],
+  "mamba_conv_bias": true,
+  "mamba_d_conv": 4,
+  "mamba_d_state": 16,
+  "mamba_dt_rank": 100,
+  "mamba_expand": 2,
+  "mamba_inner_layernorms": true,
+  "mamba_proj_bias": false,
+  "max_position_embeddings": 1024,
+  "memory_tokens_interspersed_every": 0,
+  "mlp_hidden_act": "silu",
+  "model_type": "hymba",
+  "num_attention_heads": 25,
+  "num_experts": 1,
+  "num_experts_per_tok": 1,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 5,
+  "num_mamba": 1,
+  "num_memory_tokens": 128,
+  "orig_max_position_embeddings": null,
+  "output_router_logits": false,
+  "pad_token_id": 0,
+  "rms_norm_eps": 1e-06,
+  "rope": true,
+  "rope_theta": 10000.0,
+  "rope_type": null,
+  "router_aux_loss_coef": 0.001,
+  "seq_length": 1024,
+  "sliding_window": 1024,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.44.0",
+  "use_cache": false,
+  "use_mamba_kernels": true,
+  "v_head_dim": 128,
+  "vocab_size": 32001
+}
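
Review note: a few fields in this config.json can be cross-checked with plain arithmetic. The sketch below copies the values from the hunk above and checks three relationships. Whether modeling_hymba.py actually depends on any of them is not visible in this diff, so treat them as observations about the numbers, not as documented invariants.

```python
# Values copied from the config.json hunk above.
hidden_size = 1600
num_attention_heads = 25
num_key_value_heads = 5
num_hidden_layers = 32
mamba_expand = 2
conv_dim = {str(i): 3200 for i in range(32)}  # every layer is listed as 3200
global_attn_idx = [0, 15, 31]
kv_reuse_group = [
    [1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14],
    [16, 17, 18],
    [19, 20], [21, 22], [23, 24], [25, 26], [27, 28], [29, 30],
]

# 1. Number of query heads per key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# 2. Every conv_dim entry equals mamba_expand * hidden_size.
conv_dim_matches = all(
    value == mamba_expand * hidden_size for value in conv_dim.values()
)

# 3. global_attn_idx and kv_reuse_group together list every layer exactly once.
grouped_layers = [idx for group in kv_reuse_group for idx in group]
covered_layers = sorted(global_attn_idx + grouped_layers)
covers_all_layers = covered_layers == list(range(num_hidden_layers))

print(queries_per_kv_head)  # 5
print(conv_dim_matches)     # True
print(covers_all_layers)    # True
```

The third check shows that layers 0, 15, and 31 are the only ones absent from `kv_reuse_group`, and those are exactly the entries of `global_attn_idx`.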

configuration_hymba.py ADDED

	@@ -0,0 +1,116 @@

+import math
+from transformers.configuration_utils import PretrainedConfig
+class HymbaConfig(PretrainedConfig):
+    model_type = "hymba"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    def __init__(
+            self,
+            vocab_size=65536,
+            tie_word_embeddings=False,
+            hidden_size=4096,
+            intermediate_size=14336,
+            num_hidden_layers=32,
+            num_attention_heads=32,
+            num_key_value_heads=8,
+            hidden_act="silu",
+            initializer_range=0.02,
+            rms_norm_eps=1e-6,
+            use_cache=True,
+            calc_logits_for_entire_prompt=False,
+            output_router_logits=False,
+            router_aux_loss_coef=0.001,
+            pad_token_id=0,
+            bos_token_id=1,
+            eos_token_id=2,
+            sliding_window=None,
+            max_position_embeddings=262144,
+            orig_max_position_embeddings=None,
+            attention_dropout=0.0,
+            num_experts_per_tok=2,
+            num_experts=16,
+            use_mamba_kernels=True,
+            mamba_d_state=16,
+            mamba_d_conv=4,
+            mamba_expand=2,
+            mamba_dt_rank="auto",
+            mamba_conv_bias=True,
+            mamba_proj_bias=False,
+            mamba_inner_layernorms=True,
+            kv_reuse_every_i_layer=-1,
+            kv_reuse_group=None,
+            kv_weight_reuse=False,
+            global_attn_idx=None,
+            num_mamba=1,
+            attn_implementation_new='sdpa',
+            rope_type=None,
+            **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.tie_word_embeddings = tie_word_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.sliding_window = sliding_window
+        self.max_position_embeddings = max_position_embeddings
+        self.orig_max_position_embeddings = orig_max_position_embeddings
+        self.attention_dropout = attention_dropout
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache
+        self.calc_logits_for_entire_prompt = calc_logits_for_entire_prompt
+        self.output_router_logits = output_router_logits
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.num_experts_per_tok = num_experts_per_tok
+        self.num_experts = num_experts
+        self.use_mamba_kernels = use_mamba_kernels
+        self.mamba_d_state = mamba_d_state
+        self.mamba_d_conv = mamba_d_conv
+        self.mamba_expand = mamba_expand
+        self.mamba_dt_rank = math.ceil(self.hidden_size / 16) if mamba_dt_rank == "auto" else mamba_dt_rank
+        self.mamba_conv_bias = mamba_conv_bias
+        self.mamba_proj_bias = mamba_proj_bias
+        self.mamba_inner_layernorms = mamba_inner_layernorms
+        self.attn_hidden_size = kwargs.pop("attn_hidden_size", -1)
+        self.kq_head_dim = kwargs.pop("kq_head_dim", -1)
+        self.v_head_dim = kwargs.pop("v_head_dim", -1)
+        self.kq_norm = kwargs.pop("kq_norm", None)
+        self.rope = kwargs.pop("rope", False)
+        self.rope_theta = kwargs.pop("rope_theta", 10000.0)
+        self.num_memory_tokens = kwargs.pop("num_memory_tokens", 0)
+        self.memory_tokens_interspersed_every = kwargs.pop("memory_tokens_interspersed_every", 0)
+        self.kv_reuse_every_i_layer = kv_reuse_every_i_layer
+        self.kv_reuse_group = kv_reuse_group
+        self.kv_weight_reuse = kv_weight_reuse
+        self.global_attn_idx = global_attn_idx
+        self.num_mamba = num_mamba
+        self.attn_implementation_new = attn_implementation_new
+        self.rope_type = rope_type
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
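
Review note: two defaults in `HymbaConfig.__init__` are resolved at construction time: `mamba_dt_rank="auto"` and `num_key_value_heads=None`. The stand-alone sketch below mirrors those two expressions so they can be tried without installing transformers. The helper names `resolve_dt_rank` and `resolve_kv_heads` are hypothetical and do not exist in the uploaded file.

```python
import math


def resolve_dt_rank(hidden_size, mamba_dt_rank="auto"):
    """Mirror of the mamba_dt_rank expression in HymbaConfig.__init__."""
    return math.ceil(hidden_size / 16) if mamba_dt_rank == "auto" else mamba_dt_rank


def resolve_kv_heads(num_attention_heads, num_key_value_heads=None):
    """Mirror of the num_key_value_heads fallback in HymbaConfig.__init__."""
    if num_key_value_heads is None:
        return num_attention_heads
    return num_key_value_heads


# hidden_size=1600 is the value in this commit's config.json.
print(resolve_dt_rank(1600))       # 100
# hidden_size=4096 is the class default.
print(resolve_dt_rank(4096))       # 256
# An explicit integer is passed through unchanged.
print(resolve_dt_rank(1600, 64))   # 64
print(resolve_kv_heads(25))        # 25
print(resolve_kv_heads(25, 5))     # 5
```

The first result matches the `"mamba_dt_rank": 100` stored in config.json, which is consistent with the config having been saved after the `"auto"` value was resolved for `hidden_size=1600`.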

generation_config.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 0,
+  "transformers_version": "4.44.0",
+  "use_cache": false
+}

model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f01b19a43514af19def4c812a1d453dfd66f5c1b0be9674090a5bf37b699fc1
+size 4988876320

model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b11f9bec9246d8dc80612bb4e9d20f58b5744ca90ffae8944fffa0658789fde8
+size 1102383712
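
Review note: the two safetensors entries are Git LFS pointer files, not the weights themselves. The sketch below parses both pointers and compares the summed shard sizes with the `total_size` reported in model.safetensors.index.json. The `parse_lfs_pointer` helper is hypothetical. The parameter estimate assumes every tensor is stored as 4-byte float32, as suggested by `"torch_dtype": "float32"` in config.json.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the two hunks above.
shard_1 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:7f01b19a43514af19def4c812a1d453dfd66f5c1b0be9674090a5bf37b699fc1
size 4988876320
""")
shard_2 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:b11f9bec9246d8dc80612bb4e9d20f58b5744ca90ffae8944fffa0658789fde8
size 1102383712
""")

# total_size as reported in model.safetensors.index.json.
index_total_size = 6091191296

on_disk_bytes = shard_1["size"] + shard_2["size"]
extra_bytes = on_disk_bytes - index_total_size

# Assumption: all tensors are float32 (4 bytes per element).
approx_parameters = index_total_size // 4

print(on_disk_bytes)      # 6091260032
print(extra_bytes)        # 68736
print(approx_parameters)  # 1522797824
```

The shard files are 68,736 bytes larger in total than `total_size`. A plausible explanation is that `total_size` counts tensor data only while each shard also carries a safetensors header, but the diff alone does not confirm this.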

model.safetensors.index.json ADDED

	@@ -0,0 +1,618 @@

+{
+  "metadata": {
+    "total_size": 6091191296
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+    "model.final_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.24.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.25.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.26.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.26.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.pre_avg_layernorm1.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.pre_avg_layernorm2.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.27.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.pre_avg_layernorm1.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.pre_avg_layernorm2.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.28.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.pre_avg_layernorm1.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.pre_avg_layernorm2.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.29.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.pre_avg_layernorm1.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.pre_avg_layernorm2.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.30.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.A_log.0": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.B_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.C_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.D.0": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.conv1d.bias": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.conv1d.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.dt_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.dt_proj.0.bias": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.dt_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.out_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.pre_avg_layernorm1.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.pre_avg_layernorm2.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.mamba.x_proj.0.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.moe.experts.0.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.moe.experts.0.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.moe.experts.0.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.31.pre_moe_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.A_log.0": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.B_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.C_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.D.0": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.conv1d.bias": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.conv1d.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.dt_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.dt_proj.0.bias": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.dt_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.in_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.out_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.pre_avg_layernorm2.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mamba.x_proj.0.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.moe.experts.0.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.moe.experts.0.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.moe.experts.0.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.memory_tokens": "model-00001-of-00002.safetensors"
+  }
+}
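
The `weight_map` above assigns each tensor name to one of the two shard files, with keys sorted lexicographically (so `layers.19` precedes `layers.2`). A minimal sketch of how such a map could be inspected to find the layer whose tensors straddle the shard boundary is below. It uses only the standard library. The helper names `shards_per_layer` and `split_layers` are illustrative and not part of this repository, and the sample is a hand-copied excerpt of the entries above with the diff's leading `+` removed.

```python
import json
import re
from collections import defaultdict

# Hand-copied excerpt of the weight_map entries shown above (assumption:
# the real file wraps them in a top-level "weight_map" object, as is usual
# for a safetensors index).
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.layers.25.pre_moe_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.26.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mamba.pre_avg_layernorm1.weight": "model-00001-of-00002.safetensors",
    "model.layers.27.mamba.in_proj.weight": "model-00002-of-00002.safetensors",
    "model.memory_tokens": "model-00001-of-00002.safetensors"
  }
}
""")

# Captures the layer number from names such as "model.layers.26.mamba.D.0".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Map each layer number to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # skip non-layer tensors such as model.memory_tokens
            layers[int(match.group(1))].add(shard)
    return dict(layers)


def split_layers(weight_map):
    """Return the sorted layer numbers whose tensors span more than one shard."""
    per_layer = shards_per_layer(weight_map)
    return sorted(n for n, shards in per_layer.items() if len(shards) > 1)


if __name__ == "__main__":
    print(split_layers(SAMPLE_INDEX["weight_map"]))
```

To run the same check on the full file, one would load `model.safetensors.index.json` with `json.load` and pass its `weight_map` entry to `split_layers`. Converting the layer number to `int` gives a numeric ordering in the result, unlike the lexicographic key order of the index itself.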

modeling_hymba.py ADDED

The diff for this file is too large to render.