Trying to convert this to CoreML
#1
by iamraafay - opened
Hello,
I'm trying to use this model in one of my iPhone application projects. However, to create a Core ML model from the safetensors format, I'd first need to produce an ONNX model. I'm failing to produce the ONNX model because this model is multimodal.
The following is the script I'm using; the output it produces is below as well.
from transformers import AutoModel, AutoProcessor
import torch
import onnx
# Load the model and processor
model_name = "deepseek-vl-1.3b-chat-4bit"
# Load model with remote code enabled
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name)
print("Model loaded successfully!")
# Convert to ONNX
dummy_input = torch.randn(1, 3, 224, 224) # Adjust based on model input size
onnx_model_path = "deepseek-vl-1.3b-chat-4bit.onnx"
torch.onnx.export(model, dummy_input, onnx_model_path, opset_version=11)
print(f"ONNX Model saved to {onnx_model_path}")
Output:
ValueError: The checkpoint you are trying to load has model type `multi_modality` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`