Fine-tune model and convert to ONNX

#77
by Gerald001 - opened

Hi,

How can I convert the model to ONNX after fine-tuning it?

I tried using:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.exporters.onnx import main_export
from pathlib import Path

# Load the fine-tuned model and tokenizer (note: main_export reloads the
# model itself from model_path, so these are only needed if used elsewhere)
model_path = "modernbert-fine-tuned-save"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the ONNX output directory (main_export expects a directory and
# writes model.onnx into it, not a file path)
onnx_path = Path("modernbert-fine-tuned-save-onnx")

# Export model to ONNX
main_export(
    model_name_or_path=model_path,  # Path to the fine-tuned model
    output=onnx_path,               # Output directory for the export
    task="text-classification",     # Task type
    opset=14,                       # ONNX opset version
    device="cuda",
)

print(f"ONNX model successfully saved at {onnx_path}")

but I get this error:

    raise RuntimeError(
RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment.

This seems related to: https://huggingface.co/answerdotai/ModernBERT-base/discussions/14
I tried the suggestion from the link above (reference_compile=False) but still get the error.
Any idea about this error @fernandogd97 @bclavie @NohTow @aidayy @tomaarsen ?
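
(Maybe the flag needs to be persisted into the saved config, since main_export reloads the model from disk itself? A sketch of that, untested:)

from transformers import AutoConfig

# Assumption: ModernBERT reads config.reference_compile to decide whether to
# torch.compile submodules, so persisting False means the exporter's own
# from_pretrained call picks it up too.
config = AutoConfig.from_pretrained("modernbert-fine-tuned-save")
config.reference_compile = False
config.save_pretrained("modernbert-fine-tuned-save")
# ...then re-run main_export / optimum-cli as before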

Full error log:

python3 export_onnx.py
Traceback (most recent call last):
  File "/teamspace/studios/this_studio/modernBert/fine-tune/export_onnx.py", line 15, in <module>
    main_export(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 375, in main_export
    onnx_export_from_model(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 1175, in onnx_export_from_model
    _, onnx_outputs = export_models(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 762, in export_models
    export(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 866, in export
    export_output = export_pytorch(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 563, in export_pytorch
    onnx_export(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1613, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 1296, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py", line 234, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 1239, in forward
    outputs = self.model(
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 958, in forward
    hidden_states = self.embeddings(input_ids=input_ids, inputs_embeds=inputs_embeds)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 217, in forward
    self.compiled_embeddings(input_ids)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 470, in _fn
    raise RuntimeError(
RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment

Thanks,
Gerald

Hi @Xenova, I ran the CLI below.

I see a warning "-[x] values not close enough, max diff: nan (atol: 0.0001)". How can I fix that?
What about the other warnings?

$ optimum-cli export onnx --model modernbert-fine-tuned-save --task text-classification --device cuda --opset 14 modernbert-fine-tuned-save-onnx
2025-03-26 03:26:01.166001239 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-26 03:26:01.177665702 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-26 03:26:01.177693423 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-26 03:26:07.062768363 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-26 03:26:07.073201171 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-26 03:26:07.073228101 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
                -[x] values not close enough, max diff: nan (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = nan.
 The exported model was saved at: modernbert-fine-tuned-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

Here are the generated files:

$ ls modernbert-fine-tuned-save-onnx
config.json  model.onnx  special_tokens_map.json  tokenizer.json  tokenizer_config.json

Also, what is the difference between using the optimum-cli ONNX export and torch.onnx.export?
And can the generated ONNX model be used with any model-serving framework that can serve ONNX?
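
(For example, I would expect to be able to load it directly with onnxruntime. A sketch, with the file and directory names taken from the export above:)

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("modernbert-fine-tuned-save-onnx")
session = ort.InferenceSession(
    "modernbert-fine-tuned-save-onnx/model.onnx",
    providers=["CPUExecutionProvider"],
)

# Tokenize to int64 numpy arrays, matching the exported graph's inputs
inputs = tokenizer("hello world", return_tensors="np")
logits = session.run(
    None,
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)[0]
print(logits)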

Could you try exporting without --device cuda?

I've done all my exports in a Colab notebook, which you can try too.

Hi @Xenova,

Here is the output without --device cuda. Can I still serve the ONNX model on a GPU? (See the sketch after this log.)

$ optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --opset 14 ModernBERT-domain-classifier-save-onnx
Compiling the model with `torch.compile` and using a `torch.cpu` device is not supported. Falling back to non-compiled mode.
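
(As far as I understand, the exported .onnx file itself is device-agnostic; the device is chosen at load time via execution providers, so a CPU export can still be served on a GPU. A sketch, assuming onnxruntime-gpu is installed:)

import onnxruntime as ort

# Prefer CUDA, fall back to CPU if it is unavailable
session = ort.InferenceSession(
    "ModernBERT-domain-classifier-save-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)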

Here is the output with --device cuda:

$ optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx
2025-03-27 03:30:28.962170504 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:28.973787050 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:28.973816140 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-27 03:30:34.834608989 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:34.844736863 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:34.844761063 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
                -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = 0.004712104797363281.
 The exported model was saved at: ModernBERT-domain-classifier-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

What does "-[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001)" mean? Is that bad?
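
(From what I can tell, it is the maximum absolute difference between the reference PyTorch logits and the exported ONNX logits on the exporter's validation input. A rough sketch for checking it on your own inputs; passing reference_compile=False here is my assumption, to keep the reference model off the torch.compile path:)

import numpy as np
import torch
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "ModernBERT-domain-classifier-save"
tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(
    model_dir, reference_compile=False  # assumption: avoids the compiled path
).eval()

enc = tok("a test sentence", return_tensors="pt")
with torch.no_grad():
    ref_logits = model(**enc).logits.numpy()

sess = ort.InferenceSession("ModernBERT-domain-classifier-save-onnx/model.onnx")
onnx_logits = sess.run(None, {k: v.numpy() for k, v in enc.items()})[0]
print("max abs diff:", np.abs(ref_logits - onnx_logits).max())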

How can I set torch.set_float32_matmul_precision('high') before calling the CLI? See: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
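
(One option might be to skip the CLI and call the programmatic export, which appears to mirror the CLI flags, so the matmul precision can be set in the same process first. A sketch:)

import torch

# Enable TF32 matmuls before the exporter runs any compiled code
torch.set_float32_matmul_precision("high")

from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="ModernBERT-domain-classifier-save",
    output="ModernBERT-domain-classifier-save-onnx",
    task="text-classification",
    opset=14,
    device="cuda",
)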

And why are "50 Memcpy nodes added to the graph main_graph for CUDAExecutionProvider", which "might have negative impact on performance (including unable to run CUDA graph)"?
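
(The warning itself points at session_options.log_severity_level for details. A sketch of setting it at inference time, so the verbose logs show which nodes were assigned to the CPU and forced the Memcpy inserts:)

import onnxruntime as ort

opts = ort.SessionOptions()
opts.log_severity_level = 1  # more detailed logging, as the warning suggests

session = ort.InferenceSession(
    "ModernBERT-domain-classifier-save-onnx/model.onnx",
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)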
