Fine-tune model and convert to ONNX
Hi,
How can I convert the model to ONNX after fine-tuning it?
I tried using:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.exporters.onnx import main_export
from pathlib import Path

# Load the fine-tuned model and tokenizer
model_path = "modernbert-fine-tuned-save"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define ONNX export path
onnx_path = Path("modernbert-fine-tuned-save/model.onnx")

# Export model to ONNX
main_export(
    model_name_or_path=model_path,  # Path to fine-tuned model
    output=onnx_path,               # ONNX file path
    task="text-classification",     # Task type (alternative: "sequence-classification")
    opset=14,                       # Recommended ONNX opset version
    device="cuda",
)
print(f"ONNX model successfully saved at {onnx_path}")
but I get this error:
raise RuntimeError(
RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment.
This seems related to: https://huggingface.co/answerdotai/ModernBERT-base/discussions/14
I tried the suggestion from the link above (reference_compile=False), but I still get the error.
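For clarity, here is a minimal sketch of how I applied that suggestion (assuming from_pretrained forwards the flag into the ModernBERT config, as the linked discussion suggests):

from transformers import AutoModelForSequenceClassification

# Disable the torch.compile-d embedding path that breaks torch.jit tracing;
# the flag is forwarded into the model config by from_pretrained
model = AutoModelForSequenceClassification.from_pretrained(
    "modernbert-fine-tuned-save",
    reference_compile=False,
)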
Any idea about this error, @fernandogd97 @bclavie @NohTow @aidayy @tomaarsen?
Full error log:
python3 export_onnx.py
Traceback (most recent call last):
File "/teamspace/studios/this_studio/modernBert/fine-tune/export_onnx.py", line 15, in <module>
main_export(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 375, in main_export
onnx_export_from_model(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 1175, in onnx_export_from_model
_, onnx_outputs = export_models(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 762, in export_models
export(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 866, in export
export_output = export_pytorch(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 563, in export_pytorch
onnx_export(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
_export(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1613, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 1296, in _get_trace_graph
outs = ONNXTracedModule(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py", line 234, in patched_forward
outputs = self.orig_forward(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 1239, in forward
outputs = self.model(
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 958, in forward
hidden_states = self.embeddings(input_ids=input_ids, inputs_embeds=inputs_embeds)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py", line 217, in forward
self.compiled_embeddings(input_ids)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 470, in _fn
raise RuntimeError(
RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment
Thanks,
Gerald
Hi there - can you try exporting from the CLI? https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model
Hi @Xenova, I ran the CLI (command and output below) and I see some warnings, e.g.:
-[x] values not close enough, max diff: nan (atol: 0.0001)
How can I fix that? And what about the other warnings?
optimum-cli export onnx --model modernbert-fine-tuned-save --task text-classification --device cuda --opset 14 modernbert-fine-tuned-save-onnx
2025-03-26 03:26:01.166001239 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-26 03:26:01.177665702 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-26 03:26:01.177693423 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-26 03:26:07.062768363 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-26 03:26:07.073201171 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-26 03:26:07.073228101 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
-[x] values not close enough, max diff: nan (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = nan.
The exported model was saved at: modernbert-fine-tuned-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Here are the generated files:
ls modernbert-fine-tuned-save-onnx
config.json model.onnx special_tokens_map.json tokenizer.json tokenizer_config.json
Also, what is the difference between using the optimum-cli ONNX export and torch.onnx.export?
And can the generated ONNX model be used with any model serving framework that can serve ONNX?
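(For context, a minimal sketch of what serving the exported file looks like with onnxruntime; any runtime that supports ONNX at the exported opset should behave similarly. Paths and input names below are assumed to match the files listed above.)

import onnxruntime as ort
from transformers import AutoTokenizer

onnx_dir = "modernbert-fine-tuned-save-onnx"
tokenizer = AutoTokenizer.from_pretrained(onnx_dir)

# onnxruntime-gpu tries CUDA first and falls back to CPU if unavailable
session = ort.InferenceSession(
    f"{onnx_dir}/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# The exporter names the inputs after the model's forward signature
# (input_ids / attention_mask for ModernBERT)
inputs = tokenizer("example text", return_tensors="np")
logits = session.run(None, dict(inputs))[0]
print(logits)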
Could you try exporting without --device cuda?
I've done all my exports in a Colab notebook, which you can try too.
Hi @Xenova,
Here is the output without --device cuda. Can I still serve the ONNX model on a GPU?
$ optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --opset 14 ModernBERT-domain-classifier-save-onnx
Compiling the model with `torch.compile` and using a `torch.cpu` device is not supported. Falling back to non-compiled mode.
Here is the output with --device cuda:
$ optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx
2025-03-27 03:30:28.962170504 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:28.973787050 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:28.973816140 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-27 03:30:34.834608989 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:34.844736863 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:34.844761063 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
-[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = 0.004712104797363281.
The exported model was saved at: ModernBERT-domain-classifier-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
What does -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001) mean? Is that bad?
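(One way to judge whether that diff actually matters: compare the PyTorch and ONNX logits on a few real inputs and check whether the predicted labels agree. A rough sketch, reusing the paths from this thread:)

import numpy as np
import torch
import onnxruntime as ort
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = "ModernBERT-domain-classifier-save"
onnx_path = "ModernBERT-domain-classifier-save-onnx/model.onnx"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path).eval()
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

enc = tokenizer("a representative input text", return_tensors="pt")
with torch.no_grad():
    ref = model(**enc).logits.numpy()
out = session.run(None, {k: v.numpy() for k, v in enc.items()})[0]

# For classification, what usually matters is whether the argmax agrees,
# not whether the raw logits match to within 1e-4
print("max abs diff:", np.abs(ref - out).max())
print("same prediction:", (ref.argmax(-1) == out.argmax(-1)).all())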
How can I set torch.set_float32_matmul_precision('high') before calling the CLI? See: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
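(As far as I can tell the CLI doesn't expose that setting, so one workaround, sketched below, is to drive the export from Python via the same main_export API used earlier in this thread and set the precision first.)

import torch
from optimum.exporters.onnx import main_export

# Enable TF32 matmuls before the exporter runs the model, as the
# UserWarning suggests. Note that TF32 trades some precision for speed,
# so it can also change the max-diff numbers reported at validation.
torch.set_float32_matmul_precision("high")

main_export(
    model_name_or_path="ModernBERT-domain-classifier-save",
    output="ModernBERT-domain-classifier-save-onnx",
    task="text-classification",
    opset=14,
    device="cuda",
)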
And why are 50 Memcpy nodes added to the graph main_graph for CUDAExecutionProvider, which "might have negative impact on performance (including unable to run CUDA graph)"?
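(On the Memcpy warning: the message itself points at session_options.log_severity_level. A sketch of turning that on in onnxruntime to see which nodes land on which execution provider:)

import onnxruntime as ort

opts = ort.SessionOptions()
opts.log_severity_level = 1  # 0 = verbose, 1 = info; the default only shows warnings

session = ort.InferenceSession(
    "ModernBERT-domain-classifier-save-onnx/model.onnx",
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)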