Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Paper • 2503.06269 • Published
from optimum.onnxruntime import ORTModelForSequenceClassification
# Load the model from the hub and export it to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)