{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# 🤗 Hugging Face Model Hub with OpenVINO™\n",
"\n",
"The Hugging Face (HF) [Model Hub](https://huggingface.co/models) is a central repository for pre-trained deep learning models. It allows exploration and provides access to thousands of models for a wide range of tasks, including text classification, question answering, and image classification.\n",
"Hugging Face provides Python packages that serve as APIs and tools to easily download and fine tune state-of-the-art pretrained models, namely [transformers](https://github.com/huggingface/transformers) and [diffusers](https://github.com/huggingface/diffusers) packages.\n",
"\n",
"\n",
"\n",
"Throughout this notebook we will learn:\n",
"1. How to load a HF pipeline using the `transformers` package and then convert it to OpenVINO.\n",
"2. How to load the same pipeline using Optimum Intel package.\n",
"\n",
"\n",
"#### Table of contents:\n",
"\n",
"- [Converting a Model from the HF Transformers Package](#Converting-a-Model-from-the-HF-Transformers-Package)\n",
" - [Installing Requirements](#Installing-Requirements)\n",
" - [Imports](#Imports)\n",
" - [Initializing a Model Using the HF Transformers Package](#Initializing-a-Model-Using-the-HF-Transformers-Package)\n",
" - [Original Model inference](#Original-Model-inference)\n",
" - [Converting the Model to OpenVINO IR format](#Converting-the-Model-to-OpenVINO-IR-format)\n",
" - [Converted Model Inference](#Converted-Model-Inference)\n",
"- [Converting a Model Using the Optimum Intel Package](#Converting-a-Model-Using-the-Optimum-Intel-Package)\n",
" - [Install Requirements for Optimum](#Install-Requirements-for-Optimum)\n",
" - [Import Optimum](#Import-Optimum)\n",
" - [Initialize and Convert the Model Automatically using OVModel class](#Initialize-and-Convert-the-Model-Automatically-using-OVModel-class)\n",
" - [Convert model using Optimum CLI interface](#Convert-model-using-Optimum-CLI-interface)\n",
" - [The Optimum Model Inference](#The-Optimum-Model-Inference)\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting a Model from the HF Transformers Package\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Hugging Face transformers package provides API for initializing a model and loading a set of pre-trained weights using the model text handle.\n",
"Discovering a desired model name is straightforward with [HF website's Models page](https://huggingface.co/models), one can choose a model solving a particular machine learning problem and even sort the models by popularity and novelty."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installing Requirements\n",
"[back to top ⬆️](#Table-of-contents:)\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q --extra-index-url https://download.pytorch.org/whl/cpu \"transformers>=4.33.0\" \"torch>=2.1.0\"\n",
"%pip install -q ipywidgets\n",
"%pip install -q \"openvino>=2023.1.0\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imports\n",
"[back to top ⬆️](#Table-of-contents:)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import torch\n",
"\n",
"from transformers import AutoModelForSequenceClassification\n",
"from transformers import AutoTokenizer"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initializing a Model Using the HF Transformers Package\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"We will use [roberta text sentiment classification](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) model in our example, it is a transformer-based encoder model pretrained in a special way, please refer to the model card to learn more.\n",
"\n",
"Following the instructions on the model page, we use `AutoModelForSequenceClassification` to initialize the model and perform inference with it.\n",
"To find more information on HF pipelines and model initialization please refer to [HF tutorials](https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt#behind-the-pipeline)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']\n",
"- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
]
}
],
"source": [
"MODEL = \"cardiffnlp/twitter-roberta-base-sentiment-latest\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(MODEL, return_dict=True)\n",
"\n",
"# The torchscript=True flag is used to ensure the model outputs are tuples\n",
"# instead of ModelOutput (which causes JIT errors).\n",
"model = AutoModelForSequenceClassification.from_pretrained(MODEL, torchscript=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Original Model inference\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Let's do a classification of a simple prompt below."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1) positive 0.9485\n",
"2) neutral 0.0484\n",
"3) negative 0.0031\n"
]
}
],
"source": [
"text = \"HF models run perfectly with OpenVINO!\"\n",
"\n",
"encoded_input = tokenizer(text, return_tensors=\"pt\")\n",
"output = model(**encoded_input)\n",
"scores = output[0][0]\n",
"scores = torch.softmax(scores, dim=0).numpy(force=True)\n",
"\n",
"\n",
"def print_prediction(scores):\n",
" for i, descending_index in enumerate(scores.argsort()[::-1]):\n",
" label = model.config.id2label[descending_index]\n",
" score = np.round(float(scores[descending_index]), 4)\n",
" print(f\"{i+1}) {label} {score}\")\n",
"\n",
"\n",
"print_prediction(scores)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converting the Model to OpenVINO IR format\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"We use the OpenVINO [Model conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html#convert-a-model-with-python-convert-model) to convert the model (this one is implemented in PyTorch) to OpenVINO Intermediate Representation (IR).\n",
"\n",
"Note how we reuse our real `encoded_input`, passing it to the `ov.convert_model` function. It will be used for model tracing."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import openvino as ov\n",
"\n",
"save_model_path = Path(\"./models/model.xml\")\n",
"\n",
"if not save_model_path.exists():\n",
" ov_model = ov.convert_model(model, example_input=dict(encoded_input))\n",
" ov.save_model(ov_model, save_model_path)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converted Model Inference\n",
"[back to top ⬆️](#Table-of-contents:)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we pick a device to do the model inference"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "076e75b32a964983a4a6df36c1c3d1e0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Dropdown(description='Device:', index=2, options=('CPU', 'GPU', 'AUTO'), value='AUTO')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import ipywidgets as widgets\n",
"\n",
"core = ov.Core()\n",
"\n",
"device = widgets.Dropdown(\n",
" options=core.available_devices + [\"AUTO\"],\n",
" value=\"AUTO\",\n",
" description=\"Device:\",\n",
" disabled=False,\n",
")\n",
"\n",
"device"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"OpenVINO model IR must be compiled for a specific device prior to the model inference."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1) positive 0.9483\n",
"2) neutral 0.0485\n",
"3) negative 0.0031\n"
]
}
],
"source": [
"compiled_model = core.compile_model(save_model_path, device.value)\n",
"\n",
"# Compiled model call is performed using the same parameters as for the original model\n",
"scores_ov = compiled_model(encoded_input.data)[0]\n",
"\n",
"scores_ov = torch.softmax(torch.tensor(scores_ov[0]), dim=0).detach().numpy()\n",
"\n",
"print_prediction(scores_ov)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the prediction of the converted model match exactly the one of the original model.\n",
"\n",
"This is a rather simple example as the pipeline includes just one encoder model. Contemporary state of the art pipelines often consist of several model, feel free to explore other OpenVINO tutorials:\n",
"1. [Stable Diffusion v2](../stable-diffusion-v2)\n",
"2. [Zero-shot Image Classification with OpenAI CLIP](../clip-zero-shot-image-classification)\n",
"3. [Controllable Music Generation with MusicGen](../music-generation)\n",
"\n",
"The workflow for the `diffusers` package is exactly the same. The first example in the list above relies on the `diffusers`."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting a Model Using the Optimum Intel Package\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.\n",
"\n",
"Among other use cases, Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO Intermediate Representation (IR) format and run inference using OpenVINO Runtime."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Requirements for Optimum\n",
"[back to top ⬆️](#Table-of-contents:)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\" onnx"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import Optimum\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Documentation for Optimum Intel states:\n",
">You can now easily perform inference with OpenVINO Runtime on a variety of Intel processors (see the full list of supported devices). For that, just replace the `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.\n",
"\n",
"You can find more information in [Optimum Intel documentation](https://huggingface.co/docs/optimum/intel/inference)."
]
},
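{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For the sequence classification model used in this notebook, the swap amounts to the following side-by-side sketch (the `OVModel` line mirrors the cells below):\n",
"\n",
"```python\n",
"# transformers (PyTorch model):\n",
"# from transformers import AutoModelForSequenceClassification\n",
"# model = AutoModelForSequenceClassification.from_pretrained(MODEL)\n",
"\n",
"# optimum-intel (OpenVINO model, converted on the fly with export=True):\n",
"from optimum.intel.openvino import OVModelForSequenceClassification\n",
"\n",
"model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True)\n",
"```"
]
},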
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'\n"
]
}
],
"source": [
"from optimum.intel.openvino import OVModelForSequenceClassification"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize and Convert the Model Automatically using OVModel class\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"To load a Transformers model and convert it to the OpenVINO format on the fly, you can set `export=True` when loading your model. The model can be saved in OpenVINO format using `save_pretrained` method and specifying a directory for storing the model as an argument. For the next usage, you can avoid the conversion step and load the saved early model from disk using `from_pretrained` method without export specification. We also specified `device` parameter for compiling the model on the specific device, if not provided, the default device will be used. The device can be changed later in runtime using `model.to(device)`, please note that it may require some time for model compilation on a newly selected device. In some cases, it can be useful to separate model initialization and compilation, for example, if you want to reshape the model using `reshape` method, you can postpone compilation, providing the parameter `compile=False` into `from_pretrained` method, compilation can be performed manually using `compile` method or will be performed automatically during first inference run."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Framework not specified. Using pt to export to ONNX.\n",
"Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']\n",
"- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Using the export variant default. Available variants are:\n",
" - default: The default ONNX variant.\n",
"Using framework PyTorch: 2.1.0+cpu\n",
"Overriding 1 configuration item(s)\n",
"\t- use_cache -> False\n",
"Compiling the model to AUTO ...\n"
]
}
],
"source": [
"model = OVModelForSequenceClassification.from_pretrained(MODEL, export=True, device=device.value)\n",
"\n",
"# The save_pretrained() method saves the model weights to avoid conversion on the next load.\n",
"model.save_pretrained(\"./models/optimum_model\")"
]
},
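{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch (not executed in this notebook) of the deferred-compilation flow described above. The static shapes (batch size 1, sequence length 128) and the `ov_model` variable name are illustrative assumptions; the model directory is the one saved by the previous cell.\n",
"\n",
"```python\n",
"# Load the previously saved OpenVINO model without compiling it yet.\n",
"ov_model = OVModelForSequenceClassification.from_pretrained(\"./models/optimum_model\", compile=False)\n",
"\n",
"ov_model.reshape(1, 128)   # fix batch size and sequence length before compilation\n",
"ov_model.to(device.value)  # select the target device at runtime\n",
"ov_model.compile()         # compile explicitly; otherwise done on the first inference run\n",
"```"
]
},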
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Convert model using Optimum CLI interface\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Alternatively, you can use the Optimum CLI interface for converting models (supported starting optimum-intel 1.12 version).\n",
"General command format:\n",
"\n",
"```bash\n",
"optimum-cli export openvino --model <model_id_or_path> --task <task> <output_dir>\n",
"```\n",
"\n",
"where task is task to export the model for, if not specified, the task will be auto-inferred based on the model. Available tasks depend on the model, but are among: ['default', 'fill-mask', 'text-generation', 'text2text-generation', 'text-classification', 'token-classification', 'multiple-choice', 'object-detection', 'question-answering', 'image-classification', 'image-segmentation', 'masked-im', 'semantic-segmentation', 'automatic-speech-recognition', 'audio-classification', 'audio-frame-classification', 'automatic-speech-recognition', 'audio-xvector', 'image-to-text', 'stable-diffusion', 'zero-shot-object-detection']. For decoder models, use `xxx-with-past` to export the model using past key values in the decoder. \n",
"\n",
"You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager).\n",
"\n",
"Additionally, you can specify weights compression `--fp16` for the compression model to FP16 and `--int8` for the compression model to INT8. Please note, that for INT8, it is necessary to install nncf.\n",
"\n",
"Full list of supported arguments available via `--help`"
]
},
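{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the `-with-past` variant (not executed here), a decoder model such as `gpt2` could be exported with cached key values roughly as follows; the model name and output directory are arbitrary choices for this sketch:\n",
"\n",
"```bash\n",
"optimum-cli export openvino --model gpt2 --task text-generation-with-past models/gpt2_with_past\n",
"```"
]
},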
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"usage: optimum-cli export openvino [-h] -m MODEL [--task TASK]\n",
" [--cache_dir CACHE_DIR]\n",
" [--framework {pt,tf}] [--trust-remote-code]\n",
" [--pad-token-id PAD_TOKEN_ID] [--fp16]\n",
" [--int8]\n",
" output\n",
"\n",
"optional arguments:\n",
" -h, --help show this help message and exit\n",
"\n",
"Required arguments:\n",
" -m MODEL, --model MODEL\n",
" Model ID on huggingface.co or path on disk to load\n",
" model from.\n",
" output Path indicating the directory where to store the\n",
" generated OV model.\n",
"\n",
"Optional arguments:\n",
" --task TASK The task to export the model for. If not specified,\n",
" the task will be auto-inferred based on the model.\n",
" Available tasks depend on the model, but are among:\n",
" ['semantic-segmentation', 'zero-shot-image-\n",
" classification', 'text-generation', 'stable-diffusion-\n",
" xl', 'image-classification', 'image-segmentation',\n",
" 'conversational', 'audio-classification', 'text2text-\n",
" generation', 'automatic-speech-recognition', 'text-to-\n",
" audio', 'audio-frame-classification', 'question-\n",
" answering', 'stable-diffusion', 'mask-generation',\n",
" 'zero-shot-object-detection', 'token-classification',\n",
" 'image-to-text', 'feature-extraction', 'audio-\n",
" xvector', 'text-classification', 'fill-mask', 'object-\n",
" detection', 'multiple-choice', 'masked-im']. For\n",
" decoder models, use `xxx-with-past` to export the\n",
" model using past key values in the decoder.\n",
" --cache_dir CACHE_DIR\n",
" Path indicating where to store cache.\n",
" --framework {pt,tf} The framework to use for the export. If not provided,\n",
" will attempt to use the local checkpoint's original\n",
" framework or what is available in the environment.\n",
" --trust-remote-code Allows to use custom code for the modeling hosted in\n",
" the model repository. This option should only be set\n",
" for repositories you trust and in which you have read\n",
" the code, as it will execute on your local machine\n",
" arbitrary code present in the model repository.\n",
" --pad-token-id PAD_TOKEN_ID\n",
" This is needed by some models, for some tasks. If not\n",
" provided, will attempt to use the tokenizer to guess\n",
" it.\n",
" --fp16 Compress weights to fp16\n",
" --int8 Compress weights to int8\n"
]
}
],
"source": [
"!optimum-cli export openvino --help"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The command line export for model from example above with FP16 weights compression:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Framework not specified. Using pt to export to ONNX.\n",
"Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']\n",
"- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Using the export variant default. Available variants are:\n",
" - default: The default ONNX variant.\n",
"Using framework PyTorch: 2.1.0+cpu\n",
"Overriding 1 configuration item(s)\n",
"\t- use_cache -> False\n"
]
}
],
"source": [
"!optimum-cli export openvino --model $MODEL --task text-classification --fp16 models/optimum_model/fp16"
]
},
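{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, INT8 weight compression can be requested with the `--int8` flag shown in the `--help` output above. The following command is illustrative and not executed here; it requires `nncf` to be installed:\n",
"\n",
"```bash\n",
"optimum-cli export openvino --model cardiffnlp/twitter-roberta-base-sentiment-latest --task text-classification --int8 models/optimum_model/int8\n",
"```"
]
},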
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"After export, model will be available in the specified directory and can be loaded using the same OVModelForXXX class."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Compiling the model to AUTO ...\n",
"Setting OpenVINO CACHE_DIR to models/optimum_model/fp16/model_cache\n"
]
}
],
"source": [
"model = OVModelForSequenceClassification.from_pretrained(\"models/optimum_model/fp16\", device=device.value)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"There are some models in the Hugging Face Models Hub, that are already converted and ready to run! You can filter those models out by library name, just type OpenVINO, or follow [this link](https://huggingface.co/models?library=openvino&sort=trending)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Optimum Model Inference\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"Model inference is exactly the same as for the original model!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1) positive 0.9483\n",
"2) neutral 0.0485\n",
"3) negative 0.0031\n"
]
}
],
"source": [
"output = model(**encoded_input)\n",
"scores = output[0][0]\n",
"scores = torch.softmax(scores, dim=0).numpy(force=True)\n",
"\n",
"print_prediction(scores)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can find more examples of using Optimum Intel here:\n",
"1. [Accelerate Inference of Sparse Transformer Models](../sparsity-optimization/sparsity-optimization.ipynb)\n",
"2. [Grammatical Error Correction with OpenVINO](../grammar-correction/grammar-correction.ipynb)\n",
"3. [Stable Diffusion v2.1 using Optimum-Intel OpenVINO](../stable-diffusion-v2/stable-diffusion-v2-optimum-demo.ipynb)\n",
"4. [Image generation with Stable Diffusion XL](../stable-diffusion-xl)\n",
"5. [Instruction following using Databricks Dolly 2.0](../dolly-2-instruction-following)\n",
"6. [Create LLM-powered Chatbot using OpenVINO](../llm-chatbot)\n",
"7. [Document Visual Question Answering Using Pix2Struct and OpenVINO](../pix2struct-docvqa)\n",
"8. [Automatic speech recognition using Distil-Whisper and OpenVINO](../distil-whisper-asr)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"openvino_notebooks": {
"imageUrl": "",
"tags": {
"categories": [
"API Overview"
],
"libraries": [],
"other": [],
"tasks": [
"Text Classification"
]
}
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}