{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "37b660e4-3cba-412b-ab1a-b4acb88ff329", "metadata": {}, "source": [ "# Image generation with Latent Consistency Model and OpenVINO\n", "\n", "LCMs: The next generation of generative models after Latent Diffusion Models (LDMs). \n", "Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling is computationally intensive and leads to slow generation.\n", "\n", "Inspired by [Consistency Models](https://arxiv.org/abs/2303.01469), [Latent Consistency Models](https://arxiv.org/pdf/2310.04378.pdf) (LCMs) were proposed, enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion. \n", "The [Consistency Model (CM) (Song et al., 2023)](https://arxiv.org/abs/2303.01469) is a new family of generative models that enables one-step or few-step generation. The core idea of the CM is to learn the function that maps any points on a trajectory of the PF-ODE (probability flow of [ordinary differential equation](https://en.wikipedia.org/wiki/Ordinary_differential_equation)) to that trajectory’s origin (i.e., the solution of the PF-ODE). By learning consistency mappings that maintain point consistency on ODE-trajectory, these models allow for single-step generation, eliminating the need for computation-intensive iterations. However, CM is constrained to pixel space image generation tasks, making it unsuitable for synthesizing high-resolution images. LCMs adopt a consistency model in the image latent space for generation high-resolution images. Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Utilizing image latent space in large-scale diffusion models like Stable Diffusion (SD) has effectively enhanced image generation quality and reduced computational load. The authors of LCMs provide a simple and efficient one-stage guided consistency distillation method named Latent Consistency Distillation (LCD) to distill SD for few-step (2∼4) or even 1-step sampling and propose the SKIPPING-STEP technique to further accelerate the convergence. More details about proposed approach and models can be found in [project page](https://latent-consistency-models.github.io/), [paper](https://arxiv.org/abs/2310.04378) and [original repository](https://github.com/luosiallen/latent-consistency-model).\n", "\n", "In this tutorial, we consider how to convert and run LCM using OpenVINO. 
An additional part demonstrates how to run quantization with [NNCF](https://github.com/openvinotoolkit/nncf/) to speed up the pipeline.\n", "\n", "\n", "#### Table of contents:\n", "\n", "- [Prerequisites](#Prerequisites)\n", "- [Prepare models for OpenVINO format conversion](#Prepare-models-for-OpenVINO-format-conversion)\n", "- [Convert models to OpenVINO format](#Convert-models-to-OpenVINO-format)\n", "    - [Text Encoder](#Text-Encoder)\n", "    - [U-Net](#U-Net)\n", "    - [VAE](#VAE)\n", "- [Prepare inference pipeline](#Prepare-inference-pipeline)\n", "    - [Configure Inference Pipeline](#Configure-Inference-Pipeline)\n", "- [Text-to-image generation](#Text-to-image-generation)\n", "- [Quantization](#Quantization)\n", "    - [Prepare calibration dataset](#Prepare-calibration-dataset)\n", "    - [Run quantization](#Run-quantization)\n", "    - [Compare inference time of the FP16 and INT8 models](#Compare-inference-time-of-the-FP16-and-INT8-models)\n", "    - [Compare UNet file size](#Compare-UNet-file-size)\n", "- [Interactive demo](#Interactive-demo)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ee66539d-99cc-45a3-80e4-4fbd6b520650", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "0df6bc0c-fa5b-478b-ad44-b2f711497754", "metadata": {}, "outputs": [], "source": [ "%pip install -q \"torch>=2.1\" --index-url https://download.pytorch.org/whl/cpu\n", "%pip install -q \"openvino>=2023.1.0\" transformers \"diffusers>=0.23.1\" pillow \"gradio>=4.19\" \"nncf>=2.7.0\" \"datasets>=2.14.6\" \"peft==0.6.2\" --extra-index-url https://download.pytorch.org/whl/cpu" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9523be62-2c38-480d-ae53-13910ae6d049", "metadata": {}, "source": [ "## Prepare models for OpenVINO format conversion\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "In this tutorial, we will use [LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) from the [Hugging Face Hub](https://huggingface.co/). This model is distilled from the [Dreamshaper v7](https://huggingface.co/Lykon/dreamshaper-7) fine-tune of [Stable-Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) using the Latent Consistency Distillation (LCD) approach discussed above. The model is also integrated into the [Diffusers](https://huggingface.co/docs/diffusers/index) library. 🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. This allows us to compare running the original Stable Diffusion (from this [notebook](../stable-diffusion-text-to-image/stable-diffusion-text-to-image.ipynb)) with its LCD-distilled counterpart. The distillation approach efficiently converts a pre-trained guided diffusion model into a latent consistency model by solving an augmented PF-ODE.\n", "\n", "To start working with LCM, we should first instantiate the generation pipeline. The `DiffusionPipeline.from_pretrained` method downloads all pipeline components for LCM and configures them. If a model ships a custom inference pipeline stored as part of its repository, the module to load can be specified with the `custom_pipeline` argument together with its revision."
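,
 "\n",
 "A minimal sketch of this loading step (the complete helper used in this notebook is defined in the next cell):\n",
 "\n",
 "```python\n",
 "from diffusers import DiffusionPipeline\n",
 "\n",
 "# Downloads and wires together all LCM pipeline components\n",
 "# (tokenizer, text encoder, U-Net, VAE, scheduler, feature extractor, safety checker).\n",
 "# With diffusers>=0.23 the LCM pipeline loads directly; older setups may need the\n",
 "# custom_pipeline argument mentioned above.\n",
 "pipe = DiffusionPipeline.from_pretrained(\"SimianLuo/LCM_Dreamshaper_v7\")\n",
 "```\n"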
] }, { "cell_type": "code", "execution_count": 2, "id": "d143e91b-81b1-4ed0-96dd-63e2f639cf53", "metadata": {}, "outputs": [], "source": [ "import gc\n", "import warnings\n", "from pathlib import Path\n", "from diffusers import DiffusionPipeline\n", "import numpy as np\n", "\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "TEXT_ENCODER_OV_PATH = Path(\"model/text_encoder.xml\")\n", "UNET_OV_PATH = Path(\"model/unet.xml\")\n", "VAE_DECODER_OV_PATH = Path(\"model/vae_decoder.xml\")\n", "\n", "\n", "def load_orginal_pytorch_pipeline_componets(skip_models=False, skip_safety_checker=False):\n", " pipe = DiffusionPipeline.from_pretrained(\"SimianLuo/LCM_Dreamshaper_v7\")\n", " scheduler = pipe.scheduler\n", " tokenizer = pipe.tokenizer\n", " feature_extractor = pipe.feature_extractor if not skip_safety_checker else None\n", " safety_checker = pipe.safety_checker if not skip_safety_checker else None\n", " text_encoder, unet, vae = None, None, None\n", " if not skip_models:\n", " text_encoder = pipe.text_encoder\n", " text_encoder.eval()\n", " unet = pipe.unet\n", " unet.eval()\n", " vae = pipe.vae\n", " vae.eval()\n", " del pipe\n", " gc.collect()\n", " return (\n", " scheduler,\n", " tokenizer,\n", " feature_extractor,\n", " safety_checker,\n", " text_encoder,\n", " unet,\n", " vae,\n", " )" ] }, { "cell_type": "code", "execution_count": 3, "id": "a6bc3fa8-fb23-4a89-94de-02d1ea7f40a7", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "438e230216a04db4836360a95380e65e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Fetching 15 files: 0%| | 0/15 [00:00 0.0\n", " # In LCM Implementation: cfg_noise = noise_cond + cfg_scale * (noise_cond - noise_uncond) , (cfg_scale > 0.0 using CFG)\n", "\n", " # 2. Encode input prompt\n", " prompt_embeds = self._encode_prompt(\n", " prompt,\n", " num_images_per_prompt,\n", " prompt_embeds=prompt_embeds,\n", " )\n", "\n", " # 3. Prepare timesteps\n", " self.scheduler.set_timesteps(num_inference_steps, original_inference_steps=lcm_origin_steps)\n", " timesteps = self.scheduler.timesteps\n", "\n", " # 4. Prepare latent variable\n", " num_channels_latents = 4\n", " latents = self.prepare_latents(\n", " batch_size * num_images_per_prompt,\n", " num_channels_latents,\n", " height,\n", " width,\n", " prompt_embeds.dtype,\n", " latents,\n", " )\n", "\n", " bs = batch_size * num_images_per_prompt\n", "\n", " # 5. Get Guidance Scale Embedding\n", " w = torch.tensor(guidance_scale).repeat(bs)\n", " w_embedding = self.get_w_embedding(w, embedding_dim=256)\n", "\n", " # 6. 
LCM MultiStep Sampling Loop:\n", " with self.progress_bar(total=num_inference_steps) as progress_bar:\n", " for i, t in enumerate(timesteps):\n", " ts = torch.full((bs,), t, dtype=torch.long)\n", "\n", " # model prediction (v-prediction, eps, x)\n", " model_pred = self.unet(\n", " [latents, ts, prompt_embeds, w_embedding],\n", " share_inputs=True,\n", " share_outputs=True,\n", " )[0]\n", "\n", " # compute the previous noisy sample x_t -> x_t-1\n", " latents, denoised = self.scheduler.step(torch.from_numpy(model_pred), t, latents, return_dict=False)\n", " progress_bar.update()\n", "\n", " if not output_type == \"latent\":\n", " image = torch.from_numpy(self.vae_decoder(denoised / 0.18215, share_inputs=True, share_outputs=True)[0])\n", " image, has_nsfw_concept = self.run_safety_checker(image, prompt_embeds.dtype)\n", " else:\n", " image = denoised\n", " has_nsfw_concept = None\n", "\n", " if has_nsfw_concept is None:\n", " do_denormalize = [True] * image.shape[0]\n", " else:\n", " do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]\n", "\n", " image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)\n", "\n", " if not return_dict:\n", " return (image, has_nsfw_concept)\n", "\n", " return StableDiffusionPipelineOutput(images=image, nsfw_content_detected=has_nsfw_concept)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8de4233d-f6df-44a5-a319-af24fd2d6ee1", "metadata": {}, "source": [ "### Configure Inference Pipeline\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "First, you should create instances of OpenVINO Model and compile it using selected device. Select device from dropdown list for running inference using OpenVINO." ] }, { "cell_type": "code", "execution_count": 8, "id": "bc5c1b46-d95e-4251-8435-9c7bc83264c9", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3540ef6268a24abba2f928f4412c7b35", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "core = ov.Core()\n", "\n", "import ipywidgets as widgets\n", "\n", "device = widgets.Dropdown(\n", " options=core.available_devices + [\"AUTO\"],\n", " value=\"CPU\",\n", " description=\"Device:\",\n", " disabled=False,\n", ")\n", "\n", "device" ] }, { "cell_type": "code", "execution_count": 9, "id": "49c0c5b5-835b-4e96-bb74-86243f531648", "metadata": {}, "outputs": [], "source": [ "text_enc = core.compile_model(TEXT_ENCODER_OV_PATH, device.value)\n", "unet_model = core.compile_model(UNET_OV_PATH, device.value)\n", "\n", "ov_config = {\"INFERENCE_PRECISION_HINT\": \"f32\"} if device.value != \"CPU\" else {}\n", "\n", "vae_decoder = core.compile_model(VAE_DECODER_OV_PATH, device.value, ov_config)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8ca4d4ac-66e0-4636-bb26-d3e66eb90639", "metadata": {}, "source": [ "Model tokenizer and scheduler are also important parts of the pipeline. This pipeline is also can use Safety Checker, the filter for detecting that corresponding generated image contains \"not-safe-for-work\" (nsfw) content. The process of nsfw content detection requires to obtain image embeddings using CLIP model, so additionally feature extractor component should be added in the pipeline. We reuse tokenizer, feature extractor, scheduler and safety checker from original LCM pipeline." 
] }, { "cell_type": "code", "execution_count": 10, "id": "3dce8a15-163b-4b13-a575-fcb44fd038bc", "metadata": {}, "outputs": [], "source": [ "ov_pipe = OVLatentConsistencyModelPipeline(\n", " tokenizer=tokenizer,\n", " text_encoder=text_enc,\n", " unet=unet_model,\n", " vae_decoder=vae_decoder,\n", " scheduler=scheduler,\n", " feature_extractor=feature_extractor,\n", " safety_checker=safety_checker,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "21bc3f1b-c324-4f8d-bcf9-19504f5826f5", "metadata": {}, "source": [ "## Text-to-image generation\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Now, let's see model in action" ] }, { "cell_type": "code", "execution_count": 11, "id": "a3fbba15-5199-4d2a-8898-eb0f8f705289", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b6296c4fa8bf4deca42ce48aa929bf0c", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "images[0]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7600eaf2-6a6a-4f07-8fdc-fa7155acc842", "metadata": {}, "source": [ "Nice. As you can see, the picture has quite a high definition 🔥." ] }, { "attachments": {}, "cell_type": "markdown", "id": "919bf570", "metadata": {}, "source": [ "## Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n", "\n", "According to `LatentConsistencyModelPipeline` structure, UNet used for iterative denoising of input. It means that model runs in the cycle repeating inference on each diffusion step, while other parts of pipeline take part only once. That is why computation cost and speed of UNet denoising becomes the critical path in the pipeline. Quantizing the rest of the SD pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy.\n", "\n", "The optimization process contains the following steps:\n", "\n", "1. Create a calibration dataset for quantization.\n", "2. Run `nncf.quantize()` to obtain quantized model.\n", "3. Save the `INT8` model using `openvino.save_model()` function.\n", "\n", "Please select below whether you would like to run quantization to improve model inference speed." 
] }, { "cell_type": "code", "execution_count": 13, "id": "dad95e35", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0178a802f323482986ccbfc369ac0890", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Quantization')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "skip_for_device = \"GPU\" in device.value\n", "to_quantize = widgets.Checkbox(value=not skip_for_device, description=\"Quantization\", disabled=skip_for_device)\n", "to_quantize" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7383b987", "metadata": {}, "source": [ "Let's load `skip magic` extension to skip quantization if `to_quantize` is not selected" ] }, { "cell_type": "code", "execution_count": 14, "id": "1e42ceca", "metadata": {}, "outputs": [], "source": [ "int8_pipe = None\n", "\n", "# Fetch `skip_kernel_extension` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n", ")\n", "open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n", "%load_ext skip_kernel_extension" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2931a018", "metadata": {}, "source": [ "### Prepare calibration dataset\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We use a portion of [conceptual_captions](https://huggingface.co/datasets/conceptual_captions) dataset from Hugging Face as calibration data.\n", "To collect intermediate model inputs for calibration we should customize `CompiledModel`." ] }, { "cell_type": "code", "execution_count": 15, "id": "8951ec45", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import datasets\n", "from tqdm.notebook import tqdm\n", "from transformers import set_seed\n", "from typing import Any, Dict, List\n", "\n", "set_seed(1)\n", "\n", "class CompiledModelDecorator(ov.CompiledModel):\n", " def __init__(self, compiled_model, prob: float, data_cache: List[Any] = None):\n", " super().__init__(compiled_model)\n", " self.data_cache = data_cache if data_cache else []\n", " self.prob = np.clip(prob, 0, 1)\n", "\n", " def __call__(self, *args, **kwargs):\n", " if np.random.rand() >= self.prob:\n", " self.data_cache.append(*args)\n", " return super().__call__(*args, **kwargs)\n", "\n", "def collect_calibration_data(lcm_pipeline: OVLatentConsistencyModelPipeline, subset_size: int) -> List[Dict]:\n", " original_unet = lcm_pipeline.unet\n", " lcm_pipeline.unet = CompiledModelDecorator(original_unet, prob=0.3)\n", "\n", " dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\").shuffle(seed=42)\n", " lcm_pipeline.set_progress_bar_config(disable=True)\n", " safety_checker = lcm_pipeline.safety_checker\n", " lcm_pipeline.safety_checker = None\n", "\n", " # Run inference for data collection\n", " pbar = tqdm(total=subset_size)\n", " diff = 0\n", " for batch in dataset:\n", " prompt = batch[\"caption\"]\n", " if len(prompt) > tokenizer.model_max_length:\n", " continue\n", " _ = lcm_pipeline(\n", " prompt,\n", " num_inference_steps=num_inference_steps,\n", " guidance_scale=8.0,\n", " lcm_origin_steps=50,\n", " output_type=\"pil\",\n", " height=512,\n", " width=512,\n", " )\n", " collected_subset_size = len(lcm_pipeline.unet.data_cache)\n", " if collected_subset_size >= subset_size:\n", " pbar.update(subset_size - pbar.n)\n", " break\n", " pbar.update(collected_subset_size - diff)\n", " 
diff = collected_subset_size\n", "\n", " calibration_dataset = lcm_pipeline.unet.data_cache\n", " lcm_pipeline.set_progress_bar_config(disable=False)\n", " lcm_pipeline.unet = original_unet\n", " lcm_pipeline.safety_checker = safety_checker\n", " return calibration_dataset" ] }, { "cell_type": "code", "execution_count": 16, "id": "8043270d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f24272fe24534bb3823a7c2f7e1899fb", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/200 [00:00 **NOTE**: Quantization is time and memory consuming operation. Running quantization code below may take some time." ] }, { "cell_type": "code", "execution_count": 17, "id": "d37de4ff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4d6bee83bed5496fadba07472958b3f3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0d785a82c367493d9712ff30b8b19b70", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:122 ignored nodes were found by name in the NNCFGraph\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6b38b127fafc430ea92e16f4d68af790", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "import nncf\n", "from nncf.scopes import IgnoredScope\n", "\n", "if UNET_INT8_OV_PATH.exists():\n", " print(\"Loading quantized model\")\n", " quantized_unet = core.read_model(UNET_INT8_OV_PATH)\n", "else:\n", " unet = core.read_model(UNET_OV_PATH)\n", " quantized_unet = nncf.quantize(\n", " model=unet,\n", " subset_size=subset_size,\n", " calibration_dataset=nncf.Dataset(unet_calibration_data),\n", " model_type=nncf.ModelType.TRANSFORMER,\n", " advanced_parameters=nncf.AdvancedQuantizationParameters(\n", " disable_bias_correction=True\n", " )\n", " )\n", " ov.save_model(quantized_unet, UNET_INT8_OV_PATH)" ] }, { "cell_type": "code", "execution_count": 18, "id": "f0e10903", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "unet_optimized = core.compile_model(UNET_INT8_OV_PATH, device.value)\n", "\n", "int8_pipe = OVLatentConsistencyModelPipeline(\n", " tokenizer=tokenizer,\n", " text_encoder=text_enc,\n", " unet=unet_optimized,\n", " vae_decoder=vae_decoder,\n", " scheduler=scheduler,\n", " feature_extractor=feature_extractor,\n", " safety_checker=safety_checker,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d90608e5", "metadata": {}, "source": [ "Let us check predictions with the quantized UNet using the same input data." ] }, { "cell_type": "code", "execution_count": 19, "id": "44e34a85", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "079112304f7c467e832cf115a3fd4e4a", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from IPython.display import display\n", "\n", "prompt = \"a beautiful pink unicorn, 8k\"\n", "num_inference_steps = 4\n", "torch.manual_seed(1234567)\n", "\n", "images = int8_pipe(\n", " prompt=prompt,\n", " num_inference_steps=num_inference_steps,\n", " guidance_scale=8.0,\n", " lcm_origin_steps=50,\n", " output_type=\"pil\",\n", " height=512,\n", " width=512,\n", ").images\n", "\n", "display(images[0])" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3ea2cfd0", "metadata": {}, "source": [ "### Compare inference time of the FP16 and INT8 models\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "To measure the inference performance of the `FP16` and `INT8` pipelines, we use median inference time on calibration subset.\n", "\n", "> **NOTE**: For the most accurate performance estimation, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications." 
] }, { "cell_type": "code", "execution_count": 20, "id": "cc2dcd2d", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import time\n", "\n", "validation_size = 10\n", "calibration_dataset = datasets.load_dataset(\"conceptual_captions\", split=\"train\")\n", "validation_data = []\n", "for idx, batch in enumerate(calibration_dataset):\n", " if idx >= validation_size:\n", " break\n", " prompt = batch[\"caption\"]\n", " validation_data.append(prompt)\n", "\n", "def calculate_inference_time(pipeline, calibration_dataset):\n", " inference_time = []\n", " pipeline.set_progress_bar_config(disable=True)\n", " for idx, prompt in enumerate(validation_data):\n", " start = time.perf_counter()\n", " _ = pipeline(\n", " prompt,\n", " num_inference_steps=num_inference_steps,\n", " guidance_scale=8.0,\n", " lcm_origin_steps=50,\n", " output_type=\"pil\",\n", " height=512,\n", " width=512,\n", " )\n", " end = time.perf_counter()\n", " delta = end - start\n", " inference_time.append(delta)\n", " if idx >= validation_size:\n", " break\n", " return np.median(inference_time)" ] }, { "cell_type": "code", "execution_count": 21, "id": "7db35eb9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Performance speed up: 1.319\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "fp_latency = calculate_inference_time(ov_pipe, validation_data)\n", "int8_latency = calculate_inference_time(int8_pipe, validation_data)\n", "print(f\"Performance speed up: {fp_latency / int8_latency:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1cfcd78e", "metadata": {}, "source": [ "#### Compare UNet file size\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "cf37b81b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FP16 model size: 1678912.37 KB\n", "INT8 model size: 840792.93 KB\n", "Model compression rate: 1.997\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "fp16_ir_model_size = UNET_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "quantized_model_size = UNET_INT8_OV_PATH.with_suffix(\".bin\").stat().st_size / 1024\n", "\n", "print(f\"FP16 model size: {fp16_ir_model_size:.2f} KB\")\n", "print(f\"INT8 model size: {quantized_model_size:.2f} KB\")\n", "print(f\"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "035dffa6-0ce9-4d22-8638-a2217d06e828", "metadata": {}, "source": [ "## Interactive demo\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "code", "execution_count": 23, "id": "3696ec23-8959-4087-a4a7-48f2b7fb0869", "metadata": {}, "outputs": [], "source": [ "import random\n", "import gradio as gr\n", "from functools import partial\n", "\n", "MAX_SEED = np.iinfo(np.int32).max\n", "\n", "examples = [\n", " \"portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour,\"\n", " \"style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography\",\n", " \"Self-portrait oil painting, a beautiful cyborg with golden hair, 8k\",\n", " \"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k\",\n", " \"A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece\",\n", "]\n", "\n", "\n", "def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:\n", " if randomize_seed:\n", " seed = 
random.randint(0, MAX_SEED)\n", " return seed\n", "\n", "\n", "MAX_IMAGE_SIZE = 768\n", "\n", "\n", "def generate(\n", " pipeline: OVLatentConsistencyModelPipeline,\n", " prompt: str,\n", " seed: int = 0,\n", " width: int = 512,\n", " height: int = 512,\n", " guidance_scale: float = 8.0,\n", " num_inference_steps: int = 4,\n", " randomize_seed: bool = False,\n", " num_images: int = 1,\n", " progress=gr.Progress(track_tqdm=True),\n", "):\n", " seed = randomize_seed_fn(seed, randomize_seed)\n", " torch.manual_seed(seed)\n", " result = pipeline(\n", " prompt=prompt,\n", " width=width,\n", " height=height,\n", " guidance_scale=guidance_scale,\n", " num_inference_steps=num_inference_steps,\n", " num_images_per_prompt=num_images,\n", " lcm_origin_steps=50,\n", " output_type=\"pil\",\n", " ).images[0]\n", " return result, seed\n", "\n", "\n", "generate_original = partial(generate, ov_pipe)\n", "generate_optimized = partial(generate, int8_pipe)\n", "quantized_model_present = int8_pipe is not None\n", "\n", "with gr.Blocks() as demo:\n", " with gr.Group():\n", " with gr.Row():\n", " prompt = gr.Text(\n", " label=\"Prompt\",\n", " show_label=False,\n", " max_lines=1,\n", " placeholder=\"Enter your prompt\",\n", " container=False,\n", " )\n", " with gr.Row():\n", " with gr.Column():\n", " result = gr.Image(\n", " label=\"Result (Original)\" if quantized_model_present else \"Image\",\n", " type=\"pil\",\n", " )\n", " run_button = gr.Button(\"Run\")\n", " with gr.Column(visible=quantized_model_present):\n", " result_optimized = gr.Image(\n", " label=\"Result (Optimized)\",\n", " type=\"pil\",\n", " visible=quantized_model_present,\n", " )\n", " run_quantized_button = gr.Button(value=\"Run quantized\", visible=quantized_model_present)\n", "\n", " with gr.Accordion(\"Advanced options\", open=False):\n", " seed = gr.Slider(label=\"Seed\", minimum=0, maximum=MAX_SEED, step=1, value=0, randomize=True)\n", " randomize_seed = gr.Checkbox(label=\"Randomize seed across runs\", value=True)\n", " with gr.Row():\n", " width = gr.Slider(\n", " label=\"Width\",\n", " minimum=256,\n", " maximum=MAX_IMAGE_SIZE,\n", " step=32,\n", " value=512,\n", " )\n", " height = gr.Slider(\n", " label=\"Height\",\n", " minimum=256,\n", " maximum=MAX_IMAGE_SIZE,\n", " step=32,\n", " value=512,\n", " )\n", " with gr.Row():\n", " guidance_scale = gr.Slider(\n", " label=\"Guidance scale for base\",\n", " minimum=2,\n", " maximum=14,\n", " step=0.1,\n", " value=8.0,\n", " )\n", " num_inference_steps = gr.Slider(\n", " label=\"Number of inference steps for base\",\n", " minimum=1,\n", " maximum=8,\n", " step=1,\n", " value=4,\n", " )\n", "\n", " gr.Examples(\n", " examples=examples,\n", " inputs=prompt,\n", " outputs=result,\n", " cache_examples=False,\n", " )\n", "\n", " gr.on(\n", " triggers=[\n", " prompt.submit,\n", " run_button.click,\n", " ],\n", " fn=generate_original,\n", " inputs=[\n", " prompt,\n", " seed,\n", " width,\n", " height,\n", " guidance_scale,\n", " num_inference_steps,\n", " randomize_seed,\n", " ],\n", " outputs=[result, seed],\n", " )\n", "\n", " if quantized_model_present:\n", " gr.on(\n", " triggers=[\n", " prompt.submit,\n", " run_quantized_button.click,\n", " ],\n", " fn=generate_optimized,\n", " inputs=[\n", " prompt,\n", " seed,\n", " width,\n", " height,\n", " guidance_scale,\n", " num_inference_steps,\n", " randomize_seed,\n", " ],\n", " outputs=[result_optimized, seed],\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "5edce720", "metadata": {}, "outputs": [], "source": [ "try:\n", " 
demo.queue().launch(debug=True)\n", "except Exception:\n", " demo.queue().launch(share=True, debug=True)\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/latent-consistency-models-image-generation/latent-consistency-models-image-generation.png?raw=true", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [], "tasks": [ "Text-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }