tlwu/sdxl-turbo-onnxruntime

stable-diffusion

tlwu committed on Dec 12, 2023

Commit 8202d1d · 1 parent: 9b4394a

update doc

Files changed (1)

README.md +5 -5

README.md CHANGED

@@ -48,7 +48,7 @@ Below is average latency of generating an image of size 512x512 using NVIDIA A10
 Static means the engine is built for the given batch size and image size combination, and CUDA graph is used to speed up.
-
+For PyTorch 2.1, the UNet use channel last (NHWC) format, and compile the UNet with mode `reduce-overhead`. See [benchmark script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/benchmark_controlnet.py) for detail.
 #### Latency for SDXL-Turbo with Canny Control Net
@@ -56,10 +56,10 @@ Below is average latency of generating an image of size 512x512 with canny contr
 | Engine      | Batch Size | Steps | PyTorch 2.1     | ONNX Runtime CUDA |
 |-------------|------------|------ | ----------------|-------------------|
-| Static      | 1          |   1   | 160.0 ms        |  49.3 ms          |
-| Static      | 4          |   1   | 314.9 ms        | 135.3 ms          |
-| Static      | 1          |   4   | 251.9 ms        | 123.3 ms          |
-| Static      | 4          |   4   | 514.2 ms        | 303.3 ms          |
+| Static      | 1          |   1   | 160.0 ms        |  55.3 ms          |
+| Static      | 4          |   1   | 314.9 ms        | 144.4 ms          |
+| Static      | 1          |   4   | 251.9 ms        | 134.9 ms          |
+| Static      | 4          |   4   | 514.2 ms        | 332.6 ms          |
 ## Usage Example
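To make the effect of this commit on the Canny ControlNet table easier to see, here is a rough sketch (not part of the commit) that takes the ratio of PyTorch 2.1 latency to ONNX Runtime CUDA latency for each row, using both the removed (`-`) and the added (`+`) ONNX Runtime numbers in this diff. The PyTorch 2.1 column is the same on both sides. The names `ROWS`, `speedup`, and `summarize` are hypothetical helpers for illustration only.

```python
# Latency numbers (ms) copied from the SDXL-Turbo + Canny ControlNet table in this diff.
# Each row: (batch_size, steps, pytorch_2_1_ms, ort_cuda_old_ms, ort_cuda_new_ms)
# "old" = the removed (-) ORT value, "new" = the added (+) ORT value.
ROWS = [
    (1, 1, 160.0, 49.3, 55.3),
    (4, 1, 314.9, 135.3, 144.4),
    (1, 4, 251.9, 123.3, 134.9),
    (4, 4, 514.2, 303.3, 332.6),
]


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline (baseline / candidate)."""
    if candidate_ms <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_ms / candidate_ms


def summarize(rows):
    """Compute the PyTorch 2.1 -> ORT CUDA speedup for the old and new ORT numbers, per row."""
    summary = []
    for batch, steps, torch_ms, ort_old_ms, ort_new_ms in rows:
        summary.append({
            "batch": batch,
            "steps": steps,
            "speedup_old": round(speedup(torch_ms, ort_old_ms), 2),
            "speedup_new": round(speedup(torch_ms, ort_new_ms), 2),
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(f"batch={row['batch']} steps={row['steps']} "
              f"old={row['speedup_old']}x new={row['speedup_new']}x")
```

Running it prints one line per table row. A ratio above 1 means ONNX Runtime CUDA had lower average latency than PyTorch 2.1 for that batch size and step count.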