roll-ai committed (verified)
Commit 8e19ca1 · Parent: 49f0d37

Update README.md

Files changed (1): README.md (+10, −257)

README.md CHANGED
@@ -1,260 +1,13 @@
- # DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
-
- [Zheng Chen](https://zhengchen1999.github.io/), [Zichen Zou](https://github.com/zzctmd), [Kewei Zhang](), [Xiongfei Su](https://ieeexplore.ieee.org/author/37086348852), [Xin Yuan](https://en.westlake.edu.cn/faculty/xin-yuan.html), [Yong Guo](https://www.guoyongcs.com/), and [Yulun Zhang](http://yulunzhang.com/), "DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution", 2025
-
- <div>
-   <a href="https://github.com/zhengchen1999/DOVE/releases" target='_blank' style="text-decoration: none;"><img src="https://img.shields.io/github/downloads/zhengchen1999/DOVE/total?color=green&style=flat"></a>
-   <a href="https://github.com/zhengchen1999/DOVE" target='_blank' style="text-decoration: none;"><img src="https://visitor-badge.laobi.icu/badge?page_id=zhengchen1999/DOVE"></a>
-   <a href="https://github.com/zhengchen1999/DOVE/stargazers" target='_blank' style="text-decoration: none;"><img src="https://img.shields.io/github/stars/zhengchen1999/DOVE?style=social"></a>
- </div>
-
- [[arXiv](https://arxiv.org/abs/2505.16239)] [[supplementary material](https://github.com/zhengchen1999/DOVE/releases/download/v1/Supplementary_Material.pdf)] [[dataset](https://drive.google.com/drive/folders/1e7CyNzfJBa2saWvPr2HI2q_FJhLIc-Ww?usp=drive_link)] [[pretrained models](https://drive.google.com/drive/folders/1wj9jY0fn6prSWJ7BjJOXfxC0bs8skKbQ?usp=sharing)]
-
- #### 🔥🔥🔥 News
-
- - **2025-06-09:** Test datasets, inference scripts, and pretrained models are available. ⭐️⭐️⭐️
- - **2025-05-22:** This repo is released.
-
  ---
-
- > **Abstract:** Diffusion models have demonstrated promising performance in real-world video super-resolution (VSR). However, the dozens of sampling steps they require make inference extremely slow. Sampling acceleration techniques, particularly single-step sampling, provide a potential solution. Nonetheless, achieving one step in VSR remains challenging due to the high training overhead on video data and stringent fidelity demands. To tackle these issues, we propose DOVE, an efficient one-step diffusion model for real-world VSR. DOVE is obtained by fine-tuning a pretrained video diffusion model (*i.e.*, CogVideoX). To train DOVE effectively, we introduce the latent–pixel training strategy. The strategy employs a two-stage scheme to gradually adapt the model to the video super-resolution task.
- > Meanwhile, we design a video processing pipeline to construct a high-quality dataset tailored for VSR, termed HQ-VSR. Fine-tuning on this dataset further enhances the restoration capability of DOVE. Extensive experiments show that DOVE exhibits comparable or superior performance to multi-step diffusion-based VSR methods. It also offers outstanding inference efficiency, achieving up to a **28×** speed-up over existing methods such as MGLD-VSR.
-
- ![](./assets/Compare.png)
-
  ---
-
- <table border="0" style="width: 100%; text-align: center; margin-top: 20px;">
-   <tr>
-     <td>
-       <video src="https://github.com/user-attachments/assets/4ad0ca78-6cca-48c0-95a5-5d5554093f7d" controls autoplay loop></video>
-     </td>
-     <td>
-       <video src="https://github.com/user-attachments/assets/e5b5d247-28af-43fd-b32c-1f1b5896d9e7" controls autoplay loop></video>
-     </td>
-   </tr>
- </table>
-
- ---
-
- ### Training Strategy
-
- ![](./assets/Strategy.png)
-
- ---
-
- ### Video Processing Pipeline
-
- ![](./assets/Pipeline.png)
-
- ## 🔖 TODO
-
- - [x] Release testing code.
- - [x] Release pre-trained models.
- - [ ] Release training code.
- - [ ] Release video processing pipeline.
- - [ ] Release HQ-VSR dataset.
- - [ ] Provide WebUI.
- - [ ] Provide HuggingFace demo.
-
- ## ⚙️ Dependencies
-
- - Python 3.11
- - PyTorch >= 2.5.0
- - Diffusers
-
- ```bash
- # Clone the GitHub repo and enter the default directory 'DOVE'.
- git clone https://github.com/zhengchen1999/DOVE.git
- cd DOVE
- conda create -n DOVE python=3.11
- conda activate DOVE
- pip install -r requirements.txt
- pip install diffusers["torch"] transformers
- pip install pyiqa
- ```
-
- ## 🔗 Contents
-
- 1. [Datasets](#datasets)
- 1. [Models](#models)
- 1. Training
- 1. [Testing](#testing)
- 1. [Results](#results)
- 1. [Citation](#citation)
- 1. [Acknowledgements](#acknowledgements)
-
- ## <a name="datasets"></a>📁 Datasets
-
- ### 🗳️ Test Datasets
-
- We provide several real-world and synthetic test datasets for evaluation. All datasets follow a consistent directory structure:
-
- | Dataset | Type | # Videos | Download |
- | :------ | :--------: | :------: | :----------------------------------------------------------: |
- | UDM10 | Synthetic | 10 | [Google Drive](https://drive.google.com/file/d/1AmGVSCwMm_OFPd3DKgNyTwj0GG2H-tG4/view?usp=drive_link) |
- | SPMCS | Synthetic | 30 | [Google Drive](https://drive.google.com/file/d/1b2uktCFPKS-R1fTecWcLFcOnmUFIBNWT/view?usp=drive_link) |
- | YouHQ40 | Synthetic | 40 | [Google Drive](https://drive.google.com/file/d/1zO23UCStxL3htPJQcDUUnUeMvDrysLTh/view?usp=sharing) |
- | RealVSR | Real-world | 50 | [Google Drive](https://drive.google.com/file/d/1wr4tTiCvQlqdYPeU1dmnjb5KFY4VjGCO/view?usp=drive_link) |
- | MVSR4x | Real-world | 15 | [Google Drive](https://drive.google.com/file/d/16sesBD_9Xx_5Grtx18nosBw1w94KlpQt/view?usp=drive_link) |
- | VideoLQ | Real-world | 50 | [Google Drive](https://drive.google.com/file/d/1lh0vkU_llxE0un1OigJ0DWPQwt1i68Vn/view?usp=drive_link) |
-
- All datasets are also hosted [here](https://drive.google.com/drive/folders/1yNKG6rtTNtZQY8qL74GoQwA0jgjBUEby?usp=sharing). Make sure the path is correct (`datasets/test/`) before running inference.
-
- The directory structure is as follows:
-
- ```shell
- datasets/
- └── test/
-     └── [DatasetName]/
-         ├── GT/         # Ground truth: one folder of high-quality frames per clip
-         ├── GT-Video/   # Ground truth (video version): lossless MKV format
-         ├── LQ/         # Low-quality input: one folder of degraded frames per clip
-         └── LQ-Video/   # Low-quality input (video version): lossless MKV format
- ```
-
- ## <a name="models"></a>📦 Models
-
- We provide pretrained weights for DOVE and DOVE-2B.
-
- | Model Name | Description | HuggingFace | Google Drive | Visual Results |
- | :--------- | :-------------------------------------: | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
- | DOVE | Base version, built on CogVideoX1.5-5B | TODO | [Download](https://drive.google.com/file/d/1Nl3XoJndMtpu6KPFcskUTkI0qWBiSXF2/view?usp=drive_link) | [Download](https://drive.google.com/drive/folders/1J92X1amVijH9dNWGQcz-6Cx44B7EipWr?usp=drive_link) |
- | DOVE-2B | Smaller version, based on CogVideoX-2B | TODO | TODO | TODO |
-
- > Place downloaded model files into the `pretrained_models/` folder, e.g., `pretrained_models/DOVE`; a sketch of the expected layout follows below.
-
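- A minimal sketch of the expected layout, assuming each model unpacks into its own subfolder (the exact file names inside depend on the release archive):
-
- ```shell
- pretrained_models/
- └── DOVE/    # unpacked DOVE weights, as referenced by --model_path
- ```
-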
- ## <a name="testing"></a>🔨 Testing
-
- We provide inference commands below. Before running, make sure to download the corresponding pretrained models and test datasets.
-
- For more options and usage, please refer to [inference_script.py](inference_script.py).
-
- The full testing commands are provided in the shell script: [inference.sh](inference.sh).
-
- ```shell
- # 🔹 Demo inference
- python inference_script.py \
-     --input_dir datasets/demo \
-     --model_path pretrained_models/DOVE \
-     --output_path results/DOVE/demo \
-     --is_vae_st \
-     --save_format yuv420p
-
- # 🔹 Reproduce paper results
- python inference_script.py \
-     --input_dir datasets/test/UDM10/LQ-Video \
-     --model_path pretrained_models/DOVE \
-     --output_path results/DOVE/UDM10 \
-     --is_vae_st
-
- # 🔹 Evaluate quantitative metrics
- python eval_metrics.py \
-     --gt datasets/test/UDM10/GT \
-     --pred results/DOVE/UDM10 \
-     --metrics psnr,ssim,lpips,dists,clipiqa
- ```
-
167
- > 💡 If you encounter out-of-memory (OOM) issues, you can enable chunk-based testing by setting the following parameters: tile_size_hw, overlap_hw, chunk_len, and overlap_t.
168
- >
169
- > 💡 Default save format is `yuv444p`. If playback fails, try `save_format=yuv420p` (may slightly affect metrics).
170
- >
171
- > **TODO:** Add metric computation scripts for FasterVQA, DOVER, and $E^*_{warp}$.
172
-
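- A minimal sketch of chunk-based testing, assuming the four chunking parameters above are exposed as command-line flags of `inference_script.py`; the flag spellings and the tile/chunk values below are illustrative assumptions, so check the script for the exact interface:
-
- ```shell
- # Hypothetical chunked inference to reduce peak memory (values are
- # placeholders, not recommended settings): 512x512 spatial tiles with a
- # 32-pixel overlap, and 16-frame temporal chunks with a 2-frame overlap.
- python inference_script.py \
-     --input_dir datasets/test/UDM10/LQ-Video \
-     --model_path pretrained_models/DOVE \
-     --output_path results/DOVE/UDM10 \
-     --is_vae_st \
-     --tile_size_hw 512 512 \
-     --overlap_hw 32 32 \
-     --chunk_len 16 \
-     --overlap_t 2
- ```
-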
- ## <a name="results"></a>🔎 Results
-
- We achieve state-of-the-art performance on real-world video super-resolution. Visual results are available at [Google Drive](https://drive.google.com/drive/folders/1J92X1amVijH9dNWGQcz-6Cx44B7EipWr?usp=drive_link).
-
- <details open>
- <summary>Quantitative Results (click to expand)</summary>
-
- - Results in Tab. 2 of the main paper
-
- <p align="center">
-   <img width="900" src="assets/Quantitative.png">
- </p>
-
- </details>
-
- <details open>
- <summary>Qualitative Results (click to expand)</summary>
-
- - Results in Fig. 4 of the main paper
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-1.png">
- </p>
-
- <details>
- <summary>More Qualitative Results</summary>
-
- - More results in Fig. 3 of the supplementary material
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-2-1.png">
- </p>
-
- - More results in Fig. 4 of the supplementary material
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-2-2.png">
- </p>
-
- - More results in Fig. 5 of the supplementary material
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-3-1.png">
-   <img width="900" src="assets/Qualitative-3-2.png">
- </p>
-
- - More results in Fig. 6 of the supplementary material
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-4-1.png">
-   <img width="900" src="assets/Qualitative-4-2.png">
- </p>
-
- - More results in Fig. 7 of the supplementary material
-
- <p align="center">
-   <img width="900" src="assets/Qualitative-5-1.png">
-   <img width="900" src="assets/Qualitative-5-2.png">
- </p>
-
- </details>
-
- </details>
-
- ## <a name="citation"></a>📎 Citation
-
- If you find the code helpful in your research or work, please cite the following paper.
-
- ```bibtex
- @article{chen2025dove,
-     title={DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution},
-     author={Chen, Zheng and Zou, Zichen and Zhang, Kewei and Su, Xiongfei and Yuan, Xin and Guo, Yong and Zhang, Yulun},
-     journal={arXiv preprint arXiv:2505.16239},
-     year={2025}
- }
- ```
-
- ## <a name="acknowledgements"></a>💡 Acknowledgements
-
- This project is based on [CogVideo](https://github.com/THUDM/CogVideo) and [Open-Sora](https://github.com/hpcaitech/Open-Sora).

  ---
+ title: Dove
+ emoji:
+ colorFrom: purple
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 5.35.0
+ app_file: app.py
+ pinned: false
+ license: mit
  ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference