### FloVD-CogVideoX-5B

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
  <tr>
    <td>
      <video src="https://github.com/user-attachments/assets/a55d1c29-6682-417d-886c-695b1d1b61fd" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/4def8617-063f-4e61-969a-fd0507dbdeec" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/55745611-fea3-4f3f-bdd1-48b5f6c24f98" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/97be3121-ae38-45f9-822a-e387cf262824" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>

## Project Updates

- **News**: ```2025/05/02```: We have updated the code for `FloVD-CogVideoX`. We will release the dataset preprocessing and training code soon.
- **News**: ```2025/02/26```: Our paper has been accepted to CVPR 2025.

## Quick Start

### Prompt Optimization

As mentioned in [CogVideoX](https://github.com/THUDM/CogVideo), we recommend using long, detailed text prompts to get better results. Our FloVD-CogVideoX model is trained on text captions extracted with [CogVLM2](https://github.com/THUDM/CogVLM2).

### Environment

**Please make sure your Python version is between 3.10 and 3.12, inclusive.**

```
pip install -r requirements.txt
```
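Before installing, you can sanity-check the interpreter against the stated requirement with a small guard (illustrative only; this helper is not part of the repository):

```python
import sys

# Illustrative guard for the stated requirement: Python 3.10-3.12 inclusive.
def python_version_ok(version_info=sys.version_info):
    major, minor = version_info[0], version_info[1]
    return (3, 10) <= (major, minor) <= (3, 12)
```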

### Optical flow normalization

As mentioned in the FloVD paper, we normalize optical flow following [Generative Image Dynamics](https://generative-dynamics.github.io/). For this, we use scale factors (s_x, s_y) = (60, 36) for both FVSM and OMSM.
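A minimal sketch of this normalization, assuming each flow component is simply divided by its scale factor (the repository's implementation may operate on tensors instead):

```python
# Illustrative only: normalize optical flow by fixed scale factors,
# following Generative Image Dynamics.
S_X, S_Y = 60.0, 36.0  # scale factors used for both FVSM and OMSM

def normalize_flow(flow, s_x=S_X, s_y=S_Y):
    """flow: iterable of (u, v) pixel displacements."""
    return [(u / s_x, v / s_y) for (u, v) in flow]

def denormalize_flow(flow, s_x=S_X, s_y=S_Y):
    """Inverse mapping, applied before warping frames."""
    return [(u * s_x, v * s_y) for (u, v) in flow]
```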

### Pre-trained checkpoints

Download the FloVD-CogVideoX FVSM and OMSM (curated) checkpoints: <br>
[\[Google Drive\]](https://drive.google.com/drive/folders/1Y7Fha8QKX6bg_0YEOxQf0M6uaPJ9SfgB?usp=sharing)

In addition, we use the off-the-shelf depth estimation model Depth Anything V2 (metric depth). Please refer to the link below: <br>
[\[Depth_anything_v2_metric\]](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth)

Then, place these checkpoints in the `./ckpt` directory:
```shell
# File tree
./ckpt/
├── FVSM
│   └── FloVD_FVSM_Controlnet.pt
├── OMSM
│   ├── selected_blocks.safetensors
│   └── pytorch_lora_weights.safetensors
└── others
    └── depth_anything_v2_metric_hypersim_vitb.pth
```

### Pre-defined camera trajectory

We provide several example camera trajectories for quick inference.
Refer to `./assets/cam_trajectory/` for a visualization of each camera trajectory.

```shell
# File tree
./assets/
├── manual_poses
│   └── ...
├── re10k_poses
│   └── ...
└── manual_poses_PanTiltSpin
    └── ...
```
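As a sketch of what such a trajectory contains, a simple forward dolly ("zoom-in") can be generated as one 4x4 camera pose per frame. This is illustrative only: the pose convention and on-disk format of the provided trajectories may differ.

```python
# Illustrative only: a forward-dolly trajectory as 4x4 camera-to-world
# matrices, translating along the view axis frame by frame.
def dolly_forward_trajectory(num_frames=49, total_distance=1.0):
    poses = []
    for i in range(num_frames):
        z = total_distance * i / (num_frames - 1)  # move along +z
        poses.append([
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0],
        ])
    return poses
```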

### Inference Settings

At inference time, we recommend using the same settings as in training:

+ Number of frames: 49
+ FPS: 16
+ Flow scale factor: (s_x, s_y) = (60, 36)
+ CONTROLNET_GUIDANCE_END: 0.4 for better camera controllability, 0.1 for more natural object motion. This argument sets the fraction of denoising timesteps during which ControlNet features are injected into the pre-trained model.
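Collected as defaults, these settings look roughly as follows (the key names are illustrative, not the scripts' actual argument names):

```python
# Recommended inference settings from above; key names are illustrative.
INFERENCE_DEFAULTS = {
    "num_frames": 49,
    "fps": 16,
    "flow_scale": (60, 36),          # (s_x, s_y)
    "controlnet_guidance_end": 0.4,  # use 0.1 for more natural object motion
}
```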

### Inference

+ [flovd_demo](inference/flovd_demo.py): Use this to synthesize videos with a desired camera trajectory and natural object motions. For a more detailed explanation of the inference code, including the meaning of common parameters, refer to [flovd_demo_script](inference/inference_scripts/flovd_demo.sh).
+ [flovd_fvsm_demo](inference/flovd_fvsm_demo.py): Use the FVSM model alone for more accurate camera control with little object motion. This code omits OMSM and uses only FVSM. (The script will be released soon.)
+ [flovd_ddp_demo](inference/flovd_ddp_demo.py): Use this to sample a large number of videos. Note that you need to prepare the dataset in advance following our dataset preprocessing pipeline. (The preprocessing pipeline will be released.)

### Tools

This folder contains tools for camera trajectory generation, visualization, etc.

+ [generate_camparam](tools/generate_camparam.py): Generate manual camera parameters such as zoom-in, zoom-out, etc.
+ [visualize trajectory](tools/visualize_trajectory.py): Visualize a given camera trajectory.

## Citation

🌟 If you find our work helpful, please leave us a star and cite our paper.

```
@article{jin2025flovd,
  title={FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis},
  author={Jin, Wonjoon and Dai, Qi and Luo, Chong and Baek, Seung-Hwan and Cho, Sunghyun},
  journal={arXiv preprint arXiv:2502.08244},
  year={2025}
}
```

## Reference

We thank [CogVideoX](https://github.com/THUDM/CogVideo) for open-sourcing their work.

## Model-License

The CogVideoX-5B model (Transformers module, including I2V and T2V) is released under the [CogVideoX LICENSE](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).

---
title: FloVD
emoji: π
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference