---
base_model: THUDM/CogVideoX-5b
datasets: finetrainers/3dgs-dissolve
library_name: diffusers
license: other
license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
instance_prompt: 3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the penguin, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background.
widget:
- text: 3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the penguin, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background.
  output:
    url: "./assets/output_0.mp4"
- text: 3D_dissolve A small car, rendered in a 3D appearance, navigates through a swirling vortex of fiery particles. As it moves forward, the surrounding environment transforms into a dynamic display of red sparks that eventually evaporate into a burst of red sparks, creating a mesmerizing visual effect against the dark backdrop.
  output:
    url: "./assets/output_1.mp4"
tags:
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
---

<Gallery />

This is a fine-tune of the [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) model on the
[finetrainers/3dgs-dissolve](https://huggingface.co/datasets/finetrainers/3dgs-dissolve) dataset. We also provide
a LoRA variant of the weights; see the [LoRA](#lora) section.

Code: https://github.com/a-r-r-o-w/finetrainers

> [!IMPORTANT]
> This is an experimental checkpoint and is known to generalize poorly.

Inference code:

```py
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer and swap it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/3dgs-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The prompt starts with the `3D_dissolve` id token (see `instance_prompt` in the metadata).
prompt = """
3D_dissolve In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=25)
```

Training logs are available on WandB [here](https://wandb.ai/sayakpaul/finetrainers-cogvideox/runs/r39sv4do).

## LoRA

We extracted a rank-64 LoRA from the fine-tuned checkpoint
(script [here](https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py)).
[This LoRA](./extracted_3dgs_lora_64.safetensors) can be used to emulate the same kind of effect:

<details>
<summary>Code</summary>

```py
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
# Load the extracted LoRA. This assumes the file linked above sits at the root of the
# `finetrainers/3dgs-v0` repository; pass a local path instead if you downloaded it.
pipeline.load_lora_weights(
    "finetrainers/3dgs-v0", weight_name="extracted_3dgs_lora_64.safetensors", adapter_name="3dgs"
)

prompts = ["""
In a 3D appearance, a small bicycle is seen surrounded by a burst of fiery sparks, creating a dramatic and intense visual effect against the dark background.
The video showcases a dynamic explosion of fiery particles in a 3D appearance, with sparks and embers scattering across the screen against a stark black background.
""",
"""
In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
""",
]
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs, bad physique"
id_token = "3D_dissolve"

# Generate one video per prompt, prefixing each prompt with the id token.
for i, prompt in enumerate(prompts):
    video = pipeline(
        prompt=f"{id_token} {prompt}",
        negative_prompt=negative_prompt,
        num_frames=81,
        height=512,
        width=768,
        num_inference_steps=50,
        generator=torch.manual_seed(0)
    ).frames[0]
    export_to_video(video, f"output_{i}.mp4", fps=25)
```

</details>
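
For intuition, extracting a LoRA from a fine-tuned checkpoint is commonly done by taking a truncated SVD of the difference between the fine-tuned and base weights of each layer. The snippet below is a minimal NumPy sketch of that idea on synthetic matrices, not the linked script: the `extract_lora` helper, the matrix shapes, and the rank-8 synthetic update are all illustrative assumptions, and the actual script may differ in its details.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Hypothetical helper: factor (w_tuned - w_base) into low-rank LoRA matrices.

    Returns (lora_down, lora_up) with shapes (rank, in) and (out, rank), so that
    lora_up @ lora_down is the best rank-`rank` approximation of the weight delta.
    """
    delta = w_tuned - w_base  # shape: (out, in)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    rank = min(rank, s.shape[0])
    # Split each singular value evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    lora_up = u[:, :rank] * sqrt_s             # (out, rank)
    lora_down = sqrt_s[:, None] * vt[:rank]    # (rank, in)
    return lora_down, lora_up


# Synthetic stand-ins for one layer's weights (illustrative only).
rng = np.random.default_rng(0)
w_base = rng.standard_normal((128, 96))
true_update = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 96))  # rank-8 "fine-tune"
w_tuned = w_base + true_update

lora_down, lora_up = extract_lora(w_base, w_tuned, rank=8)

# Relative reconstruction error of the extracted low-rank update.
rel_err = np.linalg.norm(lora_up @ lora_down - true_update) / np.linalg.norm(true_update)
print(lora_down.shape, lora_up.shape, rel_err)
```

In this toy setup the update is exactly rank 8, so a rank-8 extraction recovers it up to floating-point error. A real fine-tune's weight delta is generally not low-rank, so a rank-64 LoRA only approximates the full checkpoint.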