3dgs-v0 / README.md
sayakpaul HF staff
metadata
base_model: THUDM/CogVideoX-5b
datasets: finetrainers/3dgs-dissolve
library_name: diffusers
license: other
license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
instance_prompt: >-
  3D_dissolve A small tiger character in a colorful winter outfit appears in a
  3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl
  around the penguin, creating a dramatic effect as they gradually evaporate
  into a burst of red sparks, leaving behind a stark black background.
widget:
  - text: >-
      3D_dissolve A small tiger character in a colorful winter outfit appears in
      a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks
      swirl around the penguin, creating a dramatic effect as they gradually
      evaporate into a burst of red sparks, leaving behind a stark black
      background.
    output:
      url: ./assets/output_0.mp4
  - text: >-
      3D_dissolve A small car, rendered in a 3D appearance, navigates through a
      swirling vortex of fiery particles. As it moves forward, the surrounding
      environment transforms into a dynamic display of red sparks that
      eventually evaporate into a burst of red sparks, creating a mesmerizing
      visual effect against the dark backdrop.
    output:
      url: ./assets/output_1.mp4
tags:
  - text-to-video
  - diffusers-training
  - diffusers
  - cogvideox
  - cogvideox-diffusers
  - template:sd-lora

This is a fine-tune of the THUDM/CogVideoX-5b model on the finetrainers/3dgs-dissolve dataset. A LoRA variant of the weights is also provided; see the LoRA section below.

Code: https://github.com/a-r-r-o-w/finetrainers

This is an experimental checkpoint, and it is known to generalize poorly.

Inference code:

from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer and plug it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/3dgs-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The prompt starts with the 3D_dissolve identifier token, as in the example prompts.
prompt = """
3D_dissolve In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 512x768 and export them as a 25 fps video.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=25)
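
As a rough sanity check on the settings above, the sketch below works out the clip length and the latent tensor shape they imply. It assumes the CogVideoX VAE compresses 4x in time and 8x in space and uses 16 latent channels; the helper names are hypothetical and not part of diffusers.

```python
def cogvideox_latent_shape(num_frames, height, width,
                           temporal_ratio=4, spatial_ratio=8, latent_channels=16):
    """Approximate per-sample latent shape as (frames, channels, height, width).

    The compression ratios and channel count are assumptions about the
    CogVideoX VAE, not values read from the checkpoint.
    """
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    return (latent_frames, latent_channels,
            height // spatial_ratio, width // spatial_ratio)


def clip_seconds(num_frames, fps):
    """Playback length of the exported video in seconds."""
    return num_frames / fps


# Settings taken from the inference snippet: 81 frames, 512x768, exported at 25 fps.
shape = cogvideox_latent_shape(num_frames=81, height=512, width=768)
duration = clip_seconds(num_frames=81, fps=25)
print(shape, duration)
```

Under these assumptions, 81 frames fit the temporal compression cleanly because 81 - 1 is divisible by 4.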

Training logs are available on WandB here.

LoRA

We extracted a rank-64 LoRA from the fine-tuned checkpoint (script here). This LoRA can be used to emulate the same kind of effect:

Code
from diffusers import DiffusionPipeline 
from diffusers.utils import export_to_video
import torch 

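pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
# NOTE: these paths are local to the author's machine; point them at your own copies of the LoRA files.
# The "crush" adapter appears to be a separate effect LoRA and is not needed for the dissolve effect.
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-crush/extracted_crush_smol_lora_64.safetensors", adapter_name="crush")
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-3dgs/extracted_3dgs_lora_64.safetensors", adapter_name="3dgs")
# Assumption: only the 3dgs adapter should be active when generating the dissolve effect.
pipeline.set_adapters("3dgs")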

prompts = ["""
In a 3D appearance, a small bicycle is seen surrounded by a burst of fiery sparks, creating a dramatic and intense visual effect against the dark background.
The video showcases a dynamic explosion of fiery particles in a 3D appearance, with sparks and embers scattering across the screen against a stark black background.
""",
"""
In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
""",
]
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs, bad physique"
id_token = "3D_dissolve"

for i, prompt in enumerate(prompts):
    video = pipeline(
        prompt=f"{id_token} {prompt}", 
        negative_prompt=negative_prompt, 
        num_frames=81, 
        height=512,
        width=768,
        num_inference_steps=50,
        generator=torch.manual_seed(0)
    ).frames[0]
    export_to_video(video, f"output_{i}.mp4", fps=25)
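
The extraction script for the rank-64 LoRA is not reproduced here. A common way to obtain a low-rank adapter from a fine-tuned checkpoint is a truncated SVD of the weight difference for each linear layer. The NumPy sketch below illustrates that idea on a toy matrix. It is an assumption about the general technique, not the authors' script, and `extract_lora` is a hypothetical helper.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Return (lora_a, lora_b) so that lora_b @ lora_a approximates w_tuned - w_base."""
    delta = w_tuned - w_base                        # (out_features, in_features)
    u, s, vh = np.linalg.svd(delta, full_matrices=False)
    rank = min(rank, s.shape[0])
    sqrt_s = np.sqrt(s[:rank])                      # split singular values across both factors
    lora_b = u[:, :rank] * sqrt_s                   # (out_features, rank)
    lora_a = sqrt_s[:, None] * vh[:rank, :]         # (rank, in_features)
    return lora_a, lora_b


# Toy example: a "fine-tuned" weight whose update has true rank 8.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((128, 96))
true_update = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 96))
w_tuned = w_base + true_update

lora_a, lora_b = extract_lora(w_base, w_tuned, rank=8)
rel_err = np.linalg.norm(lora_b @ lora_a - true_update) / np.linalg.norm(true_update)
print(lora_a.shape, lora_b.shape, rel_err)
```

On a real checkpoint the same step would run per layer with `rank=64`. Everything beyond the kept singular values is discarded, so the extracted LoRA only approximates the full fine-tune.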