	---
	base_model: THUDM/CogVideoX-5b
	datasets: finetrainers/3dgs-dissolve
	library_name: diffusers
	license: other
	license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
	instance_prompt: 3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the penguin, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background.
	widget:
	- text: 3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the penguin, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background.
	  output:
	    url: "./assets/output_0.mp4"
	- text: 3D_dissolve A small car, rendered in a 3D appearance, navigates through a swirling vortex of fiery particles. As it moves forward, the surrounding environment transforms into a dynamic display of red sparks that eventually evaporate into a burst of red sparks, creating a mesmerizing visual effect against the dark backdrop.
	  output:
	    url: "./assets/output_1.mp4"
	tags:
	- text-to-video
	- diffusers-training
	- diffusers
	- cogvideox
	- cogvideox-diffusers
	- template:sd-lora
	---

	<Gallery />

	This is a fine-tune of the [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) model on the
	[finetrainers/3dgs-dissolve](https://huggingface.co/datasets/finetrainers/3dgs-dissolve) dataset. We also provide
	a LoRA variant of the weights. See the [LoRA section](#lora).

	Code: https://github.com/a-r-r-o-w/finetrainers

	> [!IMPORTANT]
	> This is an experimental checkpoint and is known to generalize poorly.

	Inference code:

	```py
	from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
	from diffusers.utils import export_to_video
	import torch

	# Load the fine-tuned transformer and plug it into the base CogVideoX-5b pipeline.
	transformer = CogVideoXTransformer3DModel.from_pretrained(
	    "finetrainers/3dgs-v0", torch_dtype=torch.bfloat16
	)
	pipeline = DiffusionPipeline.from_pretrained(
	    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
	).to("cuda")

	# The prompt starts with the `3D_dissolve` identifier token.
	prompt = """
	3D_dissolve In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
	"""
	negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

	video = pipeline(
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    num_frames=81,
	    height=512,
	    width=768,
	    num_inference_steps=50,
	).frames[0]
	export_to_video(video, "output.mp4", fps=25)
	```
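
The example prompts in this card all begin with the `3D_dissolve` identifier token, and triple-quoted prompts carry stray newlines. As a small convenience sketch (the helper name `build_prompt` is hypothetical and not part of diffusers or finetrainers), the following normalizes whitespace and prepends the token only when it is missing:

```python
ID_TOKEN = "3D_dissolve"  # identifier token used by the prompts in this card


def build_prompt(description: str, id_token: str = ID_TOKEN) -> str:
    """Collapse whitespace and make sure the prompt starts with the identifier token."""
    # str.split() with no arguments drops leading/trailing whitespace and newlines.
    text = " ".join(description.split())
    # Compare the first whitespace-delimited word so the token is not duplicated.
    if text.split(" ", 1)[0] == id_token:
        return text
    return f"{id_token} {text}"


example = build_prompt("""
In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks.
""")
print(example)
```

The result can be passed directly as `prompt=` to the pipeline call shown above.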

	Training logs are available on WandB [here](https://wandb.ai/sayakpaul/finetrainers-cogvideox/runs/r39sv4do).

	## LoRA

	We extracted a rank-64 LoRA from the fine-tuned checkpoint
	(script [here](https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py)).
	[This LoRA](./extracted_3dgs_lora_64.safetensors) can be used to emulate the same kind of effect:

	<details>
	<summary>Code</summary>

	```py
	from diffusers import DiffusionPipeline
	from diffusers.utils import export_to_video
	import torch

	pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
	# Load the extracted rank-64 LoRA file shipped in this repository.
	pipeline.load_lora_weights(
	    "finetrainers/3dgs-v0", weight_name="extracted_3dgs_lora_64.safetensors", adapter_name="3dgs"
	)

	prompts = ["""
	In a 3D appearance, a small bicycle is seen surrounded by a burst of fiery sparks, creating a dramatic and intense visual effect against the dark background.
	The video showcases a dynamic explosion of fiery particles in a 3D appearance, with sparks and embers scattering across the screen against a stark black background.
	""",
	"""
	In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
	""",
	]
	negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs, bad physique"
	id_token = "3D_dissolve"

	for i, prompt in enumerate(prompts):
	    # Prepend the identifier token and fix the seed so runs are reproducible.
	    video = pipeline(
	        prompt=f"{id_token} {prompt}",
	        negative_prompt=negative_prompt,
	        num_frames=81,
	        height=512,
	        width=768,
	        num_inference_steps=50,
	        generator=torch.manual_seed(0),
	    ).frames[0]
	    export_to_video(video, f"output_{i}.mp4", fps=25)
	```

	</details>
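
For intuition about the rank-64 extraction mentioned in this section, here is a minimal NumPy sketch of one common approach: subtract the base weight from the fine-tuned weight and keep the top singular directions of the difference. This illustrates the general idea under stated assumptions and is not a reproduction of the linked script; the function name `extract_lora`, the toy matrix shapes, and the toy rank of 4 are all hypothetical.

```python
import numpy as np


def extract_lora(w_base, w_finetuned, rank):
    """Return (lora_up, lora_down) so that lora_up @ lora_down is the best
    rank-`rank` approximation (in Frobenius norm) of w_finetuned - w_base."""
    delta = w_finetuned - w_base                       # shape (out, in)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    lora_up = u[:, :rank] * s[:rank]                   # (out, rank), singular values folded in
    lora_down = vt[:rank, :]                           # (rank, in)
    return lora_up, lora_down


# Toy check: a "fine-tune" that differs from the base by an exactly rank-4 update.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((32, 48))
true_delta = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 48))
w_ft = w_base + true_delta

up, down = extract_lora(w_base, w_ft, rank=4)
rel_err = np.linalg.norm(up @ down - true_delta) / np.linalg.norm(true_delta)
print(up.shape, down.shape, rel_err < 1e-8)
```

On a real checkpoint the weight difference is generally not exactly low-rank, so a truncated rank such as 64 only approximates the full fine-tune, which fits the description above of the LoRA emulating the same kind of effect.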