|
|
--- |
|
|
title: Video Model Studio |
|
|
emoji: 🎥 |
|
|
colorFrom: gray |
|
|
colorTo: gray |
|
|
sdk: gradio |
|
|
sdk_version: 5.15.0 |
|
|
app_file: app.py |
|
|
pinned: true |
|
|
license: apache-2.0 |
|
|
short_description: All-in-one tool for AI video training |
|
|
--- |
|
|
|
|
|
# 🎥 Video Model Studio (VMS) |
|
|
|
|
|
## Presentation |
|
|
|
|
|
### What is this project? |
|
|
|
|
|
VMS is a Gradio app that wraps Finetrainers to provide a simple UI for training AI video models on Hugging Face. |
|
|
|
|
|
You can deploy it to a private space, and start long-running training jobs in the background. |
|
|
|
|
|
### One-user-per-space design |
|
|
|
|
|
Currently VMS only supports one training job at a time, and anybody with access to your Gradio app will be able to upload or delete everything. |
|
|
|
|
|
This means you have to run VMS in a *PRIVATE* HF Space, or locally if you require full privacy. |
|
|
|
|
|
### Similar projects |
|
|
|
|
|
I wasn't aware of its existence when I started my project, but there is also this open-source initiative: https://github.com/alisson-anjos/diffusion-pipe-ui |
|
|
|
|
|
## Features |
|
|
|
|
|
### Run Finetrainers in the background |
|
|
|
|
|
The main feature of VMS is the ability to run a Finetrainers training session in the background. |
|
|
|
|
|
You can start your job, close the web browser tab, and come back the next morning to see the result. |
|
|
|
|
|
### Automatic scene splitting |
|
|
|
|
|
VMS uses PySceneDetect to automatically split your videos into individual scenes. |
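
VMS drives this step automatically, but if you want to reproduce it by hand, PySceneDetect also ships a CLI. A minimal sketch (the exact detector and threshold VMS uses may differ):

```bash
# rough equivalent of the splitting step; split-video requires ffmpeg to be installed
scenedetect -i my_video.mp4 -o clips/ detect-content split-video
```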
|
|
|
|
|
### Automatic clip captioning |
|
|
|
|
|
VMS uses `LLaVA-Video-7B-Qwen2` for captioning. You can customize the system prompt if you want to. |
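
For example, a customized system prompt could look like this (purely illustrative, this is not the default prompt used by VMS):

```
Describe this video clip in one detailed paragraph. Mention the subject,
the action, the camera movement, the lighting and the overall mood.
```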
|
|
|
|
|
### Download your dataset |
|
|
|
|
|
Not interested in using VMS for training? That's perfectly fine! |
|
|
|
|
|
You can use VMS for video splitting and captioning, and export the data for training on another platform, e.g. Replicate or Fal. |
|
|
|
|
|
## Supported models |
|
|
|
|
|
VMS uses `Finetrainers` under the hood. In theory any model supported by Finetrainers should work in VMS. |
|
|
|
|
|
In practice, a PR (pull request) will be necessary to adapt the UI a bit to accommodate each model's specificities. |
|
|
|
|
|
### LTX-Video |
|
|
|
|
|
I have tested training a LoRA model on videos, using a single A100 instance. |
|
|
|
|
|
### HunyuanVideo |
|
|
|
|
|
I haven't tested it yet, but in theory it should work out of the box. |
|
|
Please keep in mind that this requires a lot of processing power. |
|
|
|
|
|
### CogVideoX |
|
|
|
|
|
Do you want support for this one? Let me know in the comments! |
|
|
|
|
|
## Deployment |
|
|
|
|
|
VMS is built on top of Finetrainers and Gradio, and designed to run as a Hugging Face Space (but you can deploy it anywhere that has an NVIDIA GPU and supports Docker). |
|
|
|
|
|
### Full installation at Hugging Face |
|
|
|
|
|
Easy peasy: create a Space (make sure to use the `Gradio` type/template), and push the repo. No Docker needed! |
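
If you prefer the command line, the flow looks roughly like this (the Space name is a placeholder, you must be logged in with `huggingface-cli login`, and the exact `repo create` syntax may vary with your `huggingface_hub` version):

```bash
# hypothetical Space name; --space_sdk selects the Gradio template
huggingface-cli repo create video-model-studio --type space --space_sdk gradio

# then push this repo to the Space
git remote add space https://huggingface.co/spaces/<your-username>/video-model-studio
git push space main
```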
|
|
|
|
|
That said, please see the "Run" section for info about environment variables. |
|
|
|
|
|
### Dev mode on Hugging Face |
|
|
|
|
|
Enable dev mode in the Space, then open VS Code (locally or remotely) and run: |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
As this is not automatic, you then need to click on "Restart" in the Space's dev mode UI widget. |
|
|
|
|
|
### Full installation somewhere else |
|
|
|
|
|
I haven't tested it, but you can try the provided Dockerfile. |
|
|
|
|
|
### Full installation in local |
|
|
|
|
|
The full installation requires: |
|
|
- Linux |
|
|
- CUDA 12 |
|
|
- Python 3.10 |
|
|
|
|
|
This is because of flash attention, which is declared in `requirements.txt` using a URL to download a prebuilt wheel (Python bindings for a native library). |
|
|
|
|
|
```bash |
|
|
./setup.sh |
|
|
``` |
|
|
|
|
|
### Degraded installation in local |
|
|
|
|
|
If you cannot meet the requirements, you can: |
|
|
|
|
|
- solution 1: fix `requirements.txt` to use another prebuilt wheel (see the sketch after this list) |
|
|
- solution 2: manually build/install flash attention |
|
|
- solution 3: don't use clip captioning |
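
Here is a sketch of solution 1 (the URL is a placeholder; substitute a real wheel matching your Python, CUDA and torch versions):

```bash
# placeholder URL: pick an actual wheel, e.g. one listed at
# https://github.com/Dao-AILab/flash-attention/releases
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/<version>/<matching-wheel>.whl"
```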
|
|
|
|
|
Here is how to do solution 3: |
|
|
```bash |
|
|
./setup_no_captions.sh |
|
|
``` |
|
|
|
|
|
## Run |
|
|
|
|
|
### Running the Gradio app |
|
|
|
|
|
Note: please make sure you properly define the environment variables `STORAGE_PATH` (e.g. `/data/`) and `HF_HOME` (e.g. `/data/huggingface/`). |
|
|
|
|
|
```bash |
|
|
python app.py |
|
|
``` |
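
For example, with the variables set inline (reusing the example paths from the note above):

```bash
STORAGE_PATH=/data/ HF_HOME=/data/huggingface/ python app.py
```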
|
|
|
|
|
### Running locally |
|
|
|
|
|
See the remarks above about the environment variables. |
|
|
|
|
|
By default `run.sh` will store stuff in `.data/` (located inside the current working directory): |
|
|
|
|
|
```bash |
|
|
./run.sh |
|
|
``` |
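
If you want the files to go somewhere else, you can try overriding the storage path (this assumes `run.sh` passes `STORAGE_PATH` through to the app, which you should verify):

```bash
# assumption: run.sh honors the same STORAGE_PATH variable as app.py
STORAGE_PATH=/path/of/your/choice ./run.sh
```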