Text to Video

Generate a video based on a given text prompt.

For more details about the text-to-video task, check out its dedicated page! You will find examples and related materials.

Recommended models

tencent/HunyuanVideo: A strong model for consistent video generation.
Lightricks/LTX-Video: A text-to-video model with high fidelity motion and strong prompt adherence.
Wan-AI/Wan2.1-T2V-1.3B: A robust model for video generation.

Explore all available models and find the one that suits you best here.

Using the API

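import os
from huggingface_hub import InferenceClient

# Authenticate with a user access token read from the HF_TOKEN environment variable.
client = InferenceClient(
    provider="fal-ai",
    api_key=os.environ["HF_TOKEN"],
)

# Generate a video from the prompt; the result is the video as raw bytes.
video = client.text_to_video(
    "A young man walking on the street",
    model="tencent/HunyuanVideo",
)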

API specification

Request

Payload
inputs*	string	The input text data (sometimes called “prompt”)
parameters	object
num_frames	number	The num_frames parameter determines how many video frames are generated.
guidance_scale	number	A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, but values too high may cause saturation and other artifacts.
negative_prompt	string[]	One or more prompts describing what NOT to include in the generated video.
num_inference_steps	integer	The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference.
seed	integer	Seed for the random number generator.
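
The payload fields above can be assembled into a JSON request body. The following is a minimal sketch, not part of any official client: `build_text_to_video_payload` is a hypothetical helper, and the parameter values in the usage lines are illustrative only. It assumes that optional parameters left unset should simply be omitted from the body.

```python
import json
from typing import Any, Dict, List, Optional


def build_text_to_video_payload(
    prompt: str,
    num_frames: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    negative_prompt: Optional[List[str]] = None,
    num_inference_steps: Optional[int] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a request body matching the payload table."""
    if not prompt:
        raise ValueError("`inputs` is required and must be a non-empty string")

    optional = {
        "num_frames": num_frames,
        "guidance_scale": guidance_scale,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "seed": seed,
    }
    # Assumption: parameters that were not set are left out of the body.
    parameters = {key: value for key, value in optional.items() if value is not None}

    payload: Dict[str, Any] = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    return payload


# Illustrative values only; suitable ranges depend on the chosen model.
payload = build_text_to_video_payload(
    "A young man walking on the street",
    num_frames=16,
    negative_prompt=["blurry", "low quality"],
    seed=42,
)
print(json.dumps(payload, indent=2))
```

Fixing `seed` is typically what makes repeated runs comparable while you vary other parameters, though reproducibility ultimately depends on the model and provider.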

Headers
authorization	string	Authentication header in the form `'Bearer hf_**'`, where `hf_**` is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.
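
As a minimal sketch of building this header, the hypothetical helper `build_auth_headers` below follows the standard HTTP bearer scheme: the word `Bearer`, a single space, then the token. The `hf_` prefix check is an assumption based on the token form shown above, and the token in the usage line is a placeholder, not a real credential.

```python
from typing import Dict


def build_auth_headers(token: str) -> Dict[str, str]:
    """Hypothetical helper: build the authorization header for a request."""
    token = token.strip()
    # Assumption: user access tokens start with "hf_", as in the spec above.
    if not token.startswith("hf_"):
        raise ValueError("expected a user access token starting with 'hf_'")
    # Standard bearer scheme: "Bearer", one space, then the token.
    return {"Authorization": f"Bearer {token}"}


# Placeholder token for illustration; in practice, read it from the
# HF_TOKEN environment variable rather than hard-coding it.
headers = build_auth_headers("hf_example_token")
print(headers)
```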

Response

Body
video	unknown	The generated video returned as raw bytes in the payload.
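
Because the video arrives as raw bytes, it can be written straight to disk. The sketch below uses a hypothetical `save_video` helper and placeholder bytes in place of a real response so that it runs offline. The `.mp4` extension is an assumption, since the actual container format depends on the model and provider.

```python
import tempfile
from pathlib import Path


def save_video(video_bytes: bytes, destination: Path) -> Path:
    """Hypothetical helper: write raw video bytes to a file and return its path."""
    if not isinstance(video_bytes, (bytes, bytearray)):
        raise TypeError("video_bytes must be bytes")
    if len(video_bytes) == 0:
        raise ValueError("video_bytes is empty; nothing to write")
    # Create missing parent directories before writing.
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(video_bytes)
    return destination


# Placeholder bytes stand in for a real API response.
fake_video = b"\x00\x00\x00\x18ftypmp42"
with tempfile.TemporaryDirectory() as tmp:
    saved = save_video(fake_video, Path(tmp) / "clips" / "output.mp4")
    written_size = saved.stat().st_size
print(f"wrote {written_size} bytes")
```

With a real response, you would pass the returned bytes in place of `fake_video` and choose a permanent destination path.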