- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 16
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 9
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13
Collections including paper arxiv:2501.12375
- depth-anything/Video-Depth-Anything-Large
  Depth Estimation • Updated • 3
- depth-anything/Video-Depth-Anything-Small
  Depth Estimation • Updated • 1
- 113
  Video Depth Anything
  👀 Generate depth video from input video

- Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
  Paper • 2501.12375 • Published • 22

- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 42
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 48
- An Empirical Study of Autoregressive Pre-training from Videos
  Paper • 2501.05453 • Published • 37
- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
  Paper • 2501.07556 • Published • 5