VMem: Consistent Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi, Tomas Jakab
University of Oxford
Overview
VMem is a plug-and-play memory mechanism for image-set models that enables consistent scene generation.

Existing methods either rely on inpainting with explicit geometry estimation, which suffers from inaccuracies, or use the limited context windows of video-based approaches, leading to poor long-term coherence. To overcome these issues, we introduce the Surfel-Indexed View Memory (VMem), which anchors past views to the surface elements (surfels) they observed. This lets novel view generation be conditioned on the most relevant past views rather than merely the most recent ones, improving long-term scene consistency while reducing computational cost.
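To make the idea concrete, below is a minimal sketch of a surfel-indexed view memory. Everything here (the `Surfel` and `ViewMemory` classes, the linear-scan neighbor lookup, the vote-based `retrieve`) is an illustrative simplification under our own assumptions, not the actual VMem implementation, which additionally estimates geometry and conditions a generative model on the retrieved views.

```python
# Hypothetical sketch of a surfel-indexed view memory; not the VMem API.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Surfel:
    """A surface element: a 3D point plus the ids of views that observed it."""
    position: np.ndarray                      # (3,) world-space position
    view_ids: set[int] = field(default_factory=set)


class ViewMemory:
    """Anchors past views to the surfels they observed, then retrieves the
    most relevant past views for a new target viewpoint."""

    def __init__(self) -> None:
        self.surfels: list[Surfel] = []
        self.views: dict[int, np.ndarray] = {}  # view_id -> camera position

    def add_view(self, view_id: int, camera_pos: np.ndarray, points: np.ndarray) -> None:
        """Register a view and anchor it to the surfels it observed.
        `points` is an (N, 3) array of surface points seen by this view."""
        self.views[view_id] = camera_pos
        for p in points:
            surfel = self._find_nearby(p)
            if surfel is None:
                surfel = Surfel(position=p)
                self.surfels.append(surfel)
            surfel.view_ids.add(view_id)

    def _find_nearby(self, p: np.ndarray, radius: float = 0.05) -> Surfel | None:
        # Linear scan for clarity; a real system would use a spatial index.
        for s in self.surfels:
            if np.linalg.norm(s.position - p) < radius:
                return s
        return None

    def retrieve(self, target_points: np.ndarray, k: int = 4) -> list[int]:
        """Return the k past views that observed the most surfels visible
        from the target viewpoint (approximated by `target_points`)."""
        votes: dict[int, int] = {}
        for p in target_points:
            s = self._find_nearby(p)
            if s is not None:
                for vid in s.view_ids:
                    votes[vid] = votes.get(vid, 0) + 1
        ranked = sorted(votes, key=votes.get, reverse=True)
        return ranked[:k]
```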
:wrench: Installation
```bash
conda create -n vmem python=3.10
conda activate vmem
pip install -r requirements.txt
```
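After installing, you may want a quick sanity check of the environment. The snippet below assumes PyTorch and CUDA are among the dependencies in requirements.txt, which this README does not state explicitly; adjust it to your actual setup.

```python
# Optional environment sanity check (assumes PyTorch is installed).
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```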
:rocket: Usage
You need to authenticate with Hugging Face to download our model weights; once set up, our code handles the download automatically on your first run. You can authenticate by running:

```bash
# This will prompt you to enter your Hugging Face credentials.
huggingface-cli login
```
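If you prefer to authenticate from Python instead of the CLI (for example, inside a notebook), the `huggingface_hub` package provides an equivalent `login()` helper:

```python
# Programmatic login; prompts for your Hugging Face access token.
from huggingface_hub import login

login()  # or login(token="hf_...") to pass a token non-interactively
```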
Once authenticated, go to our model card here and fill in your information to request access.
We provide a demo for you to interact with VMem. Simply run:

```bash
python app.py
```
:heart: Acknowledgement
This work is built on top of CUT3R, DUSt3R, and Stable Virtual Camera. We thank the authors for their great work.
:books: Citing
If you find this repository useful, please consider giving it a star :star: and a citation.
```bibtex
@article{zhou2025stable,
  title={Stable Virtual Camera: Generative View Synthesis with Diffusion Models},
  author={Jensen (Jinghao) Zhou and Hang Gao and Vikram Voleti and Aaryaman Vasishta and Chun-Han Yao and Mark Boss and Philip Torr and Christian Rupprecht and Varun Jampani},
  journal={arXiv preprint arXiv:2503.14489},
  year={2025}
}
```