---
license: apache-2.0
base_model:
- Chat-UniVi/Chat-UniVi
pipeline_tag: image-segmentation
---

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

[Sitong Gong](https://github.com/SitongGong) 1, [Yunzhi Zhuge](https://scholar.google.com.hk/citations?hl=zh-CN&user=-37EfvgAAAAJ) 1, [Lu Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=bUtRE5UAAAAJ) 1, [Zongxin Yang](https://scholar.google.com.hk/citations?user=8IE0CfwAAAAJ&hl=zh-CN&oi=ao) 2, [Pingping Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=MfbIbuEAAAAJ) 1, [Huchuan Lu](https://scholar.google.com.hk/citations?user=D3nE0agAAAAJ&hl=zh-CN) 1

CVPR 2025

1 Dalian University of Technology, 2 Harvard University

[![arXiv](https://img.shields.io/badge/arXiv-<2501.08549>-.svg)](https://arxiv.org/pdf/2501.08549)

You can find the code at: https://github.com/SitongGong/VRS-HQ