---
license: apache-2.0
base_model:
- Chat-UniVi/Chat-UniVi
pipeline_tag: image-segmentation
---
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
[Sitong Gong](https://github.com/SitongGong) 1
[Yunzhi Zhuge](https://scholar.google.com.hk/citations?hl=zh-CN&user=-37EfvgAAAAJ) 1
[Lu Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=bUtRE5UAAAAJ) 1
[Zongxin Yang](https://scholar.google.com.hk/citations?user=8IE0CfwAAAAJ&hl=zh-CN&oi=ao) 2
[Pingping Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=MfbIbuEAAAAJ) 1
[Huchuan Lu](https://scholar.google.com.hk/citations?user=D3nE0agAAAAJ&hl=zh-CN) 1
CVPR 2025
1 Dalian University of Technology 2 Harvard University
[Paper (arXiv:2501.08549)](https://arxiv.org/pdf/2501.08549)
You can find the code at: https://github.com/SitongGong/VRS-HQ