---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

This repository (`kanashi6/UFO-InternVL2-8B-rec-ft`) contains the model presented in the paper *UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface*.

UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks.
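The two ideas in the paragraph above can be pictured with a small, self-contained sketch. It is an illustration under stated assumptions, not the UFO implementation: the `<box>` tag format, the 1000-bin quantization, the function names `box_to_text` and `retrieve_mask`, and the similarity threshold are all hypothetical choices made for this example. The actual target formats and retrieval details are described in the paper and the GitHub repository.

```python
import numpy as np


def box_to_text(box, image_size, bins=1000):
    """Serialize a pixel-space box (x1, y1, x2, y2) as plain text.

    Assumption (not taken from the UFO code): coordinates are divided by
    the image size and quantized to integers in [0, bins - 1]. The
    <box>...</box> tag format is hypothetical.
    """
    width, height = image_size
    scale = np.array([width, height, width, height], dtype=np.float64)
    normalized = np.asarray(box, dtype=np.float64) / scale
    quantized = np.clip(np.floor(normalized * bins), 0, bins - 1).astype(int)
    return "<box>" + ",".join(str(v) for v in quantized) + "</box>"


def retrieve_mask(mask_embedding, image_features, threshold=0.0):
    """Toy embedding retrieval for segmentation.

    mask_embedding: array of shape (C,), standing in for the embedding of
        a mask token produced through the language interface.
    image_features: array of shape (H, W, C), one feature per position.

    Returns a boolean (H, W) mask marking positions whose dot-product
    similarity with the mask embedding exceeds `threshold` (an assumed
    cutoff chosen for this sketch).
    """
    similarity = image_features @ mask_embedding  # shape (H, W)
    return similarity > threshold


if __name__ == "__main__":
    # A detection target expressed as text.
    print(box_to_text((50, 100, 200, 300), image_size=(400, 400)))

    # A segmentation target recovered by similarity against image features.
    features = np.array(
        [[[1.0, 0.0], [0.0, 1.0]],
         [[-1.0, 0.0], [1.0, 1.0]]]
    )
    print(retrieve_mask(np.array([1.0, 0.0]), features))
```

The point of the sketch is that both targets pass through the same interface: a box becomes a short string the language model can emit, and a mask is obtained by comparing an emitted embedding against image features rather than by a separate mask decoder.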

For more details, please refer to the original paper and the GitHub repository:

- Paper: *UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface*
- GitHub: https://github.com/nnnth/UFO