This repository contains UniTok, a unified visual tokenizer for both image generation and understanding tasks, as presented in [UniTok: A Unified Tokenizer for Visual Generation and Understanding](https://hf.co/papers/2502.20321).

Project Page: https://foundationvision.github.io/UniTok/ <br>
Code: https://github.com/FoundationVision/UniTok

<p align="center">
<img src="https://github.com/FoundationVision/UniTok/blob/main/assets/teaser.png?raw=true" width=93%>
</p>

UniTok encodes fine-grained details for generation and captures high-level semantics for understanding. It's compatible with autoregressive generative models (e.g., LlamaGen), multimodal understanding models (e.g., LLaVA), and unified MLLMs (e.g., Chameleon and Liquid).
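
A typical round trip through a unified tokenizer looks like the sketch below. This is a minimal illustration only, not the official interface: the `unitok` import, the `UniTok.from_checkpoint` loader, and the `encode`/`decode` names are placeholders for whatever the released code actually exposes (see the code repository linked above).

```python
# Illustrative sketch of a unified-tokenizer round trip.
# NOTE: `unitok`, `UniTok.from_checkpoint`, `encode`, and `decode` are
# placeholder names, NOT the confirmed UniTok API -- check the GitHub repo.
import torch
from PIL import Image
from torchvision import transforms

from unitok import UniTok  # hypothetical import

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                      # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 256, 256]

tokenizer = UniTok.from_checkpoint("unitok_tokenizer.pth").eval()  # placeholder loader

with torch.no_grad():
    # Discrete codes are what an autoregressive generator (e.g., a LlamaGen-style
    # model) consumes; the same latents carry the semantics an MLLM (e.g., LLaVA) reads.
    codes = tokenizer.encode(image)    # e.g., [1, num_tokens] integer code IDs
    recon = tokenizer.decode(codes)    # [1, 3, 256, 256] reconstructed image
```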

Built upon UniTok, we construct an MLLM capable of both multimodal generation and understanding, which sets a new state of the art among unified autoregressive MLLMs. The weights of our MLLM will be released soon.

<p align="center">
<img src="https://github.com/FoundationVision/UniTok/blob/main/assets/samples.png?raw=true" width=93%>
</p>

## Performance