How to use only the text and visual embeddings?

by Labmem009 - opened about 1 month ago

about 1 month ago

Interesting work! I want to use the image-text alignment learned by this model's encoders for downstream tasks. How should I go about it?

27 days ago

+1. Is it possible to use only the visual encoder for a downstream task such as classification?
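One common way aligned image and text embeddings are used for classification is zero-shot matching. Normalize the vectors, then pick the label whose text embedding has the highest cosine similarity to the image embedding. This is a minimal sketch that assumes you have already extracted one image embedding and one text embedding per class prompt from the model's encoders. The thread does not show this model's API, so extraction is left out. The vectors and prompts below are hypothetical toy values, not model outputs.

```python
import math
from typing import Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def zero_shot_classify(image_emb: List[float], label_embs: Dict[str, List[float]]) -> str:
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda label: cosine_similarity(image_emb, label_embs[label]))


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for real encoder outputs.
    image_embedding = [0.9, 0.1, 0.2]
    text_embeddings = {
        "a photo of a cat": [1.0, 0.0, 0.1],
        "a photo of a dog": [0.0, 1.0, 0.3],
    }
    print(zero_shot_classify(image_embedding, text_embeddings))  # prints: a photo of a cat
```

If labeled data is available, you can use the visual encoder by itself instead. A common approach is to train a small classifier, such as a linear probe, on the frozen image embeddings. This approach does not need the text encoder.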
