Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7, 2025 • 43
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Paper • 2412.19806 • Published Oct 8, 2024 • 1
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 47
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published Dec 5, 2024 • 14
Reasoning Implicit Sentiment with Chain-of-Thought Prompting Paper • 2305.11255 • Published May 18, 2023 • 1
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter Paper • 2310.12798 • Published Oct 19, 2023 • 4
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning Paper • 2311.18651 • Published Nov 30, 2023
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation Paper • 2308.05095 • Published Aug 9, 2023
Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models Paper • 2308.13812 • Published Aug 26, 2023 • 1
Faithful Logical Reasoning via Symbolic Chain-of-Thought Paper • 2405.18357 • Published May 28, 2024 • 2
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27, 2024 • 52
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis Paper • 2408.09481 • Published Aug 18, 2024 • 1
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning Paper • 2402.11435 • Published Feb 18, 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration Paper • 2410.20482 • Published Oct 27, 2024 • 1