HAODONG DUAN

KennyUTC

https://kennymckormick.github.io

AI & ML interests

Video Understanding; Multi-Modal Learning

Recent Activity

upvoted a paper 11 days ago

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

upvoted a paper 24 days ago

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

upvoted a paper about 2 months ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

reacted to mervenoyan's post with 🔥 about 1 year ago

Post

2541

we have a leaderboard for video LLMs, and most of the top models are open ones! opencompass/openvlm_video_leaderboard 👑👏
we are so back 🔥

reacted to their post with ❤️ over 1 year ago

Post

1605

OPEN VLM LEADERBOARD JUST RELEASED the FULL EVALUATION RESULTS of GPT-4o

[TL;DR]
GPT-4o shows steady progress compared to GPT-4v (0419), with a gain of about 3.4 points on the average score (68.7% -> 72.1%). GPT-4o displays stronger perception and less hallucination.

opencompass/open_vlm_leaderboard

1 reply

posted an update over 1 year ago

Post

2662

Open VLM Leaderboard just updated the performance of GPT-4v (20240409); the new proprietary model ranks 1st across 50+ VLMs. Compared to the previous version (20231106), the improvements in multimodal perception and reasoning are both huge.

Check the results:
opencompass/open_vlm_leaderboard
