Yuseung "Phillip" Lee

phillipinseoul

https://phillipinseoul.github.io/

AI & ML interests

Computer Vision

phillipinseoul's activity

upvoted 2 papers about 17 hours ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published 1 day ago • 31

Reangle-A-Video: 4D Video Generation as Video-to-Video Translation

Paper • 2503.09151 • Published 1 day ago • 24

upvoted 3 papers 3 days ago

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Paper • 2503.02199 • Published 10 days ago • 7

Should VLMs be Pre-trained with Image Data?

Paper • 2503.07603 • Published 3 days ago • 3

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published 4 days ago • 20

liked a Space 12 days ago

106

ViewCrafter

🐨

Create a video from an image with camera motion

upvoted a paper 16 days ago

RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

Paper • 2502.15894 • Published 20 days ago • 20

liked a model 19 days ago

Qwen/Qwen2.5-VL-72B-Instruct

Image-Text-to-Text • Updated 7 days ago • 293k • 369

upvoted a paper 20 days ago

RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

Paper • 2502.14377 • Published 22 days ago • 12

upvoted 2 papers 21 days ago

Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above

Paper • 2502.14127 • Published 22 days ago • 2

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 22 days ago • 163

upvoted 2 papers 22 days ago

OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

Paper • 2502.11271 • Published 25 days ago • 16

Continuous Diffusion Model for Language Modeling

Paper • 2502.11564 • Published 25 days ago • 52

liked a model 23 days ago

llava-hf/llama3-llava-next-8b-hf

Image-Text-to-Text • Updated Jan 27 • 11.7k • 36

upvoted 5 papers 25 days ago

Large Language Diffusion Models

Paper • 2502.09992 • Published 28 days ago • 103

ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models

Paper • 2502.09696 • Published 28 days ago • 39

Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published 27 days ago • 52

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published 29 days ago • 143

Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

Paper • 2502.08690 • Published 29 days ago • 41

upvoted a paper 28 days ago

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published 29 days ago • 184