Harold Chen

Harold328

https://haroldchen19.github.io/

HaroldChen19

AI & ML interests

Computer Vision

Organizations

None yet

upvoted a paper 2 days ago

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published 10 days ago • 28

upvoted a paper 4 days ago

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Paper • 2512.24271 • Published 10 days ago • 49

upvoted a paper 13 days ago

Spatia: Video Generation with Updatable Spatial Memory

Paper • 2512.15716 • Published 23 days ago • 30

upvoted a paper 18 days ago

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Paper • 2512.17909 • Published 21 days ago • 36

upvoted 2 papers 21 days ago

Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation

Paper • 2512.16913 • Published 22 days ago • 33

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Paper • 2512.16915 • Published 22 days ago • 37

upvoted 3 papers 22 days ago

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Paper • 2512.13874 • Published 24 days ago • 16

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Paper • 2512.15702 • Published 23 days ago • 14

DEER: Draft with Diffusion, Verify with Autoregressive Models

Paper • 2512.15176 • Published 23 days ago • 42

upvoted a paper 23 days ago

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Paper • 2512.14442 • Published 24 days ago • 10

upvoted a paper 24 days ago

Memory in the Age of AI Agents

Paper • 2512.13564 • Published 25 days ago • 132

upvoted 7 papers about 1 month ago

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Paper • 2512.08294 • Published Dec 9, 2025 • 17

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Paper • 2512.04784 • Published Dec 2, 2025 • 24

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Paper • 2512.05060 • Published Dec 4, 2025 • 18

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

Paper • 2511.23127 • Published Nov 28, 2025 • 43

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Paper • 2512.00891 • Published Nov 30, 2025 • 14

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published Dec 1, 2025 • 72

Video Generation Models Are Good Latent Reward Models

Paper • 2511.21541 • Published Nov 26, 2025 • 45

upvoted 2 papers about 2 months ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24, 2025 • 28

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

Paper • 2511.13704 • Published Nov 17, 2025 • 42