Submitted by JUNJIE99 · MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval · 9 authors
Submitted by bys0318 · LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks · 12 authors
Submitted by QHL067 · Flowing from Words to Pixels: A Framework for Cross-Modality Evolution · 5 authors
Submitted by EthanTaylor · Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion · 6 authors
Submitted by ychenNLP · AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling · 5 authors
Submitted by thuzhaowang · DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation · 5 authors
Submitted by syp115 · Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception · 9 authors
Submitted by enisimsar · UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency · 5 authors
Submitted by aliaksandr-siarohin · AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation · 8 authors
Submitted by phenixace · TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation · 5 authors
Submitted by LiyaoJiang · PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation · 7 authors
Submitted by gagan3012 · DateLogicQA: Benchmarking Temporal Biases in Large Language Models · 4 authors