Daily Papers

by AK and the research community

Sep 12

Submitted by

huangsiteng

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

·
16 authors

Submitted by

TianxiangMa

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

·
10 authors

Submitted by

Haozhan72

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

·
21 authors

Submitted by

Yoohao

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

·
7 authors

Submitted by

taesiri

Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis

·
14 authors

Submitted by

Jarvis1111

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

·
10 authors

Submitted by

taesiri

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

·
10 authors

Submitted by

LanguageBind

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

·
14 authors

Submitted by

HaoyuDong

MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

·
5 authors

Submitted by

taesiri

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

·
15 authors

Submitted by

amant555

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

·
8 authors

Submitted by

ManTle

Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

·
9 authors

Submitted by

orionweller

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning

·
6 authors

Submitted by

moak7

Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes

·
5 authors

Submitted by

Kaichengalex

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

·
6 authors

Submitted by

learn12138

2D Gaussian Splatting with Semantic Alignment for Image Inpainting

·
4 authors

Submitted by

taesiri

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

·
17 authors

Submitted by

taesiri

OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

·
13 authors

Submitted by

Bryceee

Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis

·
10 authors

Submitted by

oravus

ObjectReact: Learning Object-Relative Control for Visual Navigation

·
8 authors

Submitted by

weipang142857

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

·
10 authors

Submitted by

mmock

Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open & Industry Data

·
3 authors

Submitted by

renkelin

Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

·
3 authors

Submitted by

Kitxuu

All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching

·
8 authors

Submitted by

iliashum

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

·
7 authors