Daily Papers

by AK and the research community

Jan 8

Submitted by

chuyi777

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

·
1 author

Submitted by

akhaliq

Cosmos World Foundation Model Platform for Physical AI

·
78 authors

Submitted by

zhangshaolei

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

·
4 authors

Submitted by

LXT

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

·
10 authors

Submitted by

LiquidAmmonia

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

·
9 authors

Submitted by

akhaliq

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

·
12 authors

Submitted by

Forceless

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

·
9 authors

Submitted by

tnlin

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

·
13 authors

Submitted by

BoZhang

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

·
9 authors

Submitted by

julianjuaner

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

·
7 authors

Submitted by

yyqoni

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

·
8 authors

Submitted by

ozbro

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

·
6 authors

Submitted by

mjbuehler

Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers

·
1 author

Submitted by

Tvaranka

MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control

·
5 authors

Submitted by

WenhaoWang

Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models

·
6 authors