Daily Papers

by AK and the research community

Oct 10

Submitted by

teowu

Aria: An Open Multimodal Native Mixture-of-Experts Model

·
10 authors

7

Submitted by

EilamSha

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

·
6 authors

2

Submitted by

renjiepi

Personalized Visual Instruction Tuning

·
6 authors

2

Submitted by

taesiri

Pixtral 12B

·
37 authors

Submitted by

FanqingM

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

·
10 authors

3

Submitted by

akhaliq

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

·
8 authors

6

Submitted by

comin

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

·
9 authors

2

Submitted by

feifeiobama

Pyramidal Flow Matching for Efficient Video Generative Modeling

·
11 authors

2

Submitted by

myownskyW7

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

·
9 authors

2

Submitted by

ZedongWangAI

Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

·
9 authors

3

Submitted by

alielfilali01

Falcon Mamba: The First Competitive Attention-free 7B Language Model

·
7 authors

Submitted by

haotiz

MM-Ego: Towards Building Egocentric Multimodal LLMs

·
12 authors

3

Submitted by

xk-huang

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

·
7 authors

2

Submitted by

Windy

Self-Boosting Large Language Models with Synthetic Preference Data

·
5 authors

Submitted by

paischer101

One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

·
6 authors

2

Submitted by

akhaliq

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

·
7 authors

Submitted by

akhaliq

CursorCore: Assist Programming through Aligning Anything

·
5 authors

Submitted by

tobiaslee

Temporal Reasoning Transfer from Text to Video

·
9 authors

4

Submitted by

akhaliq

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

·
3 authors

Submitted by

akhaliq

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

·
2 authors

Submitted by

seokhyun

Response Tuning: Aligning Large Language Models without Instruction

·
2 authors

2

Submitted by

akhaliq

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

·
10 authors

3

Submitted by

zbhpku

Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis

·
12 authors

3

Submitted by

xw-eric

Multimodal Situational Safety

·
6 authors

2

Submitted by

myownskyW7

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

·
9 authors

2

Submitted by

alexrame

Diversity-Rewarded CFG Distillation

·
8 authors

2

Submitted by

Yongxin-Guo

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

·
6 authors

3

Submitted by

t1101675

Data Selection via Optimal Control for Language Models

·
7 authors

2

Submitted by

Rosiness

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

·
7 authors

2

Submitted by

thomas-ferraz

LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints

·
10 authors

2

Submitted by

Minjong

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

·
4 authors

2

Submitted by

liuganghuggingface

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

·
5 authors

Submitted by

jihyoung

Mixed-Session Conversation with Egocentric Memory

·
3 authors

2

Submitted by

minwook

Collective Critics for Creative Story Generation

·
2 authors

2

Submitted by

akhaliq

TextToon: Real-Time Text Toonify Head Avatar from Single Video

·
5 authors

3

Submitted by

paischer101

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

·
6 authors

2

Submitted by

dnoever

Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders

·
2 authors

2

Submitted by

akhaliq

FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

·
5 authors

Submitted by

tnlin

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

·
12 authors

Submitted by

CiaraRowles

Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

·
6 authors

2

Submitted by

XUANMINGZHANG

Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

·
4 authors

3

Submitted by

jindongwang

MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

·
7 authors

2

Submitted by

ggcristian

TinyEmo: Scaling down Emotional Reasoning via Metric Projection

·
1 author

2

Submitted by

zhoutianyi

Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

·
4 authors

2

Submitted by

wenhu

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

·
6 authors

2

Submitted by

PahaII

VHELM: A Holistic Evaluation of Vision Language Models

·
11 authors

2

Submitted by

kargaranamir

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

·
6 authors

2

Submitted by

chen-yingfa

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

·
6 authors

3

Submitted by

vkoltun

Does Spatial Cognition Emerge in Frontier Models?

·
4 authors

2