Daily Papers

by AK and the research community

Oct 4

Submitted by

jefflai

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

·
12 authors

Submitted by

akhaliq

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

·
5 authors

Submitted by

akhaliq

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

·
7 authors

Submitted by

ZhangYuanhan

Video Instruction Tuning With Synthetic Data

·
7 authors

Submitted by

akhaliq

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

·
8 authors

Submitted by

thughost

LLaVA-Critic: Learning to Evaluate Multimodal Models

·
8 authors

Submitted by

haotiz

Contrastive Localized Language-Image Pre-Training

·
10 authors

Submitted by

ambroiseodt

Large Language Models as Markov Chains

·
6 authors

Submitted by

msadat97

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

·
3 authors

Submitted by

kazemnejad

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

·
7 authors

Submitted by

WillHeld

Distilling an End-to-End Voice Assistant Without Instruction Training Data

·
6 authors

Submitted by

jxm

Contextual Document Embeddings

·
2 authors

Submitted by

Xiaoye08

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

·
4 authors

Submitted by

onekq

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

·
3 authors

Submitted by

shayekh

Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models

·
6 authors

Submitted by

ZetangForward

L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

·
6 authors

Submitted by

yossig

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

·
4 authors

Submitted by

jasonyux

Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning

·
7 authors

Submitted by

amanchadha

MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

·
4 authors

Submitted by

xiaobiaodu

MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis

·
3 authors

Submitted by

mucai

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

·
3 authors

Submitted by

Ksgk-fy

Intelligence at the Edge of Chaos

·
8 authors

Submitted by

Sreyan88

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

·
6 authors

Submitted by

lucasbandarkar

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

·
7 authors

Submitted by

weitaikang

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

·
5 authors

Submitted by

BFauber

Learning the Latent Rules of a Game from Data: A Chess Story

·
1 author

Submitted by

uzw

SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

·
5 authors