- StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
  Paper • 2411.05738 • Published • 14
- A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
  Paper • 2410.22476 • Published • 25
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 47
- Training-free Regional Prompting for Diffusion Transformers
  Paper • 2411.02395 • Published • 25
Collections including paper arxiv:2411.04999
- Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
  Paper • 2410.02740 • Published • 52
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
  Paper • 2410.01215 • Published • 30
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 106
- EuroLLM: Multilingual Language Models for Europe
  Paper • 2409.16235 • Published • 26

- RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
  Paper • 2409.14674 • Published • 42
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
  Paper • 2410.22325 • Published • 10
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
  Paper • 2410.21845 • Published • 13
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
  Paper • 2411.02359 • Published • 12

- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
  Paper • 2406.02523 • Published • 11
- UniT: Unified Tactile Representation for Robot Learning
  Paper • 2408.06481 • Published • 10
- Latent Action Pretraining from Videos
  Paper • 2410.11758 • Published • 2
- Neural Fields in Robotics: A Survey
  Paper • 2410.20220 • Published • 4

- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
  Paper • 2412.14161 • Published • 51
- Training Software Engineering Agents and Verifiers with SWE-Gym
  Paper • 2412.21139 • Published • 21
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
  Paper • 2412.19723 • Published • 82
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
  Paper • 2408.00764 • Published • 1

- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning
  Paper • 2404.12803 • Published • 30
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
  Paper • 2404.13013 • Published • 31
- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
  Paper • 2404.06512 • Published • 30