Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.07365

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 43
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

about 7 hours ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published 8 days ago • 77
Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 22
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published 4 days ago • 53

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published 4 days ago • 53
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published 7 days ago • 28

Multimodal Reasoning

about 21 hours ago

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published 25 days ago • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published 25 days ago • 8
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 56
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization

Paper • 2412.18279 • Published Dec 24, 2024
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18 • 15
Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published Jan 31 • 38

Reasoning Models

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 56
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

Paper • 2502.07374 • Published about 1 month ago • 36
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 142
S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published 22 days ago • 60

RL+reason model

about 4 hours ago

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Llms and reasoning

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16 • 37
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 346
Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published Jan 24 • 52
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25

about 16 hours ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 27
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 100
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 97

Image-Video MultiModal Understanding

about 20 hours ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Paper • 2501.01245 • Published Jan 2 • 5
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14 • 32

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs