Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 100
Why Do Multi-Agent LLM Systems Fail?
Paper • 2503.13657 • Published • 47
Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 63
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper • 2507.08616 • Published • 14
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper • 2507.09477 • Published • 86
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 122
Efficient Agents: Building Effective Agents While Reducing Cost
Paper • 2508.02694 • Published • 86
WideSearch: Benchmarking Agentic Broad Info-Seeking
Paper • 2508.07999 • Published • 110
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 141
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129
Provable Benefits of In-Tool Learning for Large Language Models
Paper • 2508.20755 • Published • 11
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 228
GEM: A Gym for Agentic LLMs
Paper • 2510.01051 • Published • 89
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Paper • 2509.26354 • Published • 17
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 106
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
Don't Just Fine-tune the Agent, Tune the Environment
Paper • 2510.10197 • Published • 28
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper • 2510.09577 • Published • 7
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper • 2510.18821 • Published • 17
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Paper • 2510.24699 • Published • 69
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Paper • 2511.11373 • Published • 12
Latent Collaboration in Multi-Agent Systems
Paper • 2511.20639 • Published • 117
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Paper • 2511.21678 • Published • 12
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Paper • 2512.04324 • Published • 149
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Paper • 2512.06749 • Published • 26
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper • 2512.17008 • Published • 10
Nested Browser-Use Learning for Agentic Information Seeking
Paper • 2512.23647 • Published • 17