-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 34 -
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 40 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 7 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192
Collections
Discover the best community collections!
Collections including paper arxiv:2311.12983
-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 179 -
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper • 2502.14739 • Published • 97 -
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper • 2502.14502 • Published • 85 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 20
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 9 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 115
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192 -
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 123 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 203 -
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Paper • 2412.03304 • Published • 18
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 263 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 91 -
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 54 -
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Paper • 2311.07445 • Published