Collections

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Paper • 2406.06592 • Published Jun 5, 2024 • 29

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Paper • 2406.06469 • Published Jun 10, 2024 • 28

Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published Jun 7, 2024 • 58

CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7, 2024 • 47

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 74

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Iterative Reasoning Preference Optimization

Better & Faster Large Language Models via Multi-token Prediction

ORPO: Monolithic Preference Optimization without Reference Model

KAN: Kolmogorov-Arnold Networks

Advancing LLM Reasoning Generalists with Preference Trees

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Premise Order Matters in Reasoning with Large Language Models

Lumiere: A Space-Time Diffusion Model for Video Generation

Long-form factuality in large language models

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

TC4D: Trajectory-Conditioned Text-to-4D Generation

Contrastive Decoding Improves Reasoning in Large Language Models

Chain-of-Thought Reasoning Without Prompting

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

MathScale: Scaling Instruction Tuning for Mathematical Reasoning

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation

Common 7B Language Models Already Possess Strong Math Capabilities

LoRA+: Efficient Low Rank Adaptation of Large Models

The FinBen: An Holistic Financial Benchmark for Large Language Models

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

TrustLLM: Trustworthiness in Large Language Models

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning

Learning From Mistakes Makes LLM Better Reasoner

Making Large Language Models Better Reasoners with Step-Aware Verifier