- ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks (arXiv:2503.06885, published 4 days ago)
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines (arXiv:2502.14739, published 21 days ago)
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning (arXiv:2501.19324, published Jan 31)
- Are Human-generated Demonstrations Necessary for In-context Learning? (arXiv:2309.14681, published Sep 26, 2023)
- Towards Building the Federated GPT: Federated Instruction Tuning (arXiv:2305.05644, published May 9, 2023)
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (arXiv:2401.05507, published Jan 10, 2024)
- Instruction Tuning for Large Language Models: A Survey (arXiv:2308.10792, published Aug 21, 2023)
- Empowering Large Language Model Agents through Action Learning (arXiv:2402.15809, published Feb 24, 2024)
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism (arXiv:2407.10457, published Jul 15, 2024)
- Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare (arXiv:2405.19298, published May 29, 2024)
- VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation (arXiv:2411.13281, published Nov 20, 2024)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation (arXiv:2303.12570, published Mar 22, 2023)
- Private-Library-Oriented Code Generation with Large Language Models (arXiv:2307.15370, published Jul 28, 2023)