deepseek-ai/DeepSeek-R1-Distill-Qwen-14B Text Generation • Updated 18 days ago • 679k • • 466
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • Updated 18 days ago • 1.59M • • 1.26k
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation Paper • 2412.11919 • Published Dec 16, 2024 • 34
JuStRank: Benchmarking LLM Judges for System Ranking Paper • 2412.09569 • Published Dec 12, 2024 • 20
YuLan-Mini: An Open Data-efficient Language Model Paper • 2412.17743 • Published Dec 23, 2024 • 65
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 41
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published Dec 23, 2024 • 22
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response Paper • 2412.14922 • Published Dec 19, 2024 • 86
How to Synthesize Text Data without Model Collapse? Paper • 2412.14689 • Published Dec 19, 2024 • 51
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 51
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 135
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 48
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 107