- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83

Collections including paper arxiv:2402.07871

- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 51
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 70
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
  Paper • 2308.14352 • Published

- BlackMamba: Mixture of Experts for State-Space Models
  Paper • 2402.01771 • Published • 24
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 27
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 51
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 48

- Scaling Laws for Downstream Task Performance of Large Language Models
  Paper • 2402.04177 • Published • 18
- A Tale of Tails: Model Collapse as a Change of Scaling Laws
  Paper • 2402.07043 • Published • 14
- Scaling Laws for Fine-Grained Mixture of Experts
  Paper • 2402.07871 • Published • 12
- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 24

- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 19
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 80
- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 24
- Zoology: Measuring and Improving Recall in Efficient Language Models
  Paper • 2312.04927 • Published • 2

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 53
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 19
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 6

- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 48
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 49
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 37

- YAYI 2: Multilingual Open-Source Large Language Models
  Paper • 2312.14862 • Published • 14
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 57
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 48