- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 33
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 13
- RoFormer: Enhanced Transformer with Rotary Position Embedding
  Paper • 2104.09864 • Published • 11
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 12
Collections including paper arxiv:2307.09288
- One Wide Feedforward is All You Need
  Paper • 2309.01826 • Published • 32
- Gated recurrent neural networks discover attention
  Paper • 2309.01775 • Published • 8
- FLM-101B: An Open LLM and How to Train It with $100K Budget
  Paper • 2309.03852 • Published • 44
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 76
- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 12
- Learning to summarize from human feedback
  Paper • 2009.01325 • Published • 4
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 16
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 244
- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 54
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 33
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87