- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 39
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 6
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- Attention Is All You Need
  Paper • 1706.03762 • Published • 51

Collections including paper arxiv:1706.03762

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- Training Generative Adversarial Networks with Limited Data
  Paper • 2006.06676 • Published
- A survey of Generative AI Applications
  Paper • 2306.02781 • Published
- Stable Diffusion 2-1
  Space • 10.9k • 🔥 Generate images from text descriptions

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 32
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 39

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 12
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 10
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 72

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146 • Published • 6
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 12

- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 170
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 15
- Attention Is All You Need
  Paper • 1706.03762 • Published • 51

- Attention Is All You Need
  Paper • 1706.03762 • Published • 51
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Mistral 7B
  Paper • 2310.06825 • Published • 46