- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2308.12950
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Masked Audio Generation using a Single Non-Autoregressive Transformer
  Paper • 2401.04577 • Published • 43
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 25
- Simple and Controllable Music Generation
  Paper • 2306.05284 • Published • 149
- High Fidelity Neural Audio Compression
  Paper • 2210.13438 • Published • 4

- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 95
- Wukong: Towards a Scaling Law for Large-Scale Recommendation
  Paper • 2403.02545 • Published • 17
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
  Paper • 2308.10462 • Published • 2

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 42
- LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
  Paper • 2402.10524 • Published • 23
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
  Paper • 2410.16256 • Published • 60
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 25

- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 31
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
  Paper • 2306.08568 • Published • 28
- SantaCoder: don't reach for the stars!
  Paper • 2301.03988 • Published • 7
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 61

- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
  Paper • 2311.12793 • Published • 18
- PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
  Paper • 2311.12198 • Published • 22
- CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
  Paper • 2311.18775 • Published • 6
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 25

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 46