
Collections


Collections including paper arxiv:2308.12950

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.


Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 25

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9, 2024 • 43
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 25
Simple and Controllable Music Generation

Paper • 2306.05284 • Published Jun 8, 2023 • 149
High Fidelity Neural Audio Compression

Paper • 2210.13438 • Published Oct 24, 2022 • 4

synthetic code generation

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 95
Wukong: Towards a Scaling Law for Large-Scale Recommendation

Paper • 2403.02545 • Published Mar 4, 2024 • 17
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Paper • 2308.10462 • Published Aug 21, 2023 • 2

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Paper • 2402.10790 • Published Feb 16, 2024 • 42
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

Paper • 2402.10524 • Published Feb 16, 2024 • 23
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 60
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 25

There are usually interesting papers in the model cards on the leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 31
WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Paper • 2306.08568 • Published Jun 14, 2023 • 28
SantaCoder: don't reach for the stars!

Paper • 2301.03988 • Published Jan 9, 2023 • 7
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 61

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper • 2311.12198 • Published Nov 20, 2023 • 22
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Paper • 2311.18775 • Published Nov 30, 2023 • 6
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 25

Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 25

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 8
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 158
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 46
