Collections including paper arxiv:2302.13971

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 17
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 161
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 57

- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 34
- HuggingFaceFW/fineweb
  Viewer • Updated • 25B • 301k • 2.03k
- tiiuae/falcon-refinedweb
  Viewer • Updated • 968M • 53.5k • 841
- cerebras/SlimPajama-627B
  Preview • Updated • 87.3k • 457

- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 352
- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 142
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- Qwen2.5-VL Technical Report
  Paper • 2502.13923 • Published • 164

- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 13
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8

- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 14
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 20
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
  Paper • 2407.21770 • Published • 22
- Agent-as-a-Judge: Evaluate Agents with Agents
  Paper • 2410.10934 • Published • 19

- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 14
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 20
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
  Paper • 2407.21770 • Published • 22