Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2103.06874

Papers - Training - Token Free - Bytes or Characters

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Paper • 2103.06874 • Published Mar 11, 2021 • 1

Papers - Pre-training - Encoders - Bert

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Paper • 1907.12461 • Published Jul 29, 2019 • 1
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Paper • 2103.06874 • Published Mar 11, 2021 • 1

Papers - Encoders

Functional Interpolation for Relative Positions Improves Long Context Transformers

Paper • 2310.04418 • Published Oct 6, 2023 • 4
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs

Paper • 2106.09997 • Published Jun 18, 2021 • 2
Neural Machine Translation of Rare Words with Subword Units

Paper • 1508.07909 • Published Aug 31, 2015 • 4
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21, 2024 • 2

Papers - Multilingual

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Paper • 2304.08999 • Published Apr 18, 2023 • 2
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 85
Robust Open-Vocabulary Translation from Visual Text Representations

Paper • 2104.08211 • Published Apr 16, 2021 • 1
Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2, 2024 • 15

Papers - Training

SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length

Paper • 2310.00576 • Published Oct 1, 2023 • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

Paper • 2305.13169 • Published May 22, 2023 • 3
Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14, 2024 • 15

Papers - Pre-training

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Paper • 2310.20587 • Published Oct 31, 2023 • 18
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Paper • 2403.15042 • Published Mar 22, 2024 • 27
LIMA: Less Is More for Alignment

Paper • 2305.11206 • Published May 18, 2023 • 23

google/canine-c

Feature Extraction • Updated Apr 29, 2024 • 167k • • 32
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Paper • 2103.06874 • Published Mar 11, 2021 • 1

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 51
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Paper • 2305.07185 • Published May 12, 2023 • 9
Byte-Level Recursive Convolutional Auto-Encoder for Text

Paper • 1802.01817 • Published Feb 6, 2018

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Paper • 2310.05737 • Published Oct 9, 2023 • 4
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 2
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Paper • 2305.11554 • Published May 19, 2023 • 2

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs