SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago
Token-level and sequence-level loss smoothing for RNN language models Paper • 1805.05062 • Published May 14, 2018
Efficient Wait-k Models for Simultaneous Machine Translation Paper • 2005.08595 • Published May 18, 2020
Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation Paper • 2311.06532 • Published Nov 11, 2023
Large Concept Models: Language Modeling in a Sentence Representation Space Paper • 2412.08821 • Published Dec 11, 2024