BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language Paper • 2412.08329 • Published Dec 11, 2024 • 1
BEIR-NL Collection Zero-shot Information Retrieval Benchmark for the Dutch Language • 16 items • Updated Feb 10 • 1
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 20
Article How to generate text: using different decoding methods for language generation with Transformers Mar 1, 2020 • 170
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 146
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated 16 days ago • 15
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 58