Massive Text Embedding Benchmark

non-profit

https://github.com/embeddings-benchmark

embeddings-benchmark

AI & ML interests

Massive Text Embeddings Benchmark

mteb's activity

Muennighoff

updated a dataset 8 minutes ago

mteb/arena-results

Viewer • Updated 8 minutes ago • 3.61k • 7.72k • 4

AdnanElAssadi

updated a dataset about 2 hours ago

mteb/BIRCO-Relic-Test

Viewer • Updated about 2 hours ago • 10.2k

AdnanElAssadi

published a dataset about 2 hours ago

mteb/BIRCO-Relic-Test

Viewer • Updated about 2 hours ago • 10.2k

AdnanElAssadi

updated a dataset about 2 hours ago

mteb/BIRCO-WTB-Test

Viewer • Updated about 2 hours ago • 6.91k

AdnanElAssadi

published a dataset about 2 hours ago

mteb/BIRCO-WTB-Test

Viewer • Updated about 2 hours ago • 6.91k

AdnanElAssadi

updated a dataset about 2 hours ago

mteb/BIRCO-ClinicalTrial-Test

Viewer • Updated about 2 hours ago • 7.02k

AdnanElAssadi

published a dataset about 2 hours ago

mteb/BIRCO-ClinicalTrial-Test

Viewer • Updated about 2 hours ago • 7.02k

AdnanElAssadi

updated a dataset about 2 hours ago

mteb/BIRCO-Arguana-Test

Viewer • Updated about 2 hours ago • 8.22k

AdnanElAssadi

published a dataset about 2 hours ago

mteb/BIRCO-Arguana-Test

Viewer • Updated about 2 hours ago • 8.22k

AdnanElAssadi

updated a dataset about 2 hours ago

mteb/BIRCO-DorisMae-Test

Viewer • Updated about 2 hours ago • 12.2k

AdnanElAssadi

published a dataset about 2 hours ago

mteb/BIRCO-DorisMae-Test

Viewer • Updated about 2 hours ago • 12.2k

mmhamdy

posted an update about 19 hours ago

⛓ Evaluating Long Context #2: SCROLLS and ZeroSCROLLS

In this series of posts tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, LRA is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. However, it wasn't introduced to evaluate LLMs specifically, but rather the transformer architecture in general.

📜 The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?

1️⃣ Long Text Focus: SCROLLS (unlike LRA) focuses mainly on text and contains inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2️⃣ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3️⃣ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.

Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:

1️⃣ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2️⃣ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.

💡 What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.

- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)

swj0419

authored a paper 9 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published 11 days ago • 98

Muennighoff

authored a paper 9 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published 11 days ago • 98

tomaarsen

posted an update 19 days ago

I just released Sentence Transformers v3.4.0, featuring a memory leak fix, compatibility between the powerful Cached... losses and the Matryoshka loss modifier, and a bunch of fixes & small features.

🪆 Matryoshka & Cached loss compatibility
It is now possible to combine the powerful Cached... losses (which use in-batch negatives & a caching mechanism to allow for endless batch sizes & negatives) with the Matryoshka loss modifier. This modifier changes a base loss so that the model is trained not only on the maximum dimensionality (e.g. 1024 dimensions), but also on many lower dimensions (e.g. 768, 512, 256, 128, 64, 32).
After training, these models' embeddings can be truncated for faster retrieval, etc.
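To make the truncation step concrete, here is a minimal sketch in plain Python. It is not code from the release: the helper name `truncate_embedding` is hypothetical, and it assumes that a truncated vector should be re-normalized to unit length before cosine or dot-product comparison.

```python
import math


def truncate_embedding(embedding, dim):
    """Keep the first `dim` values of an embedding and re-normalize to unit length.

    Hypothetical helper for illustration only. Matryoshka-style training aims to
    keep the leading dimensions useful on their own, so slicing is all that is
    needed besides the re-normalization.
    """
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Example: a toy 4-dimensional embedding truncated to 2 dimensions.
full = [3.0, 4.0, 12.0, 84.0]
short = truncate_embedding(full, 2)
```

Shorter vectors mean less memory per document and cheaper similarity computations, usually at the cost of some retrieval quality.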

🎞️ Resolve memory leak when Model and Trainer are reinitialized
Due to a circular dependency between Trainer -> Model -> ModelCardData -> Trainer, deleting both the trainer & model still didn't free up the memory.
This led to a memory leak in scripts where you repeatedly do so.
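The general mechanism can be reproduced with a small CPython sketch. The `ToyTrainer`, `ToyModel`, and `ToyCardData` classes below are hypothetical stand-ins, not the library's classes. The sketch only shows that objects in a reference cycle are not released by `del` alone and wait for the cyclic garbage collector.

```python
import gc
import weakref


class ToyCardData:
    """Hypothetical stand-in for the model card data object."""

    def __init__(self):
        self.trainer = None


class ToyModel:
    """Hypothetical stand-in for the model; it owns its card data."""

    def __init__(self):
        self.card_data = ToyCardData()


class ToyTrainer:
    """Hypothetical stand-in for the trainer; it holds the model."""

    def __init__(self, model):
        self.model = model
        # Closing the loop: trainer -> model -> card data -> trainer.
        model.card_data.trainer = self


def cycle_survives_del():
    """Return (alive_after_del, alive_after_collect) for a model in a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # make the demonstration deterministic
    try:
        model = ToyModel()
        trainer = ToyTrainer(model)
        probe = weakref.ref(model)  # observe the model without keeping it alive
        del trainer, model
        alive_after_del = probe() is not None
        gc.collect()  # the cycle collector finally reclaims the objects
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


result = cycle_survives_del()
```

Here `result` shows the model still alive after `del` and gone only after an explicit `gc.collect()`, which is why breaking the cycle matters for scripts that create many models and trainers in a loop.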

➕ New Features
Many new small features, e.g. multi-GPU support for 'mine_hard_negatives', a 'margin' parameter to TripletEvaluator, and Matthews Correlation Coefficient in the BinaryClassificationEvaluator.
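For reference, the Matthews Correlation Coefficient is computed from the binary confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a standalone illustration with a hypothetical function name `mcc`, not the evaluator's implementation. Returning 0.0 when the denominator is zero is a common convention and is assumed here.

```python
import math


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (truthy = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases (e.g. a single predicted class) map to 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = mcc([1, 1, 0, 0], [1, 1, 0, 0])
inverted = mcc([1, 1, 0, 0], [0, 0, 1, 1])
```

Unlike plain accuracy, the score ranges from -1 (fully inverted predictions) to 1 (perfect predictions), and it stays informative when the two classes are imbalanced.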

🐛 Bug Fixes
Also a bunch of fixes, for example that subsequent batches were not sorted when using the "no_duplicates" batch sampler. See the release notes for more details.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.4.0

Big thanks to all community members who assisted in this release. 10 folks with their first contribution this time around!

sarahooker

authored 5 papers 23 days ago

Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress

Paper • 2408.14960 • Published Aug 27, 2024

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Paper • 2410.10801 • Published Oct 14, 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Paper • 2411.19799 • Published Nov 29, 2024 • 11

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 17

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

Paper • 2406.03368 • Published Jun 5, 2024

Team members 34
