A lightweight multilingual model for temporal classification of questions, fine-tuned from intfloat/multilingual-e5-small.
E5-EG-small (E5 EverGreen - Small) is an efficient multilingual text classification model that determines whether questions have temporally mutable or immutable answers. It trades a small drop in accuracy (about 0.04 F1 relative to E5-EG-large; see the tables below) for a model that is roughly 4.7x smaller and 3.8x faster.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import time

# Load model and tokenizer
model_name = "s-nlp/E5-EG-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# For optimal performance, use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Batch classification example
questions = [
    "What is the capital of France?",
    "Who won the latest World Cup?",
    "What is the speed of light?",
    "What is the current Bitcoin price?"
]

# Tokenize all questions
inputs = tokenizer(
    questions,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=64
).to(device)

# Classify
start_time = time.time()
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)
inference_time = (time.time() - start_time) * 1000  # ms

# Display results
class_names = ["Immutable", "Mutable"]
for i, question in enumerate(questions):
    print(f"Q: {question}")
    print(f"  Classification: {class_names[predicted_classes[i].item()]}")
    print(f"  Confidence: {predictions[i][predicted_classes[i]].item():.2f}")

print(f"\nTotal inference time: {inference_time:.2f}ms")
print(f"Average per question: {inference_time/len(questions):.2f}ms")
```
Training data: the same multilingual dataset used for E5-EG-large.
Evaluation: the same test sets as E5-EG-large (2,100 samples per language).
| Language | F1 Score | Δ vs Large |
|---|---|---|
| English | 0.88 | -0.04 |
| Chinese | 0.87 | -0.04 |
| French | 0.86 | -0.04 |
| German | 0.85 | -0.04 |
| Russian | 0.84 | -0.04 |
| Hebrew | 0.83 | -0.04 |
| Arabic | 0.82 | -0.04 |
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Immutable | 0.83 | 0.86 | 0.84 |
| Mutable | 0.86 | 0.83 | 0.84 |
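Per-class numbers like these can be reproduced with scikit-learn's `classification_report`, as in the sketch below. It reuses the `tokenizer`, `model`, and `device` objects from the quick-start example; `test_questions` and `test_labels` are placeholder names, not part of any released evaluation code, and the 0 = Immutable / 1 = Mutable label order is an assumption.

```python
from sklearn.metrics import classification_report
import torch

# Placeholder data: the real test sets (2,100 samples per language) are
# described above. Label order (0 = Immutable, 1 = Mutable) is assumed.
test_questions = ["What is the capital of France?", "Who won the latest World Cup?"]
test_labels = [0, 1]

inputs = tokenizer(test_questions, return_tensors="pt", padding=True,
                   truncation=True, max_length=64).to(device)
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1).cpu().tolist()

print(classification_report(test_labels, preds,
                            target_names=["Immutable", "Mutable"]))
```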
| Metric | E5-EG-small | E5-EG-large | Improvement |
|---|---|---|---|
| Parameters | 118M | 560M | 4.7x smaller |
| Model Size (MB) | 471 | 2,240 | 4.8x smaller |
| Inference Time (ms) | 12 | 45 | 3.8x faster |
| Memory Usage (GB) | 0.8 | 3.2 | 4x less |
| Throughput (samples/sec) | 83 | 22 | 3.8x higher |
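Timing figures like these depend heavily on hardware, batch size, and sequence length. The sketch below shows one way such numbers could be measured; it reuses `tokenizer`, `model`, and `device` from the quick-start example, and the batch size and repetition count are arbitrary choices, not the settings behind the table above.

```python
import time
import torch

def benchmark(questions, n_runs=20):
    # Rough latency/throughput measurement; reuses tokenizer/model/device
    # from the quick-start example above.
    inputs = tokenizer(questions, return_tensors="pt", padding=True,
                       truncation=True, max_length=64).to(device)
    with torch.no_grad():
        model(**inputs)  # warm-up pass so one-time setup doesn't skew timing
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for _ in range(n_runs):
            model(**inputs)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start
    print(f"{elapsed / n_runs * 1000:.2f} ms/batch, "
          f"{len(questions) * n_runs / elapsed:.1f} samples/sec")

benchmark(["What is the capital of France?"] * 8)
```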
BibTeX:

```bibtex
@misc{pletenev2025truetomorrowmultilingualevergreen,
      title={Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA},
      author={Sergey Pletenev and Maria Marina and Nikolay Ivanov and Daria Galimzianova and Nikita Krayko and Mikhail Salnikov and Vasily Konovalov and Alexander Panchenko and Viktor Moskvoretskii},
      year={2025},
      eprint={2505.21115},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.21115},
}
```
Base model: intfloat/multilingual-e5-small