LettuceDetect: Hallucination Detection Model
Model Name: lettucedect-base-modernbert-en-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect
Overview
LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for Retrieval-Augmented Generation (RAG) applications. The model is built on ModernBERT, which was chosen because of its extended context support (up to 8192 tokens). This long-context capability is critical for tasks where long, detailed documents must be processed to determine accurately whether an answer is supported by the provided context.
This is our Base model, based on ModernBERT-base.
Model Details
- Architecture: ModernBERT (Base) with extended context support (up to 8192 tokens)
- Task: Token Classification / Hallucination Detection
- Training Dataset: RagTruth
- Language: English
How It Works
The model is trained to identify tokens in the answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
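The aggregation step can be illustrated with a minimal sketch. It is not the library's actual implementation: the function name `merge_token_predictions`, the input layout (character offsets, 0/1 labels, and per-token probabilities), and the choice to report the maximum token probability as the span confidence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def merge_token_predictions(
    offsets: List[Tuple[int, int]],
    labels: List[int],
    probs: List[float],
    text: str,
) -> List[Dict]:
    """Merge runs of consecutive hallucinated tokens (label 1) into character spans.

    Hypothetical helper: the span confidence is taken as the maximum token
    probability in the run, which is an assumption, not the library's rule.
    """
    spans: List[Dict] = []
    current = None
    for (start, end), label, prob in zip(offsets, labels, probs):
        if label == 1:
            if current is None:
                # Open a new span at the first hallucinated token of a run.
                current = {"start": start, "end": end, "confidence": prob}
            else:
                # Extend the open span to cover this token.
                current["end"] = end
                current["confidence"] = max(current["confidence"], prob)
        elif current is not None:
            # A supported token closes the open span.
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Toy input: made-up offsets, labels, and probabilities for a short answer.
answer = "Paris has 69 million people."
offsets = [(0, 5), (5, 9), (9, 12), (12, 20), (20, 27), (27, 28)]
labels = [0, 0, 1, 1, 0, 0]
probs = [0.01, 0.02, 0.97, 0.91, 0.10, 0.05]

spans = merge_token_predictions(offsets, labels, probs, answer)
print(spans)
```

Here the two adjacent flagged tokens are merged into a single span covering " 69 million".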
Usage
Installation
Install the 'lettucedetect' package with pip:
pip install lettucedetect
Using the model
from lettucedetect.models.inference import HallucinationDetector
# For a transformer-based approach:
detector = HallucinationDetector(
method="transformer", model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1"
)
contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million.",]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."
# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)
# Predictions: [{'start': 31, 'end': 71, 'confidence': 0.9944414496421814, 'text': ' The population of France is 69 million.'}]
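Because the spans are character offsets into the answer, they can be used to mark up the answer for display. The following is a small sketch under that assumption. The helper `mark_spans` and the `[[ ]]` markers are hypothetical and not part of the lettucedetect package, and the sketch assumes the spans do not overlap.

```python
def mark_spans(answer, spans, open_tag="[[", close_tag="]]"):
    """Wrap each predicted span of the answer in visible markers.

    Hypothetical helper; assumes non-overlapping spans whose 'start' and 'end'
    are character offsets into the answer.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        # Copy the supported text before the span, then the marked span itself.
        pieces.append(answer[cursor:span["start"]])
        pieces.append(open_tag + answer[span["start"]:span["end"]] + close_tag)
        cursor = span["end"]
    pieces.append(answer[cursor:])
    return "".join(pieces)


answer = "The capital of France is Paris. The population of France is 69 million."
# Span copied from the example output above.
predictions = [{"start": 31, "end": 71, "confidence": 0.9944414496421814,
                "text": " The population of France is 69 million."}]

marked = mark_spans(answer, predictions)
print(marked)
```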
Details
We evaluate our model on the test set of the RAGTruth dataset, at both the example level (can we detect that a given answer contains hallucinations?) and the span level (can we detect which parts of the answer are hallucinated?).
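To make the two granularities concrete, the sketch below derives an example-level label from predicted spans and scores spans by character overlap. This is one common way to compute such metrics and is not necessarily the exact protocol behind the reported numbers. The function names and the gold annotation are hypothetical.

```python
def example_level_label(spans):
    """An answer counts as hallucinated if at least one span is predicted."""
    return len(spans) > 0


def char_set(spans):
    """Collect the character positions covered by a list of spans."""
    chars = set()
    for span in spans:
        chars.update(range(span["start"], span["end"]))
    return chars


def span_prf(pred_spans, gold_spans):
    """Character-overlap precision, recall, and F1 (an assumed scoring scheme)."""
    pred, gold = char_set(pred_spans), char_set(gold_spans)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Predicted span taken from the usage example; the gold span is made up.
pred_spans = [{"start": 31, "end": 71}]
gold_spans = [{"start": 60, "end": 71}]  # hypothetical annotation

is_hallucinated = example_level_label(pred_spans)
precision, recall, f1 = span_prf(pred_spans, gold_spans)
print(is_hallucinated, round(precision, 3), round(recall, 3), round(f1, 3))
```

In this toy case the example-level prediction is correct, while the span-level precision is lower because the predicted span covers more characters than the hypothetical gold span.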
The results on the example-level can be seen in the table below.
Our large model (lettucedetect-large-v1) consistently achieves the highest scores across all data types and overall, and the base model (lettucedetect-base-v1) is also competitive across the benchmark. We beat the previous best model (Finetuned LLAMA-2-13B) while being significantly smaller and faster: our base and large models have 150M and 396M parameters, respectively, and can process 30-60 examples per second on a single A100 GPU.
The other non-prompt-based model is Luna, which is also a token-level model but uses a DeBERTa-large encoder. Our models outperform the Luna architecture overall (76.07 F1 for our base model vs. 65.4 for Luna on the overall data type).
The span-level results can be seen in the table below.
Our models achieve the best scores on every data type and overall, beating the previous best model (Finetuned LLAMA-2-13B) by a significant margin.
Citing
If you use the model or the tool, please cite the following:
@software{Kovacs:2025,
author = {Kovacs, Adam},
title = {LettuceDetect},
month = feb,
year = 2025,
publisher = {Zenodo},
doi = {10.5281/zenodo.14856505},
url = {https://doi.org/10.5281/zenodo.14856505},
}
Model tree for KRLabsOrg/lettucedect-base-modernbert-en-v1
Base model
answerdotai/ModernBERT-base