
Releasing zeroentropy/zerank-1
In search engines, rerankers are crucial for improving the accuracy of your retrieval system.
However, SOTA rerankers are closed-source and proprietary. At ZeroEntropy, we've trained a SOTA reranker outperforming closed-source competitors, and we're launching our model here on HuggingFace.
This reranker outperforms proprietary rerankers such as cohere-rerank-v3.5 and Salesforce/LlamaRank-v1 across a wide variety of domains, including finance, legal, code, STEM, medical, and conversational data.
At ZeroEntropy we've developed an innovative multi-stage pipeline that models query-document relevance scores as adjusted Elo ratings. See our Technical Report (Coming soon!) for more details.
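For readers unfamiliar with Elo ratings, here is a minimal sketch of the standard Elo model applied to pairwise document preferences for a single query. It is background only and not ZeroEntropy's pipeline: the adjustments mentioned above are not described here, and the document IDs, judgments, starting rating of 1000, K-factor of 32, and scale of 400 are all illustrative assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Return both ratings after one comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical pairwise judgments for one query: (preferred_doc, other_doc).
judgments = [("doc_a", "doc_b"), ("doc_a", "doc_c"), ("doc_b", "doc_c")]

# Every document starts from the same (assumed) rating.
ratings = {"doc_a": 1000.0, "doc_b": 1000.0, "doc_c": 1000.0}

for winner, loser in judgments:
    ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], 1.0)

# Documents sorted from highest to lowest rating.
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Each update moves the two ratings by equal and opposite amounts, so the total rating mass stays constant and only the relative ordering carries information.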
Since we're a small company, this model is only released under a non-commercial license. If you'd like a commercial license, please contact us at [email protected] and we'll get you a license ASAP.
For this model's smaller twin, see zerank-1-small, which we've fully open-sourced under an Apache 2.0 License.
How to Use
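from sentence_transformers import CrossEncoder

# Load the reranker; trust_remote_code=True allows the model's custom code to run.
model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)

# Each entry is a (query, document) pair to score.
query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]

# predict returns one relevance score per pair; higher means more relevant.
scores = model.predict(query_documents)
print(scores)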
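In practice you usually want the candidate documents sorted by score rather than the raw scores. The sketch below shows one way to do that. The `rerank` helper and the `overlap_score` function are hypothetical names introduced for illustration; `overlap_score` is a toy stand-in scorer so the example runs without downloading the model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return documents sorted best-first."""
    pairs = [(query, doc) for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_score(pairs: List[Pair]) -> List[float]:
    """Toy stand-in scorer: fraction of query tokens that appear in the document."""
    scores = []
    for query, doc in pairs:
        q_tokens = set(query.lower().split())
        d_tokens = set(doc.lower().split())
        scores.append(len(q_tokens & d_tokens) / max(len(q_tokens), 1))
    return scores


docs = [
    "Paris is the capital of France",
    "Bananas are yellow",
    "France borders Spain",
]
results = rerank("capital of France", docs, overlap_score, top_k=2)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

With the model loaded as in the snippet above, passing `model.predict` as `score_fn` should work the same way, since it accepts a list of (query, document) pairs and returns one score per pair.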
You can also run inference with this model through ZeroEntropy's /models/rerank endpoint.
Evaluations
NDCG@10 scores for zerank-1 and competing closed-source proprietary rerankers. Since we are evaluating rerankers, OpenAI's text-embedding-3-small is used as the initial retriever for the top 100 candidate documents.
Task | Embedding | cohere-rerank-v3.5 | Salesforce/LlamaRank-v1 | zerank-1-small | zerank-1 |
---|---|---|---|---|---|
Code | 0.678 | 0.724 | 0.694 | 0.730 | 0.754 |
Conversational | 0.250 | 0.571 | 0.484 | 0.556 | 0.596 |
Finance | 0.839 | 0.824 | 0.828 | 0.861 | 0.894 |
Legal | 0.703 | 0.804 | 0.767 | 0.817 | 0.821 |
Medical | 0.619 | 0.750 | 0.719 | 0.773 | 0.796 |
STEM | 0.401 | 0.510 | 0.595 | 0.680 | 0.694 |
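For reference, the following sketch shows how an NDCG@10 value like those in the table can be computed for a single query. It is not the evaluation code behind these numbers. It assumes linear gain (some implementations use 2^rel - 1 instead), and it builds the ideal ordering from the same list of judged documents that was reranked; the relevance labels are made up for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k, where relevances[i] is the graded relevance of the result at rank i."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents, so the score is defined as 0 here
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical relevance labels in the order each system ranked the documents.
retriever_order = [0, 1, 2, 3]  # best document ranked last
reranked_order = [3, 2, 1, 0]   # best document ranked first

print(f"before reranking: {ndcg_at_k(retriever_order):.3f}")
print(f"after reranking:  {ndcg_at_k(reranked_order):.3f}")
```

A dataset-level score such as those in the table is typically the mean of this per-query value over all queries in the task.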
Figure: Comparison of BM25 and Hybrid Search with and without zerank-1.