ConstBERT

ConstBERT (Constant-Space BERT) is a multi-vector retrieval model designed for efficient and effective passage retrieval. It modifies the ColBERT architecture by encoding each document into a fixed number of learned embeddings rather than one embedding per token. This significantly reduces storage costs and simplifies OS paging, since every document representation has the same size, while retaining most of the effectiveness of token-level multi-vector models.

Details

ConstBERT addresses the high storage cost associated with traditional multi-vector retrieval methods like ColBERT, where each token in a document collection is stored as a vector. Instead, ConstBERT proposes a learned pooling mechanism that projects the token-level embeddings of a document into a smaller, fixed number (C) of document-level embeddings. Each of these C embeddings captures distinct semantic facets of the document. This projection is achieved through an additional linear transformation layer learned end-to-end during training. The relevance score between a query and a document is then computed using a late interaction mechanism (MaxSim) over these C document embeddings and the query's token embeddings.
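
The following is a minimal PyTorch sketch of this idea, assuming documents are padded or truncated to a fixed token length so that a single learned linear layer over the token axis can produce the C vectors; all names and shapes here are illustrative, not the released implementation.

import torch
import torch.nn as nn

# Illustrative sketch only (an assumed realization, not the released code).
L, C, d = 256, 32, 128                # padded token length, fixed vector count, embedding dim

pool = nn.Linear(L, C, bias=False)    # learned end-to-end with the encoder

def pool_document(token_embs: torch.Tensor) -> torch.Tensor:
    """Project (L, d) token embeddings down to (C, d) document embeddings."""
    return pool(token_embs.transpose(0, 1)).transpose(0, 1)

def late_interaction(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """MaxSim: best-matching document vector per query token, summed."""
    return (query_embs @ doc_embs.T).max(dim=1).values.sum()

token_embs = torch.randn(L, d)        # stand-in for BERT token outputs
doc_embs = pool_document(token_embs)  # fixed-size (32, 128) representation
query_embs = torch.randn(24, d)       # stand-in for query token embeddings
score = late_interaction(query_embs, doc_embs)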

This approach offers a trade-off between storage/computational efficiency and retrieval effectiveness, configurable by the choice of C. The paper demonstrates that ConstBERT can achieve performance comparable to ColBERT on benchmarks like MSMARCO and BEIR, with substantially smaller index sizes.
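
As a back-of-the-envelope illustration of that trade-off (the 8.8M passage count is MSMARCO's; the 80-token average and fp16 storage are assumed figures, not from the paper):

num_docs, avg_tokens, dim, C, bytes_per_val = 8_800_000, 80, 128, 32, 2

colbert_bytes = num_docs * avg_tokens * dim * bytes_per_val  # one vector per token
constbert_bytes = num_docs * C * dim * bytes_per_val         # C vectors per document

print(f"token-level (ColBERT-style): {colbert_bytes / 1e9:.1f} GB")  # ~180.2 GB
print(f"ConstBERT (C=32):            {constbert_bytes / 1e9:.1f} GB")  # ~72.1 GB

# Every document also occupies exactly 32 * 128 * 2 = 8 KiB under these
# assumptions, which is what enables the fixed-size paging benefit noted above.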

This model has been trained to produce 32 vectors of size 128 per document (i.e., C = 32 with an embedding dimension of 128).

Model Sources

For more details, please refer to our official repository, the ECIR 2025 paper (see the citation below), and the accompanying blog post.

Direct Use

ConstBERT is intended for semantic search and passage retrieval tasks. It can be used for:

  • First-stage retrieval in large document collections.
  • Reranking candidates produced by another retrieval system (a minimal ranking sketch follows the example code below).

The model produces fixed-size multi-vector representations for documents, which can be indexed efficiently. Queries are represented as sets of token embeddings.

Example code:

from transformers import AutoModel
import numpy as np

def max_sim(q: np.ndarray, d: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document."""
    assert q.ndim == 2, "q must be a 2-D array of query token embeddings"
    assert d.ndim == 2, "d must be a 2-D array of document embeddings"
    scores = np.dot(d, q.T)              # (num_doc_vectors, num_query_tokens)
    max_scores = np.max(scores, axis=0)  # best document vector per query token
    return float(np.sum(max_scores))

model = AutoModel.from_pretrained("pinecone/ConstBERT", trust_remote_code=True)

# Example queries and documents
queries = ["What is the capital of France?", "latest advancements in AI"]
documents = [
    "Paris is the capital and most populous city of France.",
    "Artificial intelligence is rapidly evolving with new breakthroughs.",
    "The Eiffel Tower is a famous landmark in Paris."
]

# Encode queries and documents
query_embeddings = model.encode_queries(queries).numpy()         # (num_queries, query_tokens, 128)
document_embeddings = model.encode_documents(documents).numpy()  # (num_documents, 32, 128)

# The Paris passage scores higher than the AI passage for the Paris query
max_sim(query_embeddings[0], document_embeddings[0]) > max_sim(query_embeddings[0], document_embeddings[1])
# Returns: True
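
Building on the example above, here is a minimal brute-force ranking sketch over the small collection (the rank helper is illustrative, not part of the model's API); for a large collection you would instead index the fixed-size document matrices with an ANN library.

def rank(query_emb: np.ndarray, doc_embs: np.ndarray) -> list[tuple[int, float]]:
    """Score one query against every document and sort by descending score."""
    scored = [(i, max_sim(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scored, key=lambda x: x[1], reverse=True)

for i, score in rank(query_embeddings[0], document_embeddings):
    print(f"{score:7.2f}  {documents[i]}")
# The Paris "capital" passage should rank first for the first query.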

Citation (BibTeX)

@inproceedings{macavaney2025constbert,
  author    = {MacAvaney, Sean and Mallia, Antonio and Tonellotto, Nicola},
  title     = {Efficient Constant-Space Multi-vector Retrieval},
  year      = {2025},
  isbn      = {978-3-031-88713-0},
  publisher = {Springer-Verlag},
  address   = {Berlin, Heidelberg},
  url       = {https://doi.org/10.1007/978-3-031-88714-7_22},
  doi       = {10.1007/978-3-031-88714-7_22},
  booktitle = {Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6--10, 2025, Proceedings, Part III},
  pages     = {237--245},
  numpages  = {9},
  keywords  = {Multi-Vector Retrieval, Efficiency, Dense Retrieval},
  location  = {Lucca, Italy}
}