ConstBERT
ConstBERT (Constant-Space BERT) is a multi-vector retrieval model designed for efficient and effective passage retrieval. It modifies the ColBERT architecture by encoding each document into a fixed number of learned embeddings rather than one embedding per token. This significantly reduces storage costs and, because every document representation has the same size, simplifies memory layout and OS paging, while retaining most of the effectiveness of token-level multi-vector models.
Details
ConstBERT addresses the high storage cost of traditional multi-vector retrieval methods like ColBERT, which store one vector for every token in the document collection. Instead, ConstBERT applies a learned pooling mechanism that projects the token-level embeddings of a document into a smaller, fixed number C of document-level embeddings, each capturing a distinct semantic facet of the document. The projection is an additional linear transformation learned end-to-end during training. The relevance score between a query and a document is then computed with a late-interaction mechanism (MaxSim) over these C document embeddings and the query's token embeddings.

This approach offers a trade-off between storage/computational efficiency and retrieval effectiveness, configurable through the choice of C. The paper demonstrates that ConstBERT achieves performance comparable to ColBERT on benchmarks such as MSMARCO and BEIR, with substantially smaller index sizes.
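In symbols (notation ours), given query token embeddings $q_1, \dots, q_{|q|}$ and the $C$ pooled document embeddings $d_1, \dots, d_C$, the MaxSim score is

$$
s(q, d) = \sum_{i=1}^{|q|} \max_{1 \le j \le C} q_i^\top d_j,
$$

i.e., each query token is matched against its best-scoring document embedding, and the per-token maxima are summed.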
This model has been trained with C = 32, producing 32 vectors of dimension 128 for every document.
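As a back-of-the-envelope illustration of the constant-space property (the 16-bit storage precision here is our assumption, not a detail from the paper):

```python
# Per-document index footprint with C = 32 vectors of dimension 128,
# assuming each value is stored as a 16-bit float.
C, dim, bytes_per_value = 32, 128, 2
print(C * dim * bytes_per_value)  # 8192 bytes (8 KiB) per document, regardless of its length
```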
Model Sources
For more details, please refer to our official repository, paper and blog!
Direct Use
ConstBERT is intended for semantic search and passage retrieval tasks. It can be used for:
- First-stage retrieval in large document collections.
- Reranking candidates produced by another retrieval system.
The model produces fixed-size multi-vector representations for documents, which can be indexed efficiently. Queries are represented as sets of token embeddings.
Example code:
```python
from transformers import AutoModel
import numpy as np


def max_sim(q: np.ndarray, d: np.ndarray) -> float:
    """MaxSim late-interaction score between one query and one document.

    q: (num_query_tokens, dim) query token embeddings
    d: (num_doc_vectors, dim) fixed-size document embeddings
    """
    assert q.ndim == 2, "q must be a 2-dimensional array"
    assert d.ndim == 2, "d must be a 2-dimensional array"
    # Similarity of every document vector with every query token:
    # shape (num_doc_vectors, num_query_tokens).
    scores = np.dot(d, q.T)
    # For each query token, keep its best-matching document vector...
    max_scores = np.max(scores, axis=0)
    # ...and sum those per-token maxima into a single relevance score.
    return float(np.sum(max_scores))


model = AutoModel.from_pretrained("pinecone/ConstBERT", trust_remote_code=True)

# Example queries and documents
queries = ["What is the capital of France?", "latest advancements in AI"]
documents = [
    "Paris is the capital and most populous city of France.",
    "Artificial intelligence is rapidly evolving with new breakthroughs.",
    "The Eiffel Tower is a famous landmark in Paris.",
]

# Encode queries and documents
query_embeddings = model.encode_queries(queries).numpy()
document_embeddings = model.encode_documents(documents).numpy()

# The Paris passage should score higher than the AI passage for the first query.
max_sim(query_embeddings[0], document_embeddings[0]) > max_sim(query_embeddings[0], document_embeddings[1])
# Returns: True
```
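For reranking, the same scoring function can order a candidate list. A minimal sketch reusing the variables defined above (this loop is our illustration, not part of the model's API):

```python
# Score every example document against the first query and print them
# in descending MaxSim order.
scores = [max_sim(query_embeddings[0], d) for d in document_embeddings]
for idx in sorted(range(len(documents)), key=scores.__getitem__, reverse=True):
    print(f"{scores[idx]:.2f}  {documents[idx]}")
```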
Citation (BibTeX)
```bibtex
@inproceedings{macavaney2025constbert,
  author    = {MacAvaney, Sean and Mallia, Antonio and Tonellotto, Nicola},
  title     = {Efficient Constant-Space Multi-vector Retrieval},
  year      = {2025},
  isbn      = {978-3-031-88713-0},
  publisher = {Springer-Verlag},
  address   = {Berlin, Heidelberg},
  url       = {https://doi.org/10.1007/978-3-031-88714-7_22},
  doi       = {10.1007/978-3-031-88714-7_22},
  booktitle = {Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6--10, 2025, Proceedings, Part III},
  pages     = {237--245},
  numpages  = {9},
  keywords  = {Multi-Vector Retrieval, Efficiency, Dense Retrieval},
  location  = {Lucca, Italy}
}
```
Base model: google-bert/bert-base-uncased