---
tags:
- FlagEmbedding
- Embedding
- Hybrid Retrieval
- ONNX
- Optimum
- ONNXRuntime
- Multilingual
license: mit
base_model: BAAI/bge-m3
---
|
|
|
# Model Card for philipchung/bge-m3-onnx
|
|
|
|
|
|
This is the [BAAI/BGE-M3](https://huggingface.co/BAAI/bge-m3) inference model converted to ONNX format. It can be used with Optimum ONNX Runtime with CPU acceleration, and it outputs all 3 embedding types (Dense, Sparse, ColBERT).
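
Under Optimum, the ONNX Runtime execution provider can be selected at load time. A minimal sketch, assuming the standard Optimum `provider` argument (CPU is already the default):

```python
from optimum.onnxruntime import ORTModelForCustomTasks

# Explicitly request the CPU execution provider (also the default).
onnx_model = ORTModelForCustomTasks.from_pretrained(
    "philipchung/bge-m3-onnx", provider="CPUExecutionProvider"
)
```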
|
|
|
No ONNX optimizations are applied to this model. If you want to apply optimizations, use the export script included in this repo to generate an optimized version of the ONNX model.
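
As an alternative to re-exporting, Optimum's `ORTOptimizer` can apply ONNX Runtime graph optimizations to an already-exported model. The sketch below is an assumption: it is not the repo's export script, the save directory is a placeholder, and whether the optimizer recognizes this custom-task model's architecture is untested.

```python
from optimum.onnxruntime import ORTModelForCustomTasks, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Load the unoptimized ONNX model from the Hub.
model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")

# Level-1 optimizations are basic, hardware-independent graph rewrites.
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(optimization_level=1)

# Save the optimized model to a local directory (placeholder path).
optimizer.optimize(
    save_dir="bge-m3-onnx-optimized", optimization_config=optimization_config
)
```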
|
|
|
Some of the code is adapted from [aapot/bge-m3-onnx](https://huggingface.co/aapot/bge-m3-onnx). The model in this repo inherits from `PreTrainedModel`, so the ONNX model can be downloaded from the Hugging Face Hub and used directly with the `from_pretrained()` method.
|
|
|
## How to Use
|
|
|
```python
from collections import defaultdict

import numpy as np
from optimum.onnxruntime import ORTModelForCustomTasks
from transformers import AutoTokenizer

# Download the ONNX model from the Hugging Face Hub
onnx_model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")
tokenizer = AutoTokenizer.from_pretrained("philipchung/bge-m3-onnx")

# Inference forward pass
sentences = ["First test sentence.", "Second test sentence"]
inputs = tokenizer(
    sentences,
    padding="longest",
    return_tensors="np",
)
outputs = onnx_model.forward(**inputs)


def process_token_weights(
    token_weights: np.ndarray, input_ids: list
) -> defaultdict[str, float]:
    """Convert sparse token weights into a dict of token indices and corresponding weights.

    Adapted from the `_process_token_weights()` helper defined within the `encode()`
    method of the original `FlagEmbedding.bge_m3.BGEM3FlagModel`.
    """
    result: defaultdict[str, float] = defaultdict(int)
    # Special tokens carry no lexical weight and are excluded.
    unused_tokens = {
        tokenizer.cls_token_id,
        tokenizer.eos_token_id,
        tokenizer.pad_token_id,
        tokenizer.unk_token_id,
    }
    for w, idx in zip(token_weights, input_ids, strict=False):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            # Keep the maximum weight when a token appears more than once.
            if w > result[idx]:
                result[idx] = w
    return result


# Each sentence yields a dict[str, list[float] | dict[str, float] | list[list[float]]]
# holding its dense, sparse, and colbert embeddings.
embeddings_list = []
for input_ids, dense_vec, sparse_vec, colbert_vec in zip(
    inputs["input_ids"],
    outputs["dense_vecs"],
    outputs["sparse_vecs"],
    outputs["colbert_vecs"],
    strict=False,
):
    # Convert token weights into a dict of token indices and corresponding weights
    token_weights = sparse_vec.astype(float).squeeze(-1)
    sparse_embeddings = process_token_weights(
        token_weights,
        input_ids.tolist(),
    )
    multivector_embedding = {
        "dense": dense_vec.astype(float).tolist(),  # (1024,)
        "sparse": dict(sparse_embeddings),  # dict[token_index, weight]
        "colbert": colbert_vec.astype(float).tolist(),  # (token_len, 1024)
    }
    embeddings_list.append(multivector_embedding)
```
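
## Hybrid Scoring Example

The three embedding types can be combined for hybrid retrieval. The sketch below is not part of this repo: it follows the scoring conventions of the original `FlagEmbedding` `BGEM3FlagModel` (dot product for dense vectors, summed weights of overlapping tokens for sparse vectors, max-sim late interaction for ColBERT vectors), reuses `embeddings_list` from the example above, and uses illustrative placeholder weights for the combined score.

```python
import numpy as np


def dense_score(q: list[float], p: list[float]) -> float:
    """Dot product of the (normalized) dense vectors."""
    return float(np.dot(q, p))


def sparse_score(q: dict[str, float], p: dict[str, float]) -> float:
    """Sum of weight products over tokens shared by both texts."""
    return sum(w * p[token] for token, w in q.items() if token in p)


def colbert_score(q: list[list[float]], p: list[list[float]]) -> float:
    """Late interaction: mean over query tokens of max similarity to passage tokens."""
    sim = np.asarray(q) @ np.asarray(p).T  # (q_tokens, p_tokens)
    return float(sim.max(axis=1).mean())


# Hybrid score between the two sentences embedded above.
# Weights are placeholders; tune them for your retrieval task.
q, p = embeddings_list[0], embeddings_list[1]
score = (
    0.4 * dense_score(q["dense"], p["dense"])
    + 0.2 * sparse_score(q["sparse"], p["sparse"])
    + 0.4 * colbert_score(q["colbert"], p["colbert"])
)
print(score)
```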
|
|