---
tags:
- FlagEmbedding
- Embedding
- Hybrid Retrieval
- ONNX
- Optimum
- ONNXRuntime
- Multilingual
license: mit
base_model: BAAI/bge-m3
---
|
|
|
# Model Card for philipchung/bge-m3-onnx
|
|
|
|
|
|
This is the [BAAI/BGE-M3](https://huggingface.co/BAAI/bge-m3) inference model converted to ONNX format. It can be used with Optimum ONNX Runtime with CPU acceleration, and it outputs all 3 embedding types (Dense, Sparse, ColBERT).
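
Under Optimum, the ONNX Runtime execution provider can be selected at load time. A minimal sketch, assuming the standard Optimum `provider` argument (CPU is already the default):

```python
from optimum.onnxruntime import ORTModelForCustomTasks

# Explicitly request the CPU execution provider (also the default).
onnx_model = ORTModelForCustomTasks.from_pretrained(
    "philipchung/bge-m3-onnx", provider="CPUExecutionProvider"
)
```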
|
|
|
No ONNX optimizations are applied to this model. If you want to apply optimizations, use the export script included in this repo to generate an optimized version of the ONNX model.
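
As an alternative to re-exporting, Optimum's `ORTOptimizer` can apply ONNX Runtime graph optimizations to an already-exported model. The sketch below is an assumption: it is not the repo's export script, the save directory is a placeholder, and whether the optimizer recognizes this custom-task model's architecture is untested.

```python
from optimum.onnxruntime import ORTModelForCustomTasks, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Load the unoptimized ONNX model from the Hub.
model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")

# Level-1 optimizations are basic, hardware-independent graph rewrites.
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(optimization_level=1)

# Save the optimized model to a local directory (placeholder path).
optimizer.optimize(
    save_dir="bge-m3-onnx-optimized", optimization_config=optimization_config
)
```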
|
|
|
Some of the code is adapted from [aapot/bge-m3-onnx](https://huggingface.co/aapot/bge-m3-onnx). The model in this repo inherits from `PreTrainedModel`, so the ONNX model can be downloaded from the Hugging Face Hub and used directly with the `from_pretrained()` method.
|
|
|
## How to Use
|
|
|
```python
from collections import defaultdict

import numpy as np
from optimum.onnxruntime import ORTModelForCustomTasks
from transformers import AutoTokenizer

# Download the ONNX model from the Hugging Face Hub
onnx_model = ORTModelForCustomTasks.from_pretrained("philipchung/bge-m3-onnx")
tokenizer = AutoTokenizer.from_pretrained("philipchung/bge-m3-onnx")

# Inference forward pass
sentences = ["First test sentence.", "Second test sentence"]
inputs = tokenizer(
    sentences,
    padding="longest",
    return_tensors="np",
)
outputs = onnx_model.forward(**inputs)


def process_token_weights(
    token_weights: np.ndarray, input_ids: list
) -> defaultdict[str, float]:
    """Convert sparse token weights into a dict of token indices and corresponding weights.

    Adapted from the `_process_token_weights()` helper defined within the `encode()`
    method of the original `FlagEmbedding.bge_m3.BGEM3FlagModel`.
    """
    result: defaultdict[str, float] = defaultdict(int)
    # Special tokens carry no lexical weight and are excluded.
    unused_tokens = {
        tokenizer.cls_token_id,
        tokenizer.eos_token_id,
        tokenizer.pad_token_id,
        tokenizer.unk_token_id,
    }
    for w, idx in zip(token_weights, input_ids, strict=False):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            # Keep the maximum weight when a token appears more than once.
            if w > result[idx]:
                result[idx] = w
    return result


# Each sentence yields a dict[str, list[float] | dict[str, float] | list[list[float]]]
# holding its dense, sparse, and colbert embeddings.
embeddings_list = []
for input_ids, dense_vec, sparse_vec, colbert_vec in zip(
    inputs["input_ids"],
    outputs["dense_vecs"],
    outputs["sparse_vecs"],
    outputs["colbert_vecs"],
    strict=False,
):
    # Convert token weights into a dict of token indices and corresponding weights
    token_weights = sparse_vec.astype(float).squeeze(-1)
    sparse_embeddings = process_token_weights(
        token_weights,
        input_ids.tolist(),
    )
    multivector_embedding = {
        "dense": dense_vec.astype(float).tolist(),  # (1024,)
        "sparse": dict(sparse_embeddings),  # dict[token_index, weight]
        "colbert": colbert_vec.astype(float).tolist(),  # (token_len, 1024)
    }
    embeddings_list.append(multivector_embedding)
```
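
## Hybrid Scoring Example

The three embedding types can be combined for hybrid retrieval. The sketch below is not part of this repo: it follows the scoring conventions of the original `FlagEmbedding` `BGEM3FlagModel` (dot product for dense vectors, summed weights of overlapping tokens for sparse vectors, max-sim late interaction for ColBERT vectors), reuses `embeddings_list` from the example above, and uses illustrative placeholder weights for the combined score.

```python
import numpy as np


def dense_score(q: list[float], p: list[float]) -> float:
    """Dot product of the (normalized) dense vectors."""
    return float(np.dot(q, p))


def sparse_score(q: dict[str, float], p: dict[str, float]) -> float:
    """Sum of weight products over tokens shared by both texts."""
    return sum(w * p[token] for token, w in q.items() if token in p)


def colbert_score(q: list[list[float]], p: list[list[float]]) -> float:
    """Late interaction: mean over query tokens of max similarity to passage tokens."""
    sim = np.asarray(q) @ np.asarray(p).T  # (q_tokens, p_tokens)
    return float(sim.max(axis=1).mean())


# Hybrid score between the two sentences embedded above.
# Weights are placeholders; tune them for your retrieval task.
q, p = embeddings_list[0], embeddings_list[1]
score = (
    0.4 * dense_score(q["dense"], p["dense"])
    + 0.2 * sparse_score(q["sparse"], p["sparse"])
    + 0.4 * colbert_score(q["colbert"], p["colbert"])
)
print(score)
```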
|
|