hotchpotch/vespa-onnx-BAAI-bge-m3-only-dense

	---
	license: mit
	---

	The [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) model (dense embedding output only, without BGE-M3's sparse and multi-vector outputs), converted to ONNX in fp16 and int8 formats for use with [Vespa embedding](https://docs.vespa.ai/en/embedding.html).

	- BAAI-bge-m3_fp16.onnx (fp16)
	- BAAI-bge-m3_quantized.onnx (int8 quantized)

	The int8 model was quantized with the [optimum](https://github.com/huggingface/optimum) toolkit.
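As a rough guide to file size, the formats differ mainly in how many bytes each weight takes: 4 in an fp32 export, 2 in fp16 and 1 in int8. The sketch below does that arithmetic. The parameter count (about 568M for BAAI/bge-m3) is an assumption to check against the upstream model card, and real files can be larger than this idealized figure because converters may keep some tensors at higher precision and also store graph metadata.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_weight_bytes(num_params: int, dtype: str) -> int:
    """Idealized size of the raw weights only: no graph metadata, and
    every tensor assumed to be stored at `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # ASSUMPTION: roughly 568M parameters for BAAI/bge-m3; adjust as needed.
    num_params = 568_000_000
    for dtype in ("fp32", "fp16", "int8"):
        size_gb = estimate_weight_bytes(num_params, dtype) / 1e9
        print(f"{dtype}: ~{size_gb:.2f} GB")
```

This is only a lower bound for comparison; check the actual file sizes in the repository before planning deployment timeouts.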

	## Example Vespa services.xml

	Note: the FP16 model works well with Vespa `8.325.46` and later.

	```xml
	<!-- Declare this component inside the <container> element of services.xml. -->
	<component id="bge_m3" type="hugging-face-embedder">
	  <transformer-model
	    url="https://huggingface.co/hotchpotch/vespa-onnx-BAAI-bge-m3-only-dense/resolve/main/BAAI-bge-m3_fp16.onnx" />
	  <!-- or the int8 quantized model:
	  <transformer-model
	    url="https://huggingface.co/hotchpotch/vespa-onnx-BAAI-bge-m3-only-dense/resolve/main/BAAI-bge-m3_quantized.onnx" />
	  -->
	  <tokenizer-model
	    url="https://huggingface.co/hotchpotch/vespa-onnx-BAAI-bge-m3-only-dense/resolve/main/tokenizer.json" />
	  <normalize>true</normalize>
	  <pooling-strategy>cls</pooling-strategy>
	</component>
	```
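The `<pooling-strategy>cls</pooling-strategy>` and `<normalize>true</normalize>` settings tell the embedder to take the transformer output at the first token position (`<s>`, the CLS position for this XLM-RoBERTa-based model) and scale it to unit length, which is how BGE-M3 builds its dense vector. A minimal pure-Python sketch of that post-processing, assuming the model's last hidden state is available as a list of per-token vectors; it illustrates the math only and is not Vespa's implementation.

```python
import math
from typing import List, Sequence


def cls_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """CLS pooling: keep the vector of the first token and ignore the rest."""
    return list(hidden_states[0])


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (a zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dense_embedding(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """CLS pooling followed by L2 normalization."""
    return l2_normalize(cls_pool(hidden_states))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length vectors the dot product equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy "last hidden state" for a 3-token input with hidden size 2
    # (the real bge-m3 hidden size is 1024).
    toy_hidden = [[3.0, 4.0], [1.0, -1.0], [0.5, 0.5]]
    print(dense_embedding(toy_hidden))  # [0.6, 0.8]
```

Because the output vectors have unit length, comparing them by dot product gives the same ranking as cosine similarity.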

	### Deploy

	```
	# The FP16 model file is larger than the int8 one, so deployment can take longer.
	# --wait 1800 waits up to 1800 seconds for the deployment to become ready.
	vespa deploy --wait 1800 .
	```
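Once deployed, the embedder can be referenced by its component id (`bge_m3`) at query time. A minimal sketch of a nearest-neighbor query against the Vespa query API, using only the Python standard library. Everything beyond the component id is a hypothetical assumption: a schema field `embedding` of type `tensor<float>(x[1024])` filled with `embed bge_m3`, a rank profile `semantic` that declares a `query(q)` input of the same type, and the endpoint `http://localhost:8080/search/`.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint; adjust to your deployment.
VESPA_SEARCH_URL = "http://localhost:8080/search/"


def build_query(text: str, hits: int = 10) -> dict:
    """Build a JSON query body that embeds `text` with the bge_m3 embedder
    and runs an approximate nearest-neighbor search.

    Assumes (hypothetically) a tensor field `embedding` and a rank profile
    `semantic` that declares `query(q)` with the same tensor type.
    """
    return {
        "yql": (
            "select * from sources * where "
            f"{{targetHits:{hits}}}nearestNeighbor(embedding, q)"
        ),
        # `@text` refers to the "text" request parameter below.
        "input.query(q)": "embed(bge_m3, @text)",
        "text": text,
        "ranking.profile": "semantic",
        "hits": hits,
    }


def search(text: str, url: str = VESPA_SEARCH_URL) -> dict:
    """POST the query to Vespa and return the decoded JSON response."""
    body = json.dumps(build_query(text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a running application, `search("what is vespa?")["root"].get("children", [])` would hold the matching documents.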

	## Tips: convert to int8 (quantized)

	```
	# Export BAAI/bge-m3 to ONNX with the script from the Vespa sample apps:
	# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
	./export_hf_model_from_hf.py --hf_model BAAI/bge-m3 --output_dir bge-m3
	```

	```
	optimum-cli onnxruntime quantize --onnx_model ./bge-m3 -o bge-m3-large_quantized --avx512_vnni
	```
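`--avx512_vnni` selects optimum's quantization preset for CPUs with AVX-512 VNNI instructions; quantized weights are then stored as 8-bit integers plus scale factors. The sketch below illustrates the core idea with symmetric, per-tensor int8 quantization in pure Python. It is a simplified illustration, not optimum's exact scheme, which can use per-channel scales and zero points and only quantizes selected operators.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization.

    One scale maps the largest |weight| to 127; every weight is rounded
    to the nearest integer step and clamped to [-127, 127].
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.3, 0.07, 0.9]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    # Each restored weight is within half a quantization step of the original.
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f}")
```

Storing one byte per weight instead of four is where the size saving comes from; the rounding error per weight is at most half of `scale`.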

	## Tips: convert to fp16

	```
	# Export BAAI/bge-m3 to ONNX with the script from the Vespa sample apps:
	# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
	./export_hf_model_from_hf.py --hf_model BAAI/bge-m3 --output_dir bge-m3
	```

	- https://gist.github.com/hotchpotch/64fa52d32886fe61cc1d110066afef38

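	```python
	# Uses the onnxruntime fp16 conversion tool:
	# https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py

	import onnx
	from onnxruntime.transformers.float16 import convert_float_to_float16

	# Load the fp32 model exported above; onnx.load also reads external
	# weight files stored alongside the .onnx file.
	onnx_model = onnx.load("bge-m3/BAAI-bge-m3.onnx")

	# Cast the model's float32 tensors to float16. disable_shape_infer=True
	# skips the tool's ONNX shape/type inference pass.
	model_fp16 = convert_float_to_float16(onnx_model, disable_shape_infer=True)

	onnx.save(model_fp16, "bge-m3/BAAI-bge-m3_fp16.onnx")
	```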
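fp16 keeps only about three significant decimal digits and cannot represent magnitudes above 65504, so converting fp32 weights rounds every value and can overflow or underflow extreme ones. The onnxruntime float16.py tool linked above handles this by clamping out-of-range values (it exposes `min_positive_val` and `max_finite_val` parameters). The pure-Python sketch below only illustrates rounding and clamping on single numbers with the standard `struct` module's half-precision `'e'` format; it does not reproduce what the converter does to a whole graph.

```python
import struct

# Largest finite IEEE 754 half-precision (binary16) value.
FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round a float to the nearest binary16 value (struct's 'e' format).

    Raises OverflowError if |x| is too large for binary16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp16_clamped(x: float) -> float:
    """Clamp into the finite fp16 range first, then round."""
    return to_fp16(max(-FP16_MAX, min(FP16_MAX, x)))


if __name__ == "__main__":
    for value in (1.0, 0.1, 1e-8, 1e5):
        print(value, "->", to_fp16_clamped(value))
```

Running it shows `0.1` becoming a nearby but different value, `1e-8` underflowing to zero, and `1e5` being clamped to 65504, which is the kind of precision loss to keep in mind when comparing fp16 and fp32 embeddings.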

	## License

	This model is released under the same license as the original model, the MIT License (see the LICENSE file in the root of the original project).

	- https://huggingface.co/BAAI/bge-m3

	## Attribution

	All credit for this model goes to the authors of BAAI/bge-m3 and the associated researchers and organizations. When using this model, please attribute the original authors.