|
--- |
|
license: llama3.2 |
|
language: |
|
- vi |
|
base_model: |
|
- meta-llama/Llama-3.2-1B-Instruct |
|
pipeline_tag: sentence-similarity |
|
library_name: transformers |
|
--- |
|
|
|
# misa-ai/Llama-3.2-1B-Instruct-Embedding-Base |
|
|
|
This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
|
|
We trained the model on a merged training dataset spanning multiple domains, with about 900k Vietnamese triplets.
|
|
|
We use [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) as the pre-trained backbone.
|
|
|
This model is intended for document retrieval.
|
|
|
Details: |
|
- Max supported context size: 4096 tokens
|
- Pooling: last token (use `padding_side = "left"`)
|
- Language: Vietnamese |
|
- Prompts: |
|
  - Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (English: "Given a search query, retrieve the relevant documents that answer the query.")
|
  - Document: "" (no prompt)
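The details above (last-token pooling with left padding, the Vietnamese query prompt, and an empty document prompt) can be sketched as follows. This is a minimal sketch, not the authors' official usage recipe: the `last_token_pool` helper, the `embed` function, the pad-token fallback, and the exact way the query prompt is prepended are all assumptions.

```python
import torch
import torch.nn.functional as F

# Instruction prepended to queries, taken verbatim from the model card.
QUERY_PROMPT = (
    "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu "
    "có liên quan trả lời cho truy vấn đó."
)

def last_token_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    # With padding_side="left", the final real token of every sequence sits
    # at position -1, so last-token pooling is a simple slice.
    # (batch, seq_len, hidden) -> (batch, hidden)
    return last_hidden_state[:, -1, :]

def embed(texts, model_id="misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"):
    # Hypothetical loading recipe built from the card's stated settings.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
    if tokenizer.pad_token is None:
        # Assumption: Llama tokenizers often ship without a pad token.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModel.from_pretrained(model_id).eval()

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=4096,  # max supported context size per the card
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # L2-normalize so cosine similarity is a plain dot product.
    return F.normalize(last_token_pool(hidden), dim=-1)
```

For retrieval, one would embed `QUERY_PROMPT + "\n" + query` on the query side and the raw text on the document side (the document prompt is empty), then rank documents by the dot product of the normalized vectors.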
|
|
|
|
|
### Please cite our manuscript if this model is used in your work
|
|
|
- Organization: MISA JSC |
|
- Author: [Sy-The Ho](https://huggingface.co/thehosy) |
|
|