
misa-ai/Llama-3.2-1B-Instruct-Embedding-Base

This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
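As a quick illustration of the clustering use case, the sketch below groups a batch of embeddings with scikit-learn's KMeans. This is not from the model card; the `doc_embeddings` array is a random placeholder standing in for real 2048-dimensional vectors produced by this model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for embeddings produced by this model (shape: n_documents x 2048).
doc_embeddings = np.random.rand(100, 2048).astype("float32")

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(doc_embeddings)
print(labels[:10])  # cluster assignment for the first 10 documents
```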

We trained the model on a merged training dataset spanning multiple domains, totalling about 900k Vietnamese triplets.

We use Llama-3.2-1B-Instruct as the pre-trained backbone.

This model is intended for document retrieval.

Details:

  • Maximum supported context length: 4096 tokens
  • Pooling: last token (use padding_side = "left"; see the usage sketch below)
  • Language: Vietnamese
  • Prompts:
    • Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (roughly: "Given an information-seeking query, retrieve the relevant documents that answer it.")
    • Document: "" (documents are encoded without any prompt prefix)
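
The snippet below is a minimal usage sketch, not an official example from the authors. It assumes the checkpoint loads with Hugging Face transformers' AutoTokenizer/AutoModel, that the query prompt is simply prepended to the query text as a space-separated prefix, and that documents are encoded with no prefix at all; the sample Vietnamese query and documents are made up for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"
QUERY_PROMPT = (
    "Cho một câu truy vấn tìm kiếm thông tin, "
    "hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, padding_side="left")
if tokenizer.pad_token is None:  # Llama tokenizers often ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()


def encode(texts, prompt=""):
    """Embed a list of texts; queries get the task prompt, documents use an empty prompt."""
    texts = [f"{prompt} {t}".strip() for t in texts]
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=4096, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # With left padding, position -1 always holds the last real token of each
    # sequence, so last-token pooling is just the final hidden state.
    emb = out.last_hidden_state[:, -1]
    return F.normalize(emb, p=2, dim=-1)


queries = encode(["Thủ đô của Việt Nam là gì?"], prompt=QUERY_PROMPT)  # "What is the capital of Vietnam?"
docs = encode([
    "Hà Nội là thủ đô của Việt Nam.",   # "Hanoi is the capital of Vietnam."
    "Phở là một món ăn truyền thống.",  # "Pho is a traditional dish."
])
scores = queries @ docs.T  # cosine similarity, since the embeddings are L2-normalized
print(scores)
```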

Please cite our manuscript if this model or its training dataset is used in your work.
