
misa-ai/Llama-3.2-1B-Instruct-Embedding-Base

This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
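As a quick illustration of the clustering use case, the sketch below groups a batch of embeddings with scikit-learn's KMeans. This is not from the model card; the `doc_embeddings` array is a random placeholder standing in for real 2048-dimensional vectors produced by this model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for embeddings produced by this model (shape: n_documents x 2048).
doc_embeddings = np.random.rand(100, 2048).astype("float32")

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(doc_embeddings)
print(labels[:10])  # cluster assignment for the first 10 documents
```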

We trained the model on a merged training dataset spanning multiple domains, totalling about 900k Vietnamese triplets.

We use Llama-3.2-1B-Instruct as the pre-trained backbone.

This model is intended for document retrieval.

Details:

  • Maximum supported context length: 4096 tokens
  • Pooling: last token (use padding_side = "left"; see the usage sketch below)
  • Language: Vietnamese
  • Prompts:
    • Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (roughly: "Given an information-seeking query, retrieve the relevant documents that answer it.")
    • Document: "" (documents are encoded without any prompt prefix)
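
The snippet below is a minimal usage sketch, not an official example from the authors. It assumes the checkpoint loads with Hugging Face transformers' AutoTokenizer/AutoModel, that the query prompt is simply prepended to the query text as a space-separated prefix, and that documents are encoded with no prefix at all; the sample Vietnamese query and documents are made up for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"
QUERY_PROMPT = (
    "Cho một câu truy vấn tìm kiếm thông tin, "
    "hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, padding_side="left")
if tokenizer.pad_token is None:  # Llama tokenizers often ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()


def encode(texts, prompt=""):
    """Embed a list of texts; queries get the task prompt, documents use an empty prompt."""
    texts = [f"{prompt} {t}".strip() for t in texts]
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=4096, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # With left padding, position -1 always holds the last real token of each
    # sequence, so last-token pooling is just the final hidden state.
    emb = out.last_hidden_state[:, -1]
    return F.normalize(emb, p=2, dim=-1)


queries = encode(["Thủ đô của Việt Nam là gì?"], prompt=QUERY_PROMPT)  # "What is the capital of Vietnam?"
docs = encode([
    "Hà Nội là thủ đô của Việt Nam.",   # "Hanoi is the capital of Vietnam."
    "Phở là một món ăn truyền thống.",  # "Pho is a traditional dish."
])
scores = queries @ docs.T  # cosine similarity, since the embeddings are L2-normalized
print(scores)
```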

Please cite our manuscript if this model or its training dataset is used in your work.
