|
--- |
|
license: llama3.2 |
|
language: |
|
- vi |
|
base_model: |
|
- meta-llama/Llama-3.2-1B-Instruct |
|
pipeline_tag: sentence-similarity |
|
library_name: transformers |
|
--- |
|
|
|
# misa-ai/Llama-3.2-1B-Instruct-Embedding-Base |
|
|
|
This is an embedding model for document retrieval: it maps sentences and paragraphs to a 2048-dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
|
|
We trained the model on a merged training dataset spanning multiple domains, with about 900k Vietnamese triplets.
|
|
|
We use [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) as the pre-trained backbone.
|
|
|
This model is intended for document retrieval.
|
|
|
Details: |
|
- Max supported context size: 4096 tokens
|
- Pooling: last token (use `padding_side = "left"`)
|
- Language: Vietnamese |
|
- Prompts: |
|
  - Query: "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu có liên quan trả lời cho truy vấn đó." (English: "Given a search query, retrieve the relevant documents that answer the query.")
|
  - Document: "" (no prompt)
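The details above (last-token pooling with left padding, the Vietnamese query prompt, and an empty document prompt) can be sketched as follows. This is a minimal sketch, not the authors' official usage recipe: the `last_token_pool` helper, the `embed` function, the pad-token fallback, and the exact way the query prompt is prepended are all assumptions.

```python
import torch
import torch.nn.functional as F

# Instruction prepended to queries, taken verbatim from the model card.
QUERY_PROMPT = (
    "Cho một câu truy vấn tìm kiếm thông tin, hãy truy xuất các tài liệu "
    "có liên quan trả lời cho truy vấn đó."
)

def last_token_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    # With padding_side="left", the final real token of every sequence sits
    # at position -1, so last-token pooling is a simple slice.
    # (batch, seq_len, hidden) -> (batch, hidden)
    return last_hidden_state[:, -1, :]

def embed(texts, model_id="misa-ai/Llama-3.2-1B-Instruct-Embedding-Base"):
    # Hypothetical loading recipe built from the card's stated settings.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
    if tokenizer.pad_token is None:
        # Assumption: Llama tokenizers often ship without a pad token.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModel.from_pretrained(model_id).eval()

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=4096,  # max supported context size per the card
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # L2-normalize so cosine similarity is a plain dot product.
    return F.normalize(last_token_pool(hidden), dim=-1)
```

For retrieval, one would embed `QUERY_PROMPT + "\n" + query` on the query side and the raw text on the document side (the document prompt is empty), then rank documents by the dot product of the normalized vectors.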
|
|
|
|
|
### Please cite our manuscript if this model is used in your work
|
|
|
- Organization: MISA JSC |
|
- Author: [Sy-The Ho](https://huggingface.co/thehosy) |
|
|