SentenceTransformer based on BookingCare/bkcare-bert-pretrained

This is a sentence-transformers model fine-tuned from BookingCare/bkcare-bert-pretrained on the facebook/xnli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BookingCare/bkcare-bert-pretrained
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: facebook/xnli
  • Languages: vi

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
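
The Pooling and Normalize stages listed above can be illustrated with a minimal NumPy sketch. It assumes the Transformer stage has already produced per-token vectors; the function name `mean_pool_and_normalize` and the random toy inputs are hypothetical and only stand in for real BertModel outputs.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mask-aware mean pooling followed by L2 normalization.

    token_embeddings: float array of shape (batch, seq_len, hidden)
    attention_mask:   0/1 array of shape (batch, seq_len); 0 marks padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    # Sum only the real (non-padding) token vectors
    summed = (token_embeddings * mask).sum(axis=1)
    # Divide by the number of real tokens, guarding against empty rows
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts
    # Scale each sentence vector to unit length
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 4 token positions, hidden size 768 (random stand-in values)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
```

Because the output vectors have unit length, their dot product and their cosine similarity coincide.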

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nampham1106/bkcare-embed-text-v1.0")
# Run inference
sentences = [
    'Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .',
    'Tôi sẽ ban cho anh những lời chúc của anh , julius bỏ súng xuống .',
    'Nó đến trong túi 400 pound .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
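
For semantic search, embeddings like the ones computed above can be ranked against a query by cosine similarity. The following is a minimal sketch rather than part of the library: `rank_by_cosine` is a hypothetical helper, and the 3-dimensional toy vectors stand in for real 768-dimensional embeddings so the example runs without downloading the model.

```python
import numpy as np

def rank_by_cosine(query_embedding, corpus_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k corpus rows most similar to the query."""
    # Normalize both sides so the dot product equals cosine similarity
    q = query_embedding / np.linalg.norm(query_embedding)
    c = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the best top_k
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy stand-ins for real embeddings (hypothetical values)
corpus = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]])
query = np.array([0.9, 0.1, 0.0])
hits = rank_by_cosine(query, corpus, top_k=2)
print(hits)
```

With the real model, the query and corpus arrays would come from `model.encode(...)` on a query string and a list of corpus sentences, respectively.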

Evaluation

Metrics

Semantic Similarity

Metric               Value
pearson_cosine       0.6867
spearman_cosine      0.6701
pearson_manhattan    0.6734
spearman_manhattan   0.669
pearson_euclidean    0.6744
spearman_euclidean   0.6701
pearson_dot          0.6867
spearman_dot         0.6701
pearson_max          0.6867
spearman_max         0.6701

Semantic Similarity

Metric               Value
pearson_cosine       0.6851
spearman_cosine      0.6686
pearson_manhattan    0.6727
spearman_manhattan   0.6683
pearson_euclidean    0.6739
spearman_euclidean   0.6695
pearson_dot          0.6803
spearman_dot         0.6631
pearson_max          0.6851
spearman_max         0.6695
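
The metric names follow a common convention for similarity evaluation: a similarity score is computed for each sentence pair and then correlated with human-annotated gold scores. The sketch below shows how `pearson_cosine` and `spearman_cosine` could be computed under that assumption. The evaluation data behind the tables is not given here, so the embeddings and gold scores are hypothetical toy values, and the helper functions are illustrative only.

```python
import numpy as np

def cosine_pairs(a, b):
    """Cosine similarity between corresponding rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """Rank positions of the values in v (simple version, assumes no ties)."""
    return np.argsort(np.argsort(v)).astype(float)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy sentence-pair embeddings and gold similarity scores (hypothetical)
emb1 = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.2]])
emb2 = np.array([[1.0, 0.1], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
gold = np.array([4.8, 3.0, 0.5, 4.0])

predicted = cosine_pairs(emb1, emb2)
pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(pearson_cosine, spearman_cosine)
```

The manhattan, euclidean, and dot variants would follow the same pattern with a different pairwise score (negated distance or dot product) in place of `cosine_pairs`.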

Model size: 109M params (Safetensors, tensor type F32)