metadata
license: apache-2.0
language: en
tags:
  - sentence similarity
library_name: sentence-transformers
pipeline_tag: sentence-similarity


Dataset Collection:

  • The news dataset was collected from Kaggle.
  • The dataset has a news title, news content, and a label (the label is the cosine similarity between the news title and the news content); see the data-preparation sketch after this list.
  • Different strategies were followed during the data-gathering phase.
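
The card does not show the exact loading code, so the following is only a minimal data-preparation sketch; the CSV file name and the column names (news_title, news_content, label) are assumptions about the Kaggle export, not values confirmed by this card.

import pandas as pd
from sentence_transformers import InputExample

# Hypothetical file and column names; adjust them to the actual Kaggle export
df = pd.read_csv("news_dataset.csv")

# Each row becomes a (title, content) pair with its cosine-similarity label
train_examples = [
    InputExample(texts=[row["news_title"], row["news_content"]], label=float(row["label"]))
    for _, row in df.iterrows()
]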

The sentence transformer is fine-tuned for semantic search and sentence similarity.

  • The model is fine-tuned on this dataset (a fine-tuning sketch is shown after this list).
  • This model can be used for semantic search, sentence similarity, and recommendation systems.
  • This model can also be used directly for inference.
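
The card does not include the training script, so this is only a sketch of what such a fine-tuning run looks like in sentence-transformers, using CosineSimilarityLoss; the base checkpoint, the toy examples, and the hyperparameters below are illustrative assumptions.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy pairs in the (news title, news content, cosine-similarity label) format
train_examples = [
    InputExample(texts=["Stocks rally", "Markets closed higher today."], label=0.8),
    InputExample(texts=["Stocks rally", "The recipe needs two eggs."], label=0.1),
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed base checkpoint
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# One short training pass; the real hyperparameters are not stated in the card
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)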

Data Fields:

  • label: the cosine similarity between the news title and the news content
  • news title: the title of the news article
  • news content: the content of the news article

Application:

  • This model is useful for semantic search, sentence similarity, and recommendation systems (see the semantic-search sketch after this list).
  • You can fine-tune this model for your particular use case.
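
As a usage example for the semantic-search case, here is a minimal sketch built on util.semantic_search; the corpus and the query below are made-up illustrations, not data from this card.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Sakil/sentence_similarity_semantic_search")

corpus = [
    "The central bank raised interest rates.",
    "A new smartphone was released this week.",
    "Heavy rain caused flooding in the city.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Retrieve the two corpus entries closest to the query
query_embedding = model.encode("Phone makers announce new devices", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]

for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))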

Model Implementation:

pip install -U sentence-transformers

import pandas as pd
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

# Load the fine-tuned model from the Hugging Face Hub
model_name = "Sakil/sentence_similarity_semantic_search"
model = SentenceTransformer(model_name)

sentences = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'Someone in a gorilla costume is playing a set of drums.',
]

# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []

for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort the list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")

for score, i, j in all_sentence_combinations[0:5]:

print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))

GitHub: Sakil Ansari