---
license: apache-2.0
language: en
tags:
- sentence similarity
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---

# Dataset Collection:

* The news dataset was collected from Kaggle: [dataset](https://www.kaggle.com/competitions/fake-news/data)
* The dataset contains the news title, the news content, and a label. The label is the cosine similarity between the news title and the news content.
* Different strategies were followed during the data-gathering phase.

# Sentence transformer fine-tuned for semantic search and sentence similarity

* The model is fine-tuned on the dataset described above.
* It can be used for semantic search, sentence similarity, and recommendation systems.
* It can also be used directly for inference.
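
The fine-tuning step could look roughly like the sketch below. This is not the author's confirmed training script: it assumes the classic `model.fit` API of sentence-transformers with `CosineSimilarityLoss`, and the names `build_pairs`, `fine_tune`, `base_model`, and the row keys `title`, `content`, `label` are hypothetical.

```python
def build_pairs(rows):
    """Turn raw rows into (title, content, label) triples.

    Rows whose title, content, or label is None or empty are skipped.
    The keys "title", "content", and "label" are assumed column names.
    """
    pairs = []
    for row in rows:
        title, content, label = row.get("title"), row.get("content"), row.get("label")
        if not title or not content or label is None:
            continue
        pairs.append((str(title), str(content), float(label)))
    return pairs


def fine_tune(base_model, rows, epochs=1, batch_size=16):
    """Fine-tune `base_model` (a model name or path chosen by the caller) on the rows."""
    # Imported lazily so build_pairs can be used without the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer(base_model)
    # Each example pairs a title with its content; the label is the target similarity.
    examples = [InputExample(texts=[t, c], label=s) for t, c, s in build_pairs(rows)]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    # CosineSimilarityLoss regresses the cosine of the two embeddings onto the label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=epochs, warmup_steps=10)
    return model
```

The base model is left as a parameter because the card does not state which checkpoint was used as the starting point.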

# Data Fields:

**label**: The cosine similarity between the news title and the news content

**news title**: The title of the news

**news content**: The content of the news
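
To make the `label` field concrete: cosine similarity is the dot product of two vectors divided by the product of their norms, so it ranges from -1 to 1. The card does not say how the title and content were turned into vectors, so the sketch below only illustrates the formula on plain lists of floats. The function name is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1.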

# Application:

* This model is useful for semantic search, sentence similarity, and recommendation systems.
* You can fine-tune this model further for your particular use case.
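
As a sketch of the semantic search use case, the snippet below ranks corpus entries by cosine similarity to a query. It works on plain lists of floats so it runs without downloading the model; in practice the vectors would come from `model.encode(...)`. The function name `top_k` and the toy vectors are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, corpus_vecs, k=3):
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    scored = [(i, _cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" standing in for model.encode(...) output.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = top_k([1.0, 0.1], corpus, k=2)
for index, score in hits:
    print(index, round(score, 4))
```

For real workloads, sentence-transformers ships `util.semantic_search`, which performs the same ranking on tensors and is likely a better choice than a hand-rolled loop.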

# Model Implementation

Install the library first: `pip install -U sentence-transformers`

```python
from sentence_transformers import SentenceTransformer, util

model_name = "Sakil/sentence_similarity_semantic_search"
model = SentenceTransformer(model_name)

sentences = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'Someone in a gorilla costume is playing a set of drums.',
]

# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all unique pairs (i < j) to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))
```

# GitHub: [Sakil Ansari](https://github.com/Sakil786/sentence_similarity_semantic_search)