---
license: apache-2.0
language: en
tags:
- sentence similarity
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---


# Dataset Collection:
* The news dataset was collected from Kaggle ([dataset](https://www.kaggle.com/competitions/fake-news/data)).
* The dataset contains the news title, the news content, and a label giving the cosine similarity between the news title and the news content.
* Different strategies were followed during the data-gathering phase.

# Sentence Transformer fine-tuned for semantic search and sentence similarity
* The model is fine-tuned on the dataset described above.
* It can be used for semantic search, sentence similarity, and recommendation systems.
* It can also be used directly for inference.
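In semantic search, the query and every document are embedded with the same model, and documents are ranked by cosine similarity to the query. The minimal sketch below shows only that ranking step, assuming the embeddings have already been computed (for example with `model.encode`). The 2-D vectors and the `rank_by_similarity` helper are hypothetical and written in plain Python so the logic is easy to follow; with the real library, `sentence_transformers.util.semantic_search` performs this step on tensors.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_similarity(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the top_k most similar documents."""
    scores = [(idx, cosine(query, doc)) for idx, doc in enumerate(corpus)]
    # Highest similarity first, then keep only the top_k documents
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical embeddings standing in for model.encode(...) output
corpus_embeddings = [
    [1.0, 0.0],  # document 0
    [0.0, 1.0],  # document 1
    [1.0, 1.0],  # document 2
]
query_embedding = [1.0, 0.2]

for idx, score in rank_by_similarity(query_embedding, corpus_embeddings, top_k=2):
    print(idx, round(score, 4))
```

The same pattern underlies the recommendation use case: the "query" is an item the user liked, and the corpus is the catalogue of candidate items.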

# Data Fields:

* **label**: cosine similarity between the news title and the news content
* **news title**: the title of the news article
* **news content**: the content of the news article
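Cosine similarity is the standard measure for comparing sentence embeddings. Assuming $\mathbf{t}$ and $\mathbf{c}$ denote the embedding vectors of a news title and its content (the card does not state which model produced the embeddings used to compute the labels), the **label** can be written as:

$$
\text{label} = \cos(\mathbf{t}, \mathbf{c}) = \frac{\mathbf{t} \cdot \mathbf{c}}{\lVert \mathbf{t} \rVert \, \lVert \mathbf{c} \rVert} = \frac{\sum_{k=1}^{d} t_k c_k}{\sqrt{\sum_{k=1}^{d} t_k^2}\,\sqrt{\sum_{k=1}^{d} c_k^2}}
$$

where $d$ is the embedding dimension. The value lies in $[-1, 1]$: 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for vectors pointing in opposite directions.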

# Application:
* This model is useful for semantic search, sentence similarity, and recommendation systems.
* You can further fine-tune this model for your particular use case.

# Model Implementation

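Install the library first:

```
pip install -U sentence-transformers
```

The example below loads the fine-tuned model, encodes a list of sentences, and prints the five most similar sentence pairs: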

```python
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned model from the Hugging Face Hub
model_name = "Sakil/sentence_similarity_semantic_search"
model = SentenceTransformer(model_name)

sentences = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'Someone in a gorilla costume is playing a set of drums.',
]

# Encode all sentences into dense embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute the n x n matrix of pairwise cosine similarities
cos_sim = util.cos_sim(embeddings, embeddings)

# Collect every unique pair (i < j) together with its cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j].item(), i, j])

# Sort the pairs by cosine similarity, highest first
all_sentence_combinations.sort(key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], score))
```
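The loop above builds all n(n-1)/2 unique pairs and sorts them, which is fine for a handful of sentences. When only the top few pairs are needed, `itertools.combinations` and `heapq.nlargest` express the same selection more compactly without sorting every pair. The sketch below is a plain-Python illustration that takes a precomputed similarity matrix as nested lists (for example `cos_sim.tolist()`); the 4 x 4 matrix is hypothetical and `top_pairs` is an illustrative helper name. For large collections, the sentence-transformers library also provides `util.paraphrase_mining` for this task.

```python
import heapq
from itertools import combinations
from typing import List, Sequence, Tuple


def top_pairs(
    sim: Sequence[Sequence[float]], k: int = 5
) -> List[Tuple[float, int, int]]:
    """Return the k highest-scoring (score, i, j) pairs with i < j."""
    n = len(sim)
    # combinations(range(n), 2) yields each unordered pair exactly once (i < j)
    candidates = ((sim[i][j], i, j) for i, j in combinations(range(n), 2))
    # nlargest keeps only k candidates instead of sorting all of them
    return heapq.nlargest(k, candidates, key=lambda t: t[0])


# Hypothetical symmetric similarity matrix for four sentences
similarity = [
    [1.00, 0.75, 0.10, 0.20],
    [0.75, 1.00, 0.05, 0.30],
    [0.10, 0.05, 1.00, 0.90],
    [0.20, 0.30, 0.90, 1.00],
]

for score, i, j in top_pairs(similarity, k=2):
    print(i, j, score)
```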


# GitHub: [Sakil Ansari](https://github.com/Sakil786/sentence_similarity_semantic_search)