Update app.py
app.py
```diff
@@ -17,6 +17,7 @@ import validators
 import nltk
 import warnings
 import streamlit as st
+from PIL import Image
 
 nltk.download('punkt')
 
```
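For context on why the import is added: `st.image` accepts either a path string or a PIL `Image`, so the matching change to the `st.image` call in the last hunk below is behaviour-preserving; opening the file explicitly just raises at `Image.open` if `encoder.png` is missing. A quick illustration (not part of the commit):

```python
import streamlit as st
from PIL import Image

# Both calls render the same picture; st.image accepts a path or a PIL Image.
st.image('encoder.png', caption='Retrieval and Re-Rank')              # old call
st.image(Image.open('encoder.png'), caption='Retrieval and Re-Rank')  # new call
```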
```diff
@@ -225,9 +226,9 @@ bi_encoder_type = st.sidebar.selectbox(
 top_k = st.sidebar.slider("Number of Top Hits Generated",min_value=1,max_value=5,value=2)
 
 st.markdown(
-"""-The app supports asymmetric Semantic search which seeks to improve search accuracy of documents/URL by understanding the content of the search query in contrast to traditional search engines which only find documents based on lexical matches.
--The idea behind semantic search is to embed all entries in your corpus, whether they be sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found. These entries should have a high semantic overlap with the query.
--The all-* models where trained on all available training data (more than 1 billion training pairs) and are designed as general purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. The models used have been trained on broad datasets, however, if your document/corpus is specialised, such as for science or economics, the results returned might be unsatisfactory.""")
+"""- The app supports asymmetric semantic search, which seeks to improve search accuracy over documents/URLs by understanding the content of the search query, in contrast to traditional search engines, which only find documents based on lexical matches.
+- The idea behind semantic search is to embed all entries in your corpus, whether they be sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found. These entries should have a high semantic overlap with the query.
+- The all-* models were trained on all available training data (more than 1 billion training pairs) and are designed as general-purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. The models used have been trained on broad datasets; however, if your document/corpus is specialised, such as for science or economics, the results returned might be unsatisfactory.""")
 
 st.markdown("""The models available to choose from:""")
 
```
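The bullet about embedding the corpus and the query into one vector space maps directly onto the sentence-transformers library this app is built on. A minimal sketch of that idea, with a made-up three-entry corpus and query; only the model name `all-MiniLM-L6-v2` comes from the text above, everything else is illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# Embed every corpus entry into the vector space once, up front.
bi_encoder = SentenceTransformer('all-MiniLM-L6-v2')
corpus = [
    "Semantic search matches on meaning rather than exact words.",
    "Lexical search engines only match the literal query terms.",
    "The app lets you search documents or the text behind a URL.",
]
corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)

# At search time, embed the query into the same space and return the
# entries whose embeddings are closest (highest cosine similarity).
query_embedding = bi_encoder.encode("find results by meaning, not keywords",
                                    convert_to_tensor=True)
for hit in util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```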
```diff
@@ -247,7 +248,7 @@ st.markdown(
 """- Bi-Encoder (Retrieval): The Bi-encoder is responsible for independently embedding the sentences and search queries into a vector space. The result is then passed to the cross-encoder for checking the relevance/similarity between the query and sentences.
 - Cross-Encoder (Re-Ranker): A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a possible document are passed simultaneously to the transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is for the given query. The cross-encoder further boosts performance, especially when you search over a corpus on which the bi-encoder was not trained.""")
 
-st.image('encoder.png', caption='Retrieval and Re-Rank')
+st.image(Image.open('encoder.png'), caption='Retrieval and Re-Rank')
 
 st.markdown("""
 In order to use the app:
```
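The two bullets in this hunk describe the standard retrieve & re-rank pipeline. A minimal sketch of the same two stages in sentence-transformers; the cross-encoder model name (`cross-encoder/ms-marco-MiniLM-L-6-v2`) and the corpus are assumptions taken from the library's examples, not read out of app.py:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1 -- Bi-Encoder (Retrieval): embed corpus and query independently,
# then take the nearest corpus embeddings as candidate hits.
bi_encoder = SentenceTransformer('all-MiniLM-L6-v2')
corpus = [
    "A bi-encoder embeds texts independently into one vector space.",
    "A cross-encoder scores a query-document pair jointly.",
    "Streamlit sliders let users pick the number of top hits.",
]
query = "How are candidate documents re-scored?"

corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
candidates = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]

# Stage 2 -- Cross-Encoder (Re-Ranker): pass each (query, document) pair
# through the transformer together and sort by the relevance score it
# outputs (raw logits for this model family; apply a sigmoid for 0-1).
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
pairs = [(query, corpus[hit['corpus_id']]) for hit in candidates]
scores = cross_encoder.predict(pairs)
for score, (_, doc) in sorted(zip(scores, pairs), reverse=True, key=lambda t: t[0]):
    print(f"{score:.3f}  {doc}")
```

The split matters for speed: the bi-encoder lets corpus embeddings be computed once and reused for every query, while the (slower, more accurate) cross-encoder only ever sees the handful of candidates that survive retrieval.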