nickmuchi committed
Commit a8949e8 · Parent(s): 5f48c45

Update app.py

Files changed (1): app.py (+15 −18)

app.py CHANGED
@@ -225,26 +225,27 @@ bi_encoder_type = st.sidebar.selectbox(
 top_k = st.sidebar.slider("Number of Top Hits Generated",min_value=1,max_value=5,value=2)
 
 st.markdown(
-"""The app supports asymmetric Semantic search which seeks to improve search accuracy of documents/URL by understanding the content of the search query in contrast to traditional search engines which only find documents based on lexical matches, semantic search can also find synonyms.
-The idea behind semantic search is to embed all entries in your corpus, whether they be sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found. These entries should have a high semantic overlap with the query.
-The all-* models where trained on all available training data (more than 1 billion training pairs) and are designed as general purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. The models used have been trained on broad datasets, however, if your document/corpus is specialised, such as for science or economics, the results returned might be unsatisfactory.
+"""-The app supports asymmetric Semantic search which seeks to improve search accuracy of documents/URL by understanding the content of the search query in contrast to traditional search engines which only find documents based on lexical matches.
+-The idea behind semantic search is to embed all entries in your corpus, whether they be sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found. These entries should have a high semantic overlap with the query.
+-The all-* models where trained on all available training data (more than 1 billion training pairs) and are designed as general purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. The models used have been trained on broad datasets, however, if your document/corpus is specialised, such as for science or economics, the results returned might be unsatisfactory.""")
 
-There models available to choose from:""")
+st.markdown("""There models available to choose from:""")
 
 st.markdown(
 """Model Source:
-Bi-Encoders - [multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1), [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2), [multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
-Cross-Encoder - [cross-encoder/ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2)
+-Bi-Encoders - [multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1), [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2), [multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+-Cross-Encoder - [cross-encoder/ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2)""")
 
-Code and App Inspiration Source:
-[Sentence Transformers](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
+st.markdown(
+"""Code and App Inspiration Source: [Sentence Transformers](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)""")
 
-Quick summary of the purposes of a Bi and Cross-encoder below, the image and info were adapted from [www.sbert.net](https://www.sbert.net/examples/applications/semantic-search/README.html):
+st.markdown(
+"""Quick summary of the purposes of a Bi and Cross-encoder below, the image and info were adapted from [www.sbert.net](https://www.sbert.net/examples/applications/semantic-search/README.html):""")
 
-Bi-Encoder (Retrieval): The Bi-encoder is responsible for independently embedding the sentences and search queries into a vector space. The result is then passed to the cross-encoder for checking the relevance/similarity between the query and sentences.
-Cross-Encoder (Re-Ranker): A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a possible document is passed simultaneously to transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is for the given query. The cross-encoder further boost the performance, especially when you search over a corpus for which the bi-encoder was not trained for. """
-)
+st.markdown(
+"""-Bi-Encoder (Retrieval): The Bi-encoder is responsible for independently embedding the sentences and search queries into a vector space. The result is then passed to the cross-encoder for checking the relevance/similarity between the query and sentences.
+-Cross-Encoder (Re-Ranker): A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a possible document is passed simultaneously to transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is for the given query. The cross-encoder further boost the performance, especially when you search over a corpus for which the bi-encoder was not trained for.""")
 
 st.image('encoder.png', caption='Retrieval and Re-Rank')
 
@@ -252,7 +253,8 @@ st.markdown("""
 In order to use the app:
 - Select the preferred Sentence Transformer model (Bi-Encoder).
 - Select the number of sentences per paragraph to partition your corpus (Window-Size), if you choose a small value the context from the other sentences might get lost and for larger values the results might take longer to generate.
-- Paste the URL with your corpus or upload your preferred document in txt, pdf or Word format
+- Select the number of top hits to be generated.
+- Paste the URL with your corpus or upload your preferred document in txt, pdf or Word format.
 - Semantic Search away!! """
 )
 
@@ -265,11 +267,6 @@ st.markdown(
 unsafe_allow_html=True,
 )
 
-st.markdown(
-"<h3 style='text-align: center; color: red;'>OR</h3>",
-unsafe_allow_html=True,
-)
-
 upload_doc = st.file_uploader(
 "Upload a .txt, .pdf, .docx file"
 )
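
The retrieve-and-rerank flow described in the app's markdown (bi-encoder retrieval, then cross-encoder re-ranking of the shortlist) can be sketched with toy scoring functions. The `embed`, `cross_score`, and `search` helpers below are illustrative stand-ins, not the sentence-transformers models the app actually loads; a real bi-encoder produces learned dense embeddings, and a real cross-encoder scores each (query, passage) pair with a single transformer pass.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bi-encoder: deterministic character-count vector.
    # Stand-in for a model such as all-MiniLM-L6-v2.
    vec = [0.0] * 8
    for word in text.lower().split():
        for ch in word:
            vec[ord(ch) % 8] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def cross_score(query: str, passage: str) -> float:
    # Toy cross-encoder: scores the (query, passage) pair jointly
    # via word overlap (Jaccard). A real cross-encoder feeds both
    # texts through one transformer and outputs a relevance score.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0

def search(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    # Stage 1 (bi-encoder): embed query and passages independently,
    # keep the top_k passages closest to the query by cosine similarity.
    q_vec = embed(query)
    hits = sorted(corpus, key=lambda p: cosine(q_vec, embed(p)), reverse=True)[:top_k]
    # Stage 2 (cross-encoder): re-rank only the shortlist with the
    # more expensive pairwise scorer.
    return sorted(hits, key=lambda p: cross_score(query, p), reverse=True)
```

The two-stage split is the point of the design: the bi-encoder makes retrieval cheap (corpus embeddings can be precomputed once), while the cross-encoder is run only on the small `top_k` shortlist, which is why it can afford a full joint pass per pair.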