Update app.py
app.py
CHANGED
@@ -223,26 +223,30 @@ top_k = st.sidebar.slider("Number of Top Hits Generated",min_value=1,max_value=5
 st.markdown(
     """
+    - The app supports asymmetric semantic search, which seeks to improve the search accuracy over documents/URLs by understanding the content of the search query, in contrast to traditional search engines, which only find documents based on lexical matches.
+    - The idea behind semantic search is to embed all entries in your corpus, whether they are sentences, paragraphs, or documents, into a vector space. At search time, the query is embedded into the same vector space and the closest embeddings from your corpus are found. These entries should have a high semantic overlap with the query.
+    - The all-* models were trained on all available training data (more than 1 billion training pairs) and are designed as general-purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. The models used have been trained on broad datasets; however, if your document/corpus is specialised, such as for science or economics, the results returned might be unsatisfactory.""")
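The embed-then-search idea described above can be sketched with toy vectors. This is a minimal stand-alone sketch: the hard-coded lists stand in for embeddings that a real sentence-embedding model would produce, and the function names are illustrative, not part of the app.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, corpus_vecs, top_k=3):
    """Return (index, score) pairs for the top_k corpus embeddings closest to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 4-dimensional "embeddings" for three corpus entries; in the app these
# would come from one of the sentence-transformers models listed below.
corpus = [
    [1.0, 0.0, 0.0, 0.0],   # entry 0
    [0.0, 1.0, 0.0, 0.0],   # entry 1
    [0.9, 0.1, 0.0, 0.0],   # entry 2
]
query = [1.0, 0.05, 0.0, 0.0]
print(semantic_search(query, corpus, top_k=2))  # entries 0 and 2 rank highest
```

Entries 0 and 2 point in nearly the same direction as the query, so they are returned; lexical overlap plays no role here, only proximity in the vector space.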

 st.markdown("""The following models are available to choose from:""")

 st.markdown(
+    """
+    Model Source:
+    - Bi-Encoders - [multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1), [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2), [multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+    - Cross-Encoder - [cross-encoder/ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2)""")
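For illustration, the checkpoints listed above could be wired to the sidebar through a name-to-checkpoint mapping. Only the Hugging Face model ids come from the list above; the dict, constant, and helper names below are assumptions of this sketch, not the app's actual code.

```python
# Bi-encoder checkpoints offered in the app (Hugging Face model ids).
BI_ENCODERS = {
    "multi-qa-mpnet-base-dot-v1": "sentence-transformers/multi-qa-mpnet-base-dot-v1",
    "all-mpnet-base-v2": "sentence-transformers/all-mpnet-base-v2",
    "multi-qa-MiniLM-L6-cos-v1": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
    "all-MiniLM-L6-v2": "sentence-transformers/all-MiniLM-L6-v2",
}

# Single cross-encoder used for re-ranking.
CROSS_ENCODER = "cross-encoder/ms-marco-MiniLM-L-12-v2"

def checkpoint_for(display_name):
    """Map a sidebar display name to its Hugging Face checkpoint id."""
    return BI_ENCODERS[display_name]
```

A Streamlit selectbox over `BI_ENCODERS.keys()` would then pass `checkpoint_for(choice)` to the model loader, keeping display names and checkpoint ids in one place.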

 st.markdown(
+    """
+    Code and App Inspiration Source: [Sentence Transformers](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)""")

 st.markdown(
+    """
+    A quick summary of the purposes of a Bi-Encoder and a Cross-Encoder is given below; the image and info were adapted from [www.sbert.net](https://www.sbert.net/examples/applications/semantic-search/README.html):""")

 st.markdown(
+    """
+    - Bi-Encoder (Retrieval): The Bi-Encoder independently embeds the sentences and search queries into a vector space. The result is then passed to the Cross-Encoder, which checks the relevance/similarity between the query and the sentences.
+    - Cross-Encoder (Re-Ranker): A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a candidate document are passed simultaneously to the transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is to the given query. The Cross-Encoder further boosts performance, especially when you search over a corpus on which the Bi-Encoder was not trained.""")
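The retrieve-then-re-rank flow described above can be sketched end to end. The stand-in scorers below are assumptions made to keep the sketch self-contained: a dot product replaces the bi-encoder and a word-overlap ratio replaces the cross-encoder (the real app uses the sentence-transformers models listed earlier for both stages).

```python
def retrieve(query_vec, corpus_vecs, top_k):
    """Bi-encoder stage: cheap dot-product retrieval over all precomputed embeddings."""
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in corpus_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

def rerank(query, candidates, score_fn):
    """Cross-encoder stage: jointly re-score each (query, candidate) pair."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

def overlap_score(query, doc):
    """Stand-in cross-encoder: fraction of query words found in the document,
    so it returns a score between 0 and 1 like the real model."""
    q_words = query.lower().split()
    return sum(w in doc.lower() for w in q_words) / len(q_words)

docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
vecs = [[1.0, 0.2], [0.8, 0.3], [0.1, 0.9]]    # toy bi-encoder embeddings
hits = retrieve([1.0, 0.0], vecs, top_k=2)     # cheap first pass over everything
ranked = rerank("cat mat", [docs[i] for i in hits], overlap_score)
print(ranked[0][0])  # → "the cat sat on the mat"
```

The split matters because the expensive joint scoring only runs on the `top_k` candidates that the cheap retrieval stage surfaces, which is what makes re-ranking affordable over a large corpus.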

 st.image(Image.open('encoder.png'), caption='Retrieval and Re-Rank')