Michela committed on
Commit
ff8bdc8
·
1 Parent(s): 559c653

Update app.py

Files changed (1)
  1. app.py +8 -3
app.py CHANGED
@@ -162,8 +162,10 @@ with gr.Blocks() as demo:
  gr.Markdown("""
  ## 🔍 Preview Text Retrieval Results with Marqo Vector Database
  <div style="font-size: 18px;">
- <p><b>Instructions:</b> Browse through the retrieval results for the text prompt <i>"Pferd, Pferde"</i> by sliding the page slider (up to 100 first retrieval results can be inspected).
- To visualise details about the retrieved text chunk, copy and paste the document name (e.g. <i>Z166069305_430</i>) in the search bar below and click on the <i>Inspect</i> button. Please note that pressing <i>Enter</i> does not work.
+ <p><b>Instructions:</b> Browse through the retrieval results for the text prompt <i>"Pferd, Pferde"</i> by sliding the page slider (up to 100 first retrieval results can be inspected).
+ Select the data source: Choose between <i>Results Cleaned OCR, Results LLM Preprocessed OCR, and Results Original OCR</i>.
+ To visualise details about the retrieved text chunk, copy and paste the document name (e.g. <i>Z166069305_430</i>) in the search bar below and click on the <i>Inspect</i> button.
+ Please note that pressing <i>Enter</i> does not work.
  To inspect the page in the full book, click on <i>Open ONB Viewer</i> in the document details below.</p>
  </div>""")
 
@@ -214,7 +216,10 @@ with gr.Blocks() as demo:
  This research was done in the <a href="https://onit.oeaw.ac.at/">Ottoman Nature in Travelogues (ONiT)</a> project and funded by the Austrian Science Fund (FWF: P 35245).
  The text retrieval was done with hybrid vector/lexical search (BM25) by using a <a href="https://docs.marqo.ai/">Marqo</a>
  vector index. The texts were indexed as one page per document unit, and by splitting them in 2-sentence vectors and embedding them with
- <a href="https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_mpnet-base">flax-sentence-embeddings/all_datasets_v4_mpnet-base</a> model.</p>
+ <a href="https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_mpnet-base">flax-sentence-embeddings/all_datasets_v4_mpnet-base</a> model.
+ <i>Results Cleaned OCR</i> contain the retrieval results for the vectorized OCR texts that were cleaned by using regular expressions.
+ <i>Results LLM Preprocessed OCR</i> contain the retrieval results for the vectorized OCR texts that were automatically corrected with Llama3.1:70b.
+ <i>Results Original OCR</i> contain the retrieval results for the original OCR texts (without any preprocessing).</p>
  <p>For more information, contact <a href="mailto:[email protected]">michela(dot)vignoli(at)ait(dot)ac(dot)at</a>.</p>
  </div>
  """)