Michela committed
Commit ff8bdc8 · 1 parent: 559c653
Update app.py
app.py CHANGED
@@ -162,8 +162,10 @@ with gr.Blocks() as demo:
 gr.Markdown("""
 ## 🔍 Preview Text Retrieval Results with Marqo Vector Database
 <div style="font-size: 18px;">
-<p><b>Instructions:</b> Browse through the retrieval results for the text prompt <i>"Pferd, Pferde"</i> by sliding the page slider (up to 100 first retrieval results can be inspected).
-
+<p><b>Instructions:</b> Browse through the retrieval results for the text prompt <i>"Pferd, Pferde"</i> by sliding the page slider (up to 100 first retrieval results can be inspected).
+Select the data source: Choose between <i>Results Cleaned OCR, Results LLM Preprocessed OCR, and Results Original OCR</i>.
+To visualise details about the retrieved text chunk, copy and paste the document name (e.g. <i>Z166069305_430</i>) in the search bar below and click on the <i>Inspect</i> button.
+Please note that pressing <i>Enter</i> does not work.
 To inspect the page in the full book, click on <i>Open ONB Viewer</i> in the document details below.</p>
 </div>""")
 
@@ -214,7 +216,10 @@ with gr.Blocks() as demo:
 This research was done in the <a href="https://onit.oeaw.ac.at/">Ottoman Nature in Travelogues (ONiT)</a> project and funded by the Austrian Science Fund (FWF: P 35245).
 The text retrieval was done with hybrid vector/lexical search (BM25) by using a <a href="https://docs.marqo.ai/">Marqo</a>
 vector index. The texts were indexed as one page per document unit, and by splitting them in 2-sentence vectors and embedding them with
-<a href="https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_mpnet-base">flax-sentence-embeddings/all_datasets_v4_mpnet-base</a> model
+<a href="https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_mpnet-base">flax-sentence-embeddings/all_datasets_v4_mpnet-base</a> model.
+<i>Results Cleaned OCR</i> contain the retrieval results for the vectorized OCR texts that were cleaned by using regular expressions.
+<i>Results LLM Preprocessed OCR</i> contain the retrieval results for the vectorized OCR texts that were automatically corrected with Llama3.1:70b.
+<i>Results Original OCR</i> contain the retrieval results for the original OCR texts (without any preprocessing).</p>
 <p>For more information, contact <a href="mailto:[email protected]">michela(dot)vignoli(at)ait(dot)ac(dot)at</a>.</p>
 </div>
 """)
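Note on the "2-sentence vectors" mentioned in the description text: each page is one document, and each document is broken into chunks of two consecutive sentences before embedding. Marqo performs this splitting itself, driven by the index's text preprocessing settings, so the snippet below is not the code used in this Space. It is only a minimal pure-Python sketch of the idea. The function names `split_sentences` and `chunk_sentences`, the naive punctuation-based sentence splitter, and the sample page text are all assumptions made for illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (assumption): break after '.', '!' or '?'
    when followed by whitespace. Real OCR text needs something sturdier."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text: str, split_length: int = 2) -> list[str]:
    """Group consecutive sentences into chunks of `split_length` sentences.
    Each chunk would then be embedded as one vector."""
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i:i + split_length])
        for i in range(0, len(sentences), split_length)
    ]


# Hypothetical page text, not taken from the ONiT corpus.
page = "Das Pferd stand im Hof. Die Pferde waren müde. Wir ritten weiter."
chunks = chunk_sentences(page)
for chunk in chunks:
    print(chunk)
```

With three sentences and a chunk size of two, the last chunk holds the single remaining sentence. Chunks here do not overlap, which is a simplifying choice of this sketch.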