Luca Foppiano committed
Commit 727fc17 · unverified · 2 Parent(s): 53c8deb b5dfde0

Merge branch 'main' into question-coefficient

Files changed (2)
  1. README.md +1 -1
  2. streamlit_app.py +1 -0
README.md CHANGED
@@ -22,7 +22,7 @@ https://lfoppiano-document-qa.hf.space/
 
  Question/Answering on scientific documents using LLMs: ChatGPT-3.5-turbo, GPT4, GPT4-Turbo, Mistral-7b-instruct and Zephyr-7b-beta.
  The streamlit application demonstrates the implementation of a RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS (National Institute for Materials Science), in Tsukuba, Japan.
- Different to most of the projects, we focus on scientific articles.
+ **Different to most of the projects**, we focus on scientific articles and we extract text from a structured document.
  We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) which provides cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).
 
  Additionally, this frontend provides the visualisation of named entities on LLM responses to extract <span stype="color:yellow">physical quantities, measurements</span> (with [grobid-quantities](https://github.com/kermitt2/grobid-quantities)) and <span stype="color:blue">materials</span> mentions (with [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors)).
streamlit_app.py CHANGED
@@ -554,4 +554,5 @@ with left_column:
  annotation_outline_size=1,
  annotations=st.session_state['annotations'],
  rendering=st.session_state['pdf_rendering']
+ render_text=True
  )