Spaces: borodache/hebrew-dentsit

borodache committed on Feb 27

Commit ae093ba (verified)

1 Parent(s): a983ce0

Update README.md

Files changed (1)

README.md +3 -4

@@ -14,14 +14,13 @@ Do you want to consult with a Dentist? Speaking Hebrew? Consulting with Dentist
 Warning: The Agent (Chatbot) can still hallucinate and make up "fake" facts, and it is not a substitute for an expert dentist. Use of this Chatbot is at your own responsibility.
-This RAG Agent based on Q&A data collected from 3 top Israeli forums. Data was collected using scraper, and saved into a SQL DB. Then, the titles & questions were embedded into vectors using free 'MPA/sambert' HuggingFace Encoder Model (this model found to be performing well on Hebrew Medical Jargon). The Vectors were inserted one at a time, into NoSQL Pinecone Vector Database, with answers as metadata.
 Now all that is left is the RAG Agent, which is composed of a Retriever, a Reranker, and a Generator:
 4)	The Retriever embeds the user question (using the free 'MPA/sambert' HuggingFace encoder model) and runs an ANN search with a cosine similarity metric and top_k set to 50.
-5)	The Reranker embeds the 50 answers retrieved (using the free 'MPA/sambert' HuggingFace Encoder Model) resorts the answers, selects the top_n variable equal to 5 when each answer should be similar to the question embedding with a threshold of 0.7 or higher.
 6)	The Generator is a paid API, Anthropic Claude Sonnet 3.5, a decoder that is not trained on the medical jargon; however, with the right prompt and the right context the results are pretty good.
-Disclaimer: So far, the Agent has only one question at a time capacity, a problem that will be addressed in the future. Stay tuned.
 The whole work from inception to completion was done by me (Eli Borodach)

 Warning: The Agent (Chatbot) can still hallucinate and make up "fake" facts, and it is not a substitute for an expert dentist. Use of this Chatbot is at your own responsibility.
+This RAG Agent is based on Q&A data collected from 3 top Israeli forums. The data was collected using a scraper and saved into a SQL DB. Then the titles and questions were embedded into vectors using the free 'MPA/sambert' HuggingFace encoder model (this model was found to perform well on Hebrew medical jargon). The vectors were stored, a hundred at a time, in the NoSQL Pinecone vector database, with answer_id as metadata.
+The answers were converted into vector embeddings using the same free encoder ('MPA/sambert') and stored in Pinecone under a different key, with the answer as metadata.
 Now all that is left is the RAG Agent, which is composed of a Retriever, a Reranker, and a Generator:
 4)	The Retriever embeds the user question (using the free 'MPA/sambert' HuggingFace encoder model) and runs an ANN search with a cosine similarity metric and top_k set to 50.
+5)	The Reranker uses the list of top_k ids to fetch the answer vectors, with the answers as metadata, in a second scan of the Pinecone database. Cosine similarity is then calculated using the sklearn method and the answers are re-sorted. Afterwards, it selects the top_n (equal to 5) answers, where each answer must be similar to the question embedding with a threshold of 0.7 or higher.
 6)	The Generator is a paid API, Anthropic Claude Sonnet 3.5, a decoder that is not trained on the medical jargon; however, with the right prompt and the right context the results are pretty good.
 The whole work from inception to completion was done by me (Eli Borodach)
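
For readers of this commit, here is a minimal sketch of two ideas the updated README describes: storing vectors a hundred at a time, and reranking retrieved answers by cosine similarity with top_n = 5 and a 0.7 threshold. It is not the Space's actual code. The names `batches`, `cosine`, and `rerank` are hypothetical. The README says sklearn computes the cosine similarity; a pure-Python cosine is swapped in here so the sketch has no dependencies. No Pinecone call is shown: the candidates are assumed to have been fetched already.

```python
import math
from typing import Iterator, List, Sequence, Tuple

BATCH_SIZE = 100   # "a hundred at a time" (from the README)
TOP_N = 5          # top_n (from the README)
THRESHOLD = 0.7    # minimum similarity to the question (from the README)


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items, e.g. for batched upserts."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(
    question_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    top_n: int = TOP_N,
    threshold: float = THRESHOLD,
) -> List[Tuple[float, str]]:
    """Score (answer_text, answer_vec) pairs against the question embedding.

    Keeps only answers at or above `threshold`, sorted by descending
    similarity, and returns at most `top_n` of them as (score, answer_text).
    """
    scored = [(cosine(question_vec, vec), text) for text, vec in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]
```

In use, `rerank(question_vec, candidates)` would receive the question embedding produced by the Retriever and the (answer, vector) pairs fetched for the top_k ids. The answers it returns would then be passed to the Generator as context.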