metadata
title: README
emoji: 🐨
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false

To test a RAG solution, it helps to have a dataset that consists of a text corpus, correct responses to queries (e.g. question-answer pairs) for testing the solution end-to-end, and ideally a set of relevant passages from the corpus for each query, so that the retrieval component can be tested separately as well. We call this a question-answer-passages dataset.

There are plenty of large-scale datasets of this kind such as Google's Natural Questions.

Still, we lack small-scale, narrow-domain datasets of this kind for testing a RAG solution quickly or for seeing how it performs in a specific domain.

We created this space to build a collection of such datasets and boost the development of RAG solutions.

Datasets consist of:

  • A text corpus already split into passages, with each passage referenced by id.
  • A dataset for testing, in which each entry consists of:
    • A question, and one or ideally both of the following:
    • A correct short answer.
    • A list of the ids of the passages that are relevant to answering the question.
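
As a rough illustration of the structure described above, here is a minimal Python sketch of a question-answer-passages dataset and of how the passage ids allow the retrieval component to be tested on its own. The field names (`question`, `answer`, `relevant_passage_ids`), the toy corpus, and the word-overlap retriever are all assumptions made for this example; the datasets in this space may use a different schema, and a real RAG solution would plug in its own retriever.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QAExample:
    """One test entry. Field names are hypothetical, chosen for illustration."""
    question: str
    answer: Optional[str] = None                      # correct short answer
    relevant_passage_ids: Optional[List[str]] = None  # ids into the corpus


# Toy corpus, already split into passages and keyed by passage id.
corpus: Dict[str, str] = {
    "p1": "Koalas sleep up to 20 hours a day.",
    "p2": "Eucalyptus leaves are the main food of koalas.",
    "p3": "Kangaroos carry their young in a pouch.",
}

test_set = [
    QAExample(
        question="What is the main food of koalas?",
        answer="Eucalyptus leaves",
        relevant_passage_ids=["p2"],
    ),
]


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: Dict[str, str], k: int) -> List[str]:
    """Stand-in retriever: rank passages by word overlap with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda pid: len(q_tokens & tokenize(passages[pid])),
        reverse=True,
    )
    return ranked[:k]


def recall(retrieved_ids: List[str], relevant_ids: List[str]) -> float:
    """Fraction of the relevant passages that appear among the retrieved ones."""
    return len(set(retrieved_ids) & set(relevant_ids)) / len(relevant_ids)


for example in test_set:
    top = retrieve(example.question, corpus, k=1)
    print(example.question, top, recall(top, example.relevant_passage_ids))
```

The short answer can be used in the same way to score the end-to-end output of the solution, for instance by checking whether the generated response contains it.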