{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Ragas demonstration\n", "\n", "This notebook demonstrates how to evaluate a RAG system using the [Ragas evaluation framework](https://github.com/explodinggradients/ragas?tab=readme-ov-file) and import the resulting evaluation results into InspectorRAGet for analysis.\n", "\n", "### Installation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!pip install git+https://github.com/explodinggradients/ragas" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "!pip install ipywidgets" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from datasets import Dataset \n", "import os\n", "from ragas import evaluate\n", "from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "os.environ[\"RAGAS_DO_NOT_TRACK\"] = \"true\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import dataset\n", "\n", "For this example, we will be using the sample dataset provided in Ragas' quick start instructions." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "data_samples = {\n", " 'question': ['When was the first super bowl?', 'Who won the most super bowls?'],\n", " 'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],\n", " 'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'], \n", " ['The Green Bay Packers...Green Bay, Wisconsin.','The Packers compete...Football Conference']],\n", " 'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']\n", "}\n", "\n", "dataset = Dataset.from_dict(data_samples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run evaluation\n", "\n", "In this example, we run evaluation using `gpt-4o-mini` and the following Ragas evaluation metrics: faithfulness, answer relevance, context precision, and context recall. 
See the Ragas documentation for how to use other models or metrics.\n", "\n", "**Note: To use an OpenAI model for evaluation, set your `OPENAI_API_KEY` below.**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "os.environ[\"OPENAI_API_KEY\"] = \"provide-your-openai-api-key\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d221210e378a4ad0a25748d8f8f095ba", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Evaluating: 0%| | 0/8 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
| | question | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_precision | context_recall |\n", "
|---|---|---|---|---|---|---|---|---|\n", "
| 0 | When was the first super bowl? | The first superbowl was held on Jan 15, 1967 | [The First AFL–NFL World Championship Game was... | The first superbowl was held on January 15, 1967 | 1.0 | 0.980714 | 1.0 | 1.0 |\n", "
| 1 | Who won the most super bowls? | The most super bowls have been won by The New ... | [The Green Bay Packers...Green Bay, Wisconsin.... | The New England Patriots have won the Super Bo... | 0.0 | 0.943043 | 0.0 | 0.0 |\n", "