File size: 4,741 Bytes
cfd3735
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1edb9e6b",
   "metadata": {},
   "source": [
    "# ChatGPT Plugin Retriever\n",
    "\n",
    "This notebook shows how to use the ChatGPT Retriever Plugin within LangChain."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "074b0004",
   "metadata": {},
   "source": [
    "## Create\n",
    "\n",
    "First, let's go over how to create the ChatGPT Retriever Plugin.\n",
    "\n",
    "To set up the ChatGPT Retriever Plugin, please follow instructions [here](https://github.com/openai/chatgpt-retrieval-plugin).\n",
    "\n",
    "You can also create the ChatGPT Retriever Plugin from LangChain document loaders. The below code walks through how to do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "bbe89ca0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# STEP 1: Load\n",
    "\n",
    "# Load documents using LangChain's DocumentLoaders\n",
    "# This is from https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/csv.html\n",
    "\n",
    "from langchain.document_loaders.csv_loader import CSVLoader\n",
    "loader = CSVLoader(file_path='../../document_loaders/examples/example_data/mlb_teams_2012.csv')\n",
    "data = loader.load()\n",
    "\n",
    "\n",
    "# STEP 2: Convert\n",
    "\n",
    "# Convert Document to format expected by https://github.com/openai/chatgpt-retrieval-plugin\n",
    "from typing import List\n",
    "from langchain.docstore.document import Document\n",
    "import json\n",
    "\n",
    "def write_json(path: str, documents: List[Document])-> None:\n",
    "    results = [{\"text\": doc.page_content} for doc in documents]\n",
    "    with open(path, \"w\") as f:\n",
    "        json.dump(results, f, indent=2)\n",
    "\n",
    "write_json(\"foo.json\", data)\n",
    "\n",
    "# STEP 3: Use\n",
    "\n",
    "# Ingest this as you would any other json file in https://github.com/openai/chatgpt-retrieval-plugin/tree/main/scripts/process_json\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0474661d",
   "metadata": {},
   "source": [
    "## Using the ChatGPT Retriever Plugin\n",
    "\n",
    "Okay, so we've created the ChatGPT Retriever Plugin, but how do we actually use it?\n",
    "\n",
    "The below code walks through how to do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "39d6074e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.retrievers import ChatGPTPluginRetriever"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "33fd23d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = ChatGPTPluginRetriever(url=\"http://0.0.0.0:8000\", bearer_token=\"foo\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "16250bdf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content=\"This is Alice's phone number: 123-456-7890\", lookup_str='', metadata={'id': '456_0', 'metadata': {'source': 'email', 'source_id': '567', 'url': None, 'created_at': '1609592400.0', 'author': 'Alice', 'document_id': '456'}, 'embedding': None, 'score': 0.925571561}, lookup_index=0),\n",
       " Document(page_content='This is a document about something', lookup_str='', metadata={'id': '123_0', 'metadata': {'source': 'file', 'source_id': 'https://example.com/doc1', 'url': 'https://example.com/doc1', 'created_at': '1609502400.0', 'author': 'Alice', 'document_id': '123'}, 'embedding': None, 'score': 0.6987589}, lookup_index=0),\n",
       " Document(page_content='Team: Angels \"Payroll (millions)\": 154.49 \"Wins\": 89', lookup_str='', metadata={'id': '59c2c0c1-ae3f-4272-a1da-f44a723ea631_0', 'metadata': {'source': None, 'source_id': None, 'url': None, 'created_at': None, 'author': None, 'document_id': '59c2c0c1-ae3f-4272-a1da-f44a723ea631'}, 'embedding': None, 'score': 0.697888613}, lookup_index=0)]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever.get_relevant_documents(\"alice's phone number\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8b5794b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}