File size: 19,807 Bytes
cfd3735
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fc0db1bc",
   "metadata": {},
   "source": [
    "# Cohere Reranker\n",
    "\n",
    "This notebook shows how to use [Cohere's rerank endpoint](https://docs.cohere.com/docs/reranking) in a retriever. This builds on top of ideas in the [ContextualCompressionRetriever](contextual-compression.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "28e8dc12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function for printing docs\n",
    "\n",
    "def pretty_print_docs(docs):\n",
    "    print(f\"\\n{'-' * 100}\\n\".join([f\"Document {i+1}:\\n\\n\" + d.page_content for i, d in enumerate(docs)]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fa3d916",
   "metadata": {},
   "source": [
    "## Set up the base vector store retriever\n",
    "Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9fbcc58f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document 1:\n",
      "\n",
      "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
      "\n",
      "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 2:\n",
      "\n",
      "As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n",
      "\n",
      "While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 3:\n",
      "\n",
      "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
      "\n",
      "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 4:\n",
      "\n",
      "He met the Ukrainian people. \n",
      "\n",
      "From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n",
      "\n",
      "Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n",
      "\n",
      "In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 5:\n",
      "\n",
      "I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
      "\n",
      "I’ve worked on these issues a long time. \n",
      "\n",
      "I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n",
      "\n",
      "So let’s not abandon our streets. Or choose between safety and equal justice.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 6:\n",
      "\n",
      "Vice President Harris and I ran for office with a new economic vision for America. \n",
      "\n",
      "Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  \n",
      "and the middle out, not from the top down.  \n",
      "\n",
      "Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n",
      "\n",
      "America used to have the best roads, bridges, and airports on Earth. \n",
      "\n",
      "Now our infrastructure is ranked 13th in the world.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 7:\n",
      "\n",
      "And tonight, I’m announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n",
      "\n",
      "By the end of this year, the deficit will be down to less than half what it was before I took office.  \n",
      "\n",
      "The only president ever to cut the deficit by more than one trillion dollars in a single year. \n",
      "\n",
      "Lowering your costs also means demanding more competition. \n",
      "\n",
      "I’m a capitalist, but capitalism without competition isn’t capitalism. \n",
      "\n",
      "It’s exploitation—and it drives up prices.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 8:\n",
      "\n",
      "For the past 40 years we were told that if we gave tax breaks to those at the very top, the benefits would trickle down to everyone else. \n",
      "\n",
      "But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n",
      "\n",
      "Vice President Harris and I ran for office with a new economic vision for America.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 9:\n",
      "\n",
      "All told, we created 369,000 new manufacturing jobs in America just last year. \n",
      "\n",
      "Powered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight. \n",
      "\n",
      "As Ohio Senator Sherrod Brown says, “It’s time to bury the label “Rust Belt.” \n",
      "\n",
      "It’s time. \n",
      "\n",
      "But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 10:\n",
      "\n",
      "I’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n",
      "\n",
      "And fourth, let’s end cancer as we know it. \n",
      "\n",
      "This is personal to me and Jill, to Kamala, and to so many of you. \n",
      "\n",
      "Cancer is the #2 cause of death in America–second only to heart disease.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 11:\n",
      "\n",
      "He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n",
      "\n",
      "We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n",
      "\n",
      "The pandemic has been punishing. \n",
      "\n",
      "And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n",
      "\n",
      "I understand.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 12:\n",
      "\n",
      "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n",
      "\n",
      "Last year COVID-19 kept us apart. This year we are finally together again. \n",
      "\n",
      "Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
      "\n",
      "With a duty to one another to the American people to the Constitution. \n",
      "\n",
      "And with an unwavering resolve that freedom will always triumph over tyranny.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 13:\n",
      "\n",
      "I know. \n",
      "\n",
      "One of those soldiers was my son Major Beau Biden. \n",
      "\n",
      "We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n",
      "\n",
      "But I’m committed to finding out everything we can. \n",
      "\n",
      "Committed to military families like Danielle Robinson from Ohio. \n",
      "\n",
      "The widow of Sergeant First Class Heath Robinson.  \n",
      "\n",
      "He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 14:\n",
      "\n",
      "And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n",
      "\n",
      "So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.  \n",
      "\n",
      "First, beat the opioid epidemic. \n",
      "\n",
      "There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 15:\n",
      "\n",
      "Third, support our veterans. \n",
      "\n",
      "Veterans are the best of us. \n",
      "\n",
      "I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home. \n",
      "\n",
      "My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.  \n",
      "\n",
      "Our troops in Iraq and Afghanistan faced many dangers.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 16:\n",
      "\n",
      "When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we haven’t done in a long time: build a better America. \n",
      "\n",
      "For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n",
      "\n",
      "And I know you’re tired, frustrated, and exhausted. \n",
      "\n",
      "But I also know this.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 17:\n",
      "\n",
      "Now is the hour. \n",
      "\n",
      "Our moment of responsibility. \n",
      "\n",
      "Our test of resolve and conscience, of history itself. \n",
      "\n",
      "It is in this moment that our character is formed. Our purpose is found. Our future is forged. \n",
      "\n",
      "Well I know this nation.  \n",
      "\n",
      "We will meet the test. \n",
      "\n",
      "To protect freedom and liberty, to expand fairness and opportunity. \n",
      "\n",
      "We will save democracy. \n",
      "\n",
      "As hard as these times have been, I am more optimistic about America today than I have been my whole life.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 18:\n",
      "\n",
      "He didn’t know how to stop fighting, and neither did she. \n",
      "\n",
      "Through her pain she found purpose to demand we do better. \n",
      "\n",
      "Tonight, Danielle—we are. \n",
      "\n",
      "The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \n",
      "\n",
      "And tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 19:\n",
      "\n",
      "I understand. \n",
      "\n",
      "I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. \n",
      "\n",
      "That’s why one of the first things I did as President was fight to pass the American Rescue Plan.  \n",
      "\n",
      "Because people were hurting. We needed to act, and we did. \n",
      "\n",
      "Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 20:\n",
      "\n",
      "So let’s not abandon our streets. Or choose between safety and equal justice. \n",
      "\n",
      "Let’s come together to protect our communities, restore trust, and hold law enforcement accountable. \n",
      "\n",
      "That’s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n"
     ]
    }
   ],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.vectorstores import FAISS\n",
    "\n",
    "documents = TextLoader('../../../state_of_the_union.txt').load()\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever(search_kwargs={\"k\": 20})\n",
    "\n",
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = retriever.get_relevant_documents(query)\n",
    "pretty_print_docs(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7648612",
   "metadata": {},
   "source": [
    "## Doing reranking with CohereRerank\n",
    "Now let's wrap our base retriever with a `ContextualCompressionRetriever`. We'll add an `CohereRerank`, uses the Cohere rerank endpoint to rerank the returned results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "9a658023",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document 1:\n",
      "\n",
      "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
      "\n",
      "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 2:\n",
      "\n",
      "I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
      "\n",
      "I’ve worked on these issues a long time. \n",
      "\n",
      "I know what works: Investing in crime preventionand community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety. \n",
      "\n",
      "So let’s not abandon our streets. Or choose between safety and equal justice.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Document 3:\n",
      "\n",
      "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
      "\n",
      "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n"
     ]
    }
   ],
   "source": [
    "from langchain.llms import OpenAI\n",
    "from langchain.retrievers import ContextualCompressionRetriever\n",
    "from langchain.retrievers.document_compressors import CohereRerank\n",
    "\n",
    "llm = OpenAI(temperature=0)\n",
    "compressor = CohereRerank()\n",
    "compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)\n",
    "\n",
    "compressed_docs = compression_retriever.get_relevant_documents(\"What did the president say about Ketanji Jackson Brown\")\n",
    "pretty_print_docs(compressed_docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b83dfedb",
   "metadata": {},
   "source": [
    "You can of course use this retriever within a QA pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "367dafe0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains import RetrievalQA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "ae697ca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), retriever=compression_retriever)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "46ee62fc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'query': 'What did the president say about Ketanji Brown Jackson',\n",
       " 'result': \" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she is a consensus builder who has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\"}"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain({\"query\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "700a8133",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}