File size: 26,975 Bytes
cfd3735
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use LangChain, GPT and Deep Lake to work with code base\n",
    "In this tutorial, we are going to use Langchain + Deep Lake with GPT to analyze the code base of the LangChain itself. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Design"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Prepare data:\n",
    "   1. Upload all python project files using the `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
    "   2. Split all documents to chunks using the `langchain.text_splitter.CharacterTextSplitter`.\n",
    "   3. Embed chunks and upload them into the DeepLake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`\n",
    "2. Question-Answering:\n",
    "   1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`\n",
    "   2. Prepare questions.\n",
    "   3. Get answers running the chain.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Integration preparations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to set up keys for external services and install necessary python libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#!python3 -m pip install --upgrade langchain deeplake openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. \n",
    "\n",
    "For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 路路路路路路路路\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "os.environ['OPENAI_API_KEY'] = getpass()\n",
    "# Please manually enter OpenAI Key"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at [app.activeloop.ai](https://app.activeloop.ai)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 路路路路路路路路\n"
     ]
    }
   ],
   "source": [
    "os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare data "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load all repository files. Here we assume this notebook is downloaded as the part of the langchain fork and we work with the python files of the `langchain` repo.\n",
    "\n",
    "If you want to use files from different repo, change `root_dir` to the root dir of your repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1147\n"
     ]
    }
   ],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "\n",
    "root_dir = '../../../..'\n",
    "\n",
    "docs = []\n",
    "for dirpath, dirnames, filenames in os.walk(root_dir):\n",
    "    for file in filenames:\n",
    "        if file.endswith('.py') and '/.venv/' not in dirpath:\n",
    "            try: \n",
    "                loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
    "                docs.extend(loader.load_and_split())\n",
    "            except Exception as e: \n",
    "                pass\n",
    "print(f'{len(docs)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, chunk the files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Created a chunk of size 1620, which is longer than the specified 1000\n",
      "Created a chunk of size 1213, which is longer than the specified 1000\n",
      "Created a chunk of size 1263, which is longer than the specified 1000\n",
      "Created a chunk of size 1448, which is longer than the specified 1000\n",
      "Created a chunk of size 1120, which is longer than the specified 1000\n",
      "Created a chunk of size 1148, which is longer than the specified 1000\n",
      "Created a chunk of size 1826, which is longer than the specified 1000\n",
      "Created a chunk of size 1260, which is longer than the specified 1000\n",
      "Created a chunk of size 1195, which is longer than the specified 1000\n",
      "Created a chunk of size 2147, which is longer than the specified 1000\n",
      "Created a chunk of size 1410, which is longer than the specified 1000\n",
      "Created a chunk of size 1269, which is longer than the specified 1000\n",
      "Created a chunk of size 1030, which is longer than the specified 1000\n",
      "Created a chunk of size 1046, which is longer than the specified 1000\n",
      "Created a chunk of size 1024, which is longer than the specified 1000\n",
      "Created a chunk of size 1026, which is longer than the specified 1000\n",
      "Created a chunk of size 1285, which is longer than the specified 1000\n",
      "Created a chunk of size 1370, which is longer than the specified 1000\n",
      "Created a chunk of size 1031, which is longer than the specified 1000\n",
      "Created a chunk of size 1999, which is longer than the specified 1000\n",
      "Created a chunk of size 1029, which is longer than the specified 1000\n",
      "Created a chunk of size 1120, which is longer than the specified 1000\n",
      "Created a chunk of size 1033, which is longer than the specified 1000\n",
      "Created a chunk of size 1143, which is longer than the specified 1000\n",
      "Created a chunk of size 1416, which is longer than the specified 1000\n",
      "Created a chunk of size 2482, which is longer than the specified 1000\n",
      "Created a chunk of size 1890, which is longer than the specified 1000\n",
      "Created a chunk of size 1418, which is longer than the specified 1000\n",
      "Created a chunk of size 1848, which is longer than the specified 1000\n",
      "Created a chunk of size 1069, which is longer than the specified 1000\n",
      "Created a chunk of size 2369, which is longer than the specified 1000\n",
      "Created a chunk of size 1045, which is longer than the specified 1000\n",
      "Created a chunk of size 1501, which is longer than the specified 1000\n",
      "Created a chunk of size 1208, which is longer than the specified 1000\n",
      "Created a chunk of size 1950, which is longer than the specified 1000\n",
      "Created a chunk of size 1283, which is longer than the specified 1000\n",
      "Created a chunk of size 1414, which is longer than the specified 1000\n",
      "Created a chunk of size 1304, which is longer than the specified 1000\n",
      "Created a chunk of size 1224, which is longer than the specified 1000\n",
      "Created a chunk of size 1060, which is longer than the specified 1000\n",
      "Created a chunk of size 2461, which is longer than the specified 1000\n",
      "Created a chunk of size 1099, which is longer than the specified 1000\n",
      "Created a chunk of size 1178, which is longer than the specified 1000\n",
      "Created a chunk of size 1449, which is longer than the specified 1000\n",
      "Created a chunk of size 1345, which is longer than the specified 1000\n",
      "Created a chunk of size 3359, which is longer than the specified 1000\n",
      "Created a chunk of size 2248, which is longer than the specified 1000\n",
      "Created a chunk of size 1589, which is longer than the specified 1000\n",
      "Created a chunk of size 2104, which is longer than the specified 1000\n",
      "Created a chunk of size 1505, which is longer than the specified 1000\n",
      "Created a chunk of size 1387, which is longer than the specified 1000\n",
      "Created a chunk of size 1215, which is longer than the specified 1000\n",
      "Created a chunk of size 1240, which is longer than the specified 1000\n",
      "Created a chunk of size 1635, which is longer than the specified 1000\n",
      "Created a chunk of size 1075, which is longer than the specified 1000\n",
      "Created a chunk of size 2180, which is longer than the specified 1000\n",
      "Created a chunk of size 1791, which is longer than the specified 1000\n",
      "Created a chunk of size 1555, which is longer than the specified 1000\n",
      "Created a chunk of size 1082, which is longer than the specified 1000\n",
      "Created a chunk of size 1225, which is longer than the specified 1000\n",
      "Created a chunk of size 1287, which is longer than the specified 1000\n",
      "Created a chunk of size 1085, which is longer than the specified 1000\n",
      "Created a chunk of size 1117, which is longer than the specified 1000\n",
      "Created a chunk of size 1966, which is longer than the specified 1000\n",
      "Created a chunk of size 1150, which is longer than the specified 1000\n",
      "Created a chunk of size 1285, which is longer than the specified 1000\n",
      "Created a chunk of size 1150, which is longer than the specified 1000\n",
      "Created a chunk of size 1585, which is longer than the specified 1000\n",
      "Created a chunk of size 1208, which is longer than the specified 1000\n",
      "Created a chunk of size 1267, which is longer than the specified 1000\n",
      "Created a chunk of size 1542, which is longer than the specified 1000\n",
      "Created a chunk of size 1183, which is longer than the specified 1000\n",
      "Created a chunk of size 2424, which is longer than the specified 1000\n",
      "Created a chunk of size 1017, which is longer than the specified 1000\n",
      "Created a chunk of size 1304, which is longer than the specified 1000\n",
      "Created a chunk of size 1379, which is longer than the specified 1000\n",
      "Created a chunk of size 1324, which is longer than the specified 1000\n",
      "Created a chunk of size 1205, which is longer than the specified 1000\n",
      "Created a chunk of size 1056, which is longer than the specified 1000\n",
      "Created a chunk of size 1195, which is longer than the specified 1000\n",
      "Created a chunk of size 3608, which is longer than the specified 1000\n",
      "Created a chunk of size 1058, which is longer than the specified 1000\n",
      "Created a chunk of size 1075, which is longer than the specified 1000\n",
      "Created a chunk of size 1217, which is longer than the specified 1000\n",
      "Created a chunk of size 1109, which is longer than the specified 1000\n",
      "Created a chunk of size 1440, which is longer than the specified 1000\n",
      "Created a chunk of size 1046, which is longer than the specified 1000\n",
      "Created a chunk of size 1220, which is longer than the specified 1000\n",
      "Created a chunk of size 1403, which is longer than the specified 1000\n",
      "Created a chunk of size 1241, which is longer than the specified 1000\n",
      "Created a chunk of size 1427, which is longer than the specified 1000\n",
      "Created a chunk of size 1049, which is longer than the specified 1000\n",
      "Created a chunk of size 1580, which is longer than the specified 1000\n",
      "Created a chunk of size 1565, which is longer than the specified 1000\n",
      "Created a chunk of size 1131, which is longer than the specified 1000\n",
      "Created a chunk of size 1425, which is longer than the specified 1000\n",
      "Created a chunk of size 1054, which is longer than the specified 1000\n",
      "Created a chunk of size 1027, which is longer than the specified 1000\n",
      "Created a chunk of size 2559, which is longer than the specified 1000\n",
      "Created a chunk of size 1028, which is longer than the specified 1000\n",
      "Created a chunk of size 1382, which is longer than the specified 1000\n",
      "Created a chunk of size 1888, which is longer than the specified 1000\n",
      "Created a chunk of size 1475, which is longer than the specified 1000\n",
      "Created a chunk of size 1652, which is longer than the specified 1000\n",
      "Created a chunk of size 1891, which is longer than the specified 1000\n",
      "Created a chunk of size 1899, which is longer than the specified 1000\n",
      "Created a chunk of size 1021, which is longer than the specified 1000\n",
      "Created a chunk of size 1085, which is longer than the specified 1000\n",
      "Created a chunk of size 1854, which is longer than the specified 1000\n",
      "Created a chunk of size 1672, which is longer than the specified 1000\n",
      "Created a chunk of size 2537, which is longer than the specified 1000\n",
      "Created a chunk of size 1251, which is longer than the specified 1000\n",
      "Created a chunk of size 1734, which is longer than the specified 1000\n",
      "Created a chunk of size 1642, which is longer than the specified 1000\n",
      "Created a chunk of size 1376, which is longer than the specified 1000\n",
      "Created a chunk of size 1253, which is longer than the specified 1000\n",
      "Created a chunk of size 1642, which is longer than the specified 1000\n",
      "Created a chunk of size 1419, which is longer than the specified 1000\n",
      "Created a chunk of size 1438, which is longer than the specified 1000\n",
      "Created a chunk of size 1427, which is longer than the specified 1000\n",
      "Created a chunk of size 1684, which is longer than the specified 1000\n",
      "Created a chunk of size 1760, which is longer than the specified 1000\n",
      "Created a chunk of size 1157, which is longer than the specified 1000\n",
      "Created a chunk of size 2504, which is longer than the specified 1000\n",
      "Created a chunk of size 1082, which is longer than the specified 1000\n",
      "Created a chunk of size 2268, which is longer than the specified 1000\n",
      "Created a chunk of size 1784, which is longer than the specified 1000\n",
      "Created a chunk of size 1311, which is longer than the specified 1000\n",
      "Created a chunk of size 2972, which is longer than the specified 1000\n",
      "Created a chunk of size 1144, which is longer than the specified 1000\n",
      "Created a chunk of size 1825, which is longer than the specified 1000\n",
      "Created a chunk of size 1508, which is longer than the specified 1000\n",
      "Created a chunk of size 2901, which is longer than the specified 1000\n",
      "Created a chunk of size 1715, which is longer than the specified 1000\n",
      "Created a chunk of size 1062, which is longer than the specified 1000\n",
      "Created a chunk of size 1206, which is longer than the specified 1000\n",
      "Created a chunk of size 1102, which is longer than the specified 1000\n",
      "Created a chunk of size 1184, which is longer than the specified 1000\n",
      "Created a chunk of size 1002, which is longer than the specified 1000\n",
      "Created a chunk of size 1065, which is longer than the specified 1000\n",
      "Created a chunk of size 1871, which is longer than the specified 1000\n",
      "Created a chunk of size 1754, which is longer than the specified 1000\n",
      "Created a chunk of size 2413, which is longer than the specified 1000\n",
      "Created a chunk of size 1771, which is longer than the specified 1000\n",
      "Created a chunk of size 2054, which is longer than the specified 1000\n",
      "Created a chunk of size 2000, which is longer than the specified 1000\n",
      "Created a chunk of size 2061, which is longer than the specified 1000\n",
      "Created a chunk of size 1066, which is longer than the specified 1000\n",
      "Created a chunk of size 1419, which is longer than the specified 1000\n",
      "Created a chunk of size 1368, which is longer than the specified 1000\n",
      "Created a chunk of size 1008, which is longer than the specified 1000\n",
      "Created a chunk of size 1227, which is longer than the specified 1000\n",
      "Created a chunk of size 1745, which is longer than the specified 1000\n",
      "Created a chunk of size 2296, which is longer than the specified 1000\n",
      "Created a chunk of size 1083, which is longer than the specified 1000\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3477\n"
     ]
    }
   ],
   "source": [
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(docs)\n",
    "print(f\"{len(texts)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then embed chunks and upload them to the DeepLake.\n",
    "\n",
    "This can take several minutes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "\n",
    "embeddings = OpenAIEmbeddings()\n",
    "embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.vectorstores import DeepLake\n",
    "\n",
    "db = DeepLake.from_documents(texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\")\n",
    "db"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question Answering\n",
    "First load the dataset, construct the retriever, then construct the Conversational Chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hub://user_name/langchain-code loaded successfully.\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
      "\n",
      "  tensor     htype      shape       dtype  compression\n",
      "  -------   -------    -------     -------  ------- \n",
      " embedding  generic  (3477, 1536)  float32   None   \n",
      "    ids      text     (3477, 1)      str     None   \n",
      " metadata    json     (3477, 1)      str     None   \n",
      "   text      text     (3477, 1)      str     None   \n"
     ]
    }
   ],
   "source": [
    "db = DeepLake(dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\", read_only=True, embedding_function=embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "retriever = db.as_retriever()\n",
    "retriever.search_kwargs['distance_metric'] = 'cos'\n",
    "retriever.search_kwargs['fetch_k'] = 20\n",
    "retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
    "retriever.search_kwargs['k'] = 20"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def filter(x):\n",
    "    # filter based on source code\n",
    "    if 'something' in x['text'].data()['value']:\n",
    "        return False\n",
    "    \n",
    "    # filter based on path e.g. extension\n",
    "    metadata =  x['metadata'].data()['value']\n",
    "    return 'only_this' in metadata['source'] or 'also_that' in metadata['source']\n",
    "\n",
    "### turn on below for custom filtering\n",
    "# retriever.search_kwargs['filter'] = filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.chains import ConversationalRetrievalChain\n",
    "\n",
    "model = ChatOpenAI(model='gpt-3.5-turbo') # 'ada' 'gpt-3.5-turbo' 'gpt-4',\n",
    "qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "questions = [\n",
    "    \"What is the class hierarchy?\",\n",
    "    # \"What classes are derived from the Chain class?\",\n",
    "    # \"What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests?\",\n",
    "    # \"What one improvement do you propose in code in relation to the class herarchy for the Chain class?\",\n",
    "] \n",
    "chat_history = []\n",
    "\n",
    "for question in questions:  \n",
    "    result = qa({\"question\": question, \"chat_history\": chat_history})\n",
    "    chat_history.append((question, result['answer']))\n",
    "    print(f\"-> **Question**: {question} \\n\")\n",
    "    print(f\"**Answer**: {result['answer']} \\n\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "-> **Question**: What is the class hierarchy? \n",
    "\n",
    "**Answer**: There are several class hierarchies in the provided code, so I'll list a few:\n",
    "\n",
    "1. `BaseModel` -> `ConstitutionalPrinciple`: `ConstitutionalPrinciple` is a subclass of `BaseModel`.\n",
    "2. `BasePromptTemplate` -> `StringPromptTemplate`, `AIMessagePromptTemplate`, `BaseChatPromptTemplate`, `ChatMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `MessagesPlaceholder`, `SystemMessagePromptTemplate`, `FewShotPromptTemplate`, `FewShotPromptWithTemplates`, `Prompt`, `PromptTemplate`: All of these classes are subclasses of `BasePromptTemplate`.\n",
    "3. `APIChain`, `Chain`, `MapReduceDocumentsChain`, `MapRerankDocumentsChain`, `RefineDocumentsChain`, `StuffDocumentsChain`, `HypotheticalDocumentEmbedder`, `LLMChain`, `LLMBashChain`, `LLMCheckerChain`, `LLMMathChain`, `LLMRequestsChain`, `PALChain`, `QAWithSourcesChain`, `VectorDBQAWithSourcesChain`, `VectorDBQA`, `SQLDatabaseChain`: All of these classes are subclasses of `Chain`.\n",
    "4. `BaseLoader`: `BaseLoader` is a subclass of `ABC`.\n",
    "5. `BaseTracer` -> `ChainRun`, `LLMRun`, `SharedTracer`, `ToolRun`, `Tracer`, `TracerException`, `TracerSession`: All of these classes are subclasses of `BaseTracer`.\n",
    "6. `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, `CohereEmbeddings`, `JinaEmbeddings`, `LlamaCppEmbeddings`, `HuggingFaceHubEmbeddings`, `TensorflowHubEmbeddings`, `SagemakerEndpointEmbeddings`, `HuggingFaceInstructEmbeddings`, `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, `SelfHostedHuggingFaceInstructEmbeddings`, `FakeEmbeddings`, `AlephAlphaAsymmetricSemanticEmbedding`, `AlephAlphaSymmetricSemanticEmbedding`: All of these classes are subclasses of `BaseLLM`. \n",
    "\n",
    "\n",
    "-> **Question**: What classes are derived from the Chain class? \n",
    "\n",
    "**Answer**: There are multiple classes that are derived from the Chain class. Some of them are:\n",
    "- APIChain\n",
    "- AnalyzeDocumentChain\n",
    "- ChatVectorDBChain\n",
    "- CombineDocumentsChain\n",
    "- ConstitutionalChain\n",
    "- ConversationChain\n",
    "- GraphQAChain\n",
    "- HypotheticalDocumentEmbedder\n",
    "- LLMChain\n",
    "- LLMCheckerChain\n",
    "- LLMRequestsChain\n",
    "- LLMSummarizationCheckerChain\n",
    "- MapReduceChain\n",
    "- OpenAPIEndpointChain\n",
    "- PALChain\n",
    "- QAWithSourcesChain\n",
    "- RetrievalQA\n",
    "- RetrievalQAWithSourcesChain\n",
    "- SequentialChain\n",
    "- SQLDatabaseChain\n",
    "- TransformChain\n",
    "- VectorDBQA\n",
    "- VectorDBQAWithSourcesChain\n",
    "\n",
    "There might be more classes that are derived from the Chain class as it is possible to create custom classes that extend the Chain class.\n",
    "\n",
    "\n",
    "-> **Question**: What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests? \n",
    "\n",
    "**Answer**: All classes and functions in the `./langchain/utilities/` folder seem to have unit tests written for them. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}