File size: 2,586 Bytes
cfd3735
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bda1f3f5",
   "metadata": {},
   "source": [
    "# Gutenberg\n",
    "\n",
    ">[Project Gutenberg](https://www.gutenberg.org/about/) is an online library of free eBooks.\n",
    "\n",
    "This notebook covers how to load links to `Gutenberg` e-books into a document format that we can use downstream."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9bfd5e46",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import GutenbergLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "700e4ef2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "loader = GutenbergLoader('https://www.gutenberg.org/cache/epub/69972/pg69972.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b6f28930",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7d436441",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The Project Gutenberg eBook of The changed brides, by Emma Dorothy\\r\\n\\n\\nEliza Nevitte Southworth\\r\\n\\n\\n\\r\\n\\n\\nThis eBook is for the use of anyone anywhere in the United States and\\r\\n\\n\\nmost other parts of the world at no cost and with almost no restrictions\\r\\n\\n\\nwhatsoever. You may copy it, give it away or re-u'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0].page_content[:300]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1481beb1-12a7-4654-9d91-bfd101109891",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'source': 'https://www.gutenberg.org/cache/epub/69972/pg69972.txt'}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0].metadata"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}