{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a175c650",
   "metadata": {},
   "source": [
    "# Benchmarking Template\n",
    "\n",
    "This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is really hard, and so we greatly welcome any contributions that can make it easier for people to experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "984169ca",
   "metadata": {},
   "source": [
    "It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "9fe4d1b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comment this out if you are NOT using tracing\n",
    "import os\n",
    "os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f66405e",
   "metadata": {},
   "source": [
    "## Loading the data\n",
    "\n",
    "First, let's load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79402a8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This notebook should so how to load the dataset from LangChainDatasets on Hugging Face\n",
    "\n",
    "# Please upload your dataset to https://huggingface.co/LangChainDatasets\n",
    "\n",
    "# The value passed into `load_dataset` should NOT have the `LangChainDatasets/` prefix\n",
    "from langchain.evaluation.loading import load_dataset\n",
    "dataset = load_dataset(\"TODO\")"
   ]
  },
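  {
   "cell_type": "markdown",
   "id": "3f1c9a2e",
   "metadata": {},
   "source": [
    "It can help to peek at one example before wiring up a chain, so you know which keys your chain needs to consume. `load_dataset` typically returns a list of dicts, so indexing into it is enough:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b8e2d41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the first datapoint to see the dataset's schema\n",
    "dataset[0]"
   ]
  },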
  {
   "cell_type": "markdown",
   "id": "8a16b75d",
   "metadata": {},
   "source": [
    "## Setting up a chain\n",
    "\n",
    "This next section should have an example of setting up a chain that can be run on this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2661ce0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
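  {
   "cell_type": "markdown",
   "id": "7e4f2b90",
   "metadata": {},
   "source": [
    "As a minimal sketch, a single-input `LLMChain` might look like the cell below. The prompt and the `question` input variable are placeholders; substitute whatever chain and inputs your task actually requires."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d9c6e3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only -- assumes an OpenAI API key is configured\n",
    "from langchain.chains import LLMChain\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"question\"],\n",
    "    template=\"Answer the following question:\\n\\n{question}\",\n",
    ")\n",
    "chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
   ]
  },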
  {
   "cell_type": "markdown",
   "id": "6c0062e7",
   "metadata": {},
   "source": [
    "## Make a prediction\n",
    "\n",
    "First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d28c5e7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on a single datapoint (`dataset[0]`) goes here"
   ]
  },
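  {
   "cell_type": "markdown",
   "id": "9a3e5c17",
   "metadata": {},
   "source": [
    "For instance, with the sketch chain above and a dataset whose rows have a `question` key (an assumption -- check your schema), a single prediction could be:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c8b7f25",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run on one datapoint so you can inspect the raw output cheaply\n",
    "# (the \"question\" key is hypothetical -- use your dataset's actual input key)\n",
    "chain.run(dataset[0][\"question\"])"
   ]
  },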
  {
   "cell_type": "markdown",
   "id": "d0c16cd7",
   "metadata": {},
   "source": [
    "## Make many predictions\n",
    "Now we can make predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "24b4c66e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on many predictions goes here\n",
    "\n",
    "# Sometimes its as simple as `chain.apply(dataset)`\n",
    "\n",
    "# Othertimes you may want to write a for loop to catch errors"
   ]
  },
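  {
   "cell_type": "markdown",
   "id": "6d2a8e49",
   "metadata": {},
   "source": [
    "A sketch of the error-catching loop: keep the datapoints that succeeded alongside their predictions, and set the ones that errored aside so a single failure doesn't abort the whole run. This assumes each datapoint is a dict whose keys match the chain's input keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f7b4c83",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch -- swap in `chain.apply(dataset)` if no error handling is needed\n",
    "predictions = []\n",
    "predicted_dataset = []\n",
    "error_dataset = []\n",
    "for data in dataset:\n",
    "    try:\n",
    "        predictions.append(chain(data))\n",
    "        predicted_dataset.append(data)\n",
    "    except Exception:\n",
    "        # Set the failing datapoint aside rather than aborting the run\n",
    "        error_dataset.append(data)"
   ]
  },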
  {
   "cell_type": "markdown",
   "id": "4783344b",
   "metadata": {},
   "source": [
    "## Evaluate performance\n",
    "\n",
    "Any guide to evaluating performance in a more systematic manner goes here."
   ]
  },
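  {
   "cell_type": "markdown",
   "id": "8b5d3f62",
   "metadata": {},
   "source": [
    "One possible approach, sketched under assumptions about the dataset schema (`question`/`answer` keys) and the chain's output key (`text` for an `LLMChain`), is to grade predictions against reference answers with `QAEvalChain`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e6a9d14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch -- the key names below depend on your dataset and chain\n",
    "from langchain.evaluation.qa import QAEvalChain\n",
    "from langchain.llms import OpenAI\n",
    "\n",
    "eval_chain = QAEvalChain.from_llm(OpenAI(temperature=0))\n",
    "graded_outputs = eval_chain.evaluate(\n",
    "    predicted_dataset,\n",
    "    predictions,\n",
    "    question_key=\"question\",\n",
    "    answer_key=\"answer\",\n",
    "    prediction_key=\"text\",\n",
    ")"
   ]
  },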
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7710401a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}