File size: 6,346 Bytes
cfd3735
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0e499e90-7a6d-4fab-8aab-31a4df417601",
   "metadata": {},
   "source": [
    "# PowerBI Dataset Agent\n",
    "\n",
    "This notebook showcases an agent designed to interact with a Power BI Dataset. The agent is designed to answer more general questions about a dataset, as well as recover from errors.\n",
    "\n",
    "Note that, as this agent is in active development, all answers might not be correct. It runs against the [executequery endpoint](https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/execute-queries), which does not allow deletes.\n",
    "\n",
    "### Some notes\n",
    "- It relies on authentication with the azure.identity package, which can be installed with `pip install azure-identity`. Alternatively you can create the powerbi dataset with a token as a string without supplying the credentials.\n",
    "- You can also supply a username to impersonate for use with datasets that have RLS enabled. \n",
    "- The toolkit uses a LLM to create the query from the question, the agent uses the LLM for the overall execution.\n",
    "- Testing was done mostly with a `text-davinci-003` model, codex models did not seem to perform ver well."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec927ac6-9b2a-4e8a-9a6e-3e429191875c",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Initialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53422913-967b-4f2a-8022-00269c1be1b1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.agents.agent_toolkits import create_pbi_agent\n",
    "from langchain.agents.agent_toolkits import PowerBIToolkit\n",
    "from langchain.utilities.powerbi import PowerBIDataset\n",
    "from langchain.llms.openai import AzureOpenAI\n",
    "from langchain.agents import AgentExecutor\n",
    "from azure.identity import DefaultAzureCredential"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "090f3699-79c6-4ce1-ab96-a94f0121fd64",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "fast_llm = AzureOpenAI(temperature=0.5, max_tokens=1000, deployment_name=\"gpt-35-turbo\", verbose=True)\n",
    "smart_llm = AzureOpenAI(temperature=0, max_tokens=100, deployment_name=\"gpt-4\", verbose=True)\n",
    "\n",
    "toolkit = PowerBIToolkit(\n",
    "    powerbi=PowerBIDataset(dataset_id=\"<dataset_id>\", table_names=['table1', 'table2'], credential=DefaultAzureCredential()), \n",
    "    llm=smart_llm\n",
    ")\n",
    "\n",
    "agent_executor = create_pbi_agent(\n",
    "    llm=fast_llm,\n",
    "    toolkit=toolkit,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36ae48c7-cb08-4fef-977e-c7d4b96a464b",
   "metadata": {},
   "source": [
    "## Example: describing a table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff70e83d-5ad0-4fc7-bb96-27d82ac166d7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "agent_executor.run(\"Describe table1\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9abcfe8e-1868-42a4-8345-ad2d9b44c681",
   "metadata": {},
   "source": [
    "## Example: simple query on a table\n",
    "In this example, the agent actually figures out the correct query to get a row count of the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bea76658-a65b-47e2-b294-6d52c5556246",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "agent_executor.run(\"How many records are in table1?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fbc26af-97e4-4a21-82aa-48bdc992da26",
   "metadata": {},
   "source": [
    "## Example: running queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17bea710-4a23-4de0-b48e-21d57be48293",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "agent_executor.run(\"How many records are there by dimension1 in table2?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "474dddda-c067-4eeb-98b1-e763ee78b18c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "agent_executor.run(\"What unique values are there for dimensions2 in table2\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6fd950e4",
   "metadata": {},
   "source": [
    "## Example: add your own few-shot prompts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87d677f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "#fictional example\n",
    "few_shots = \"\"\"\n",
    "Question: How many rows are in the table revenue?\n",
    "DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(revenue_details))\n",
    "----\n",
    "Question: How many rows are in the table revenue where year is not empty?\n",
    "DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(FILTER(revenue_details, revenue_details[year] <> \"\")))\n",
    "----\n",
    "Question: What was the average of value in revenue in dollars?\n",
    "DAX: EVALUATE ROW(\"Average\", AVERAGE(revenue_details[dollar_value]))\n",
    "----\n",
    "\"\"\"\n",
    "toolkit = PowerBIToolkit(\n",
    "    powerbi=PowerBIDataset(dataset_id=\"<dataset_id>\", table_names=['table1', 'table2'], credential=DefaultAzureCredential()), \n",
    "    llm=smart_llm,\n",
    "    examples=few_shots,\n",
    ")\n",
    "agent_executor = create_pbi_agent(\n",
    "    llm=fast_llm,\n",
    "    toolkit=toolkit,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33f4bb43",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent_executor.run(\"What was the maximum of value in revenue in dollars in 2022?\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}