File size: 5,057 Bytes
093adcb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8ec2fef2",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Lecture 4: Software Engineering Applied to LLMs\n",
    "* **Created by:** Eric Martinez\n",
    "* **For:** Software Engineering 2\n",
    "* **At:** University of Texas Rio-Grande Valley"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60fef658",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Quality and Performance Issues\n",
    "\n",
    "* Applications depend on external APIs which has issues with flakiness and pricing, how do we avoid hitting APIs in testing?\n",
    "* Responses may not be correct or accurate, how do we increase confidence in result?\n",
    "* Responses may be biased or unethical or unwanted output, how do we stop this type of output?\n",
    "* User requests could be unethical or unwanted input, how do we filter this type of input?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fc1b19a",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Prototyping\n",
    "* Develop prompt prototypes early when working with customers or stakeholders, it is fast and cheap to test that the idea will work.\n",
    "* Test against realistic examples, early. Fail fast and iterate quickly.\n",
    "* Make a plan for how you will source dynamic data. If there is no path, the project is dead in the water."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2528a3c9",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Testing\n",
    "* Unit test prompts using traditional methods to increase confidence.\n",
    "* Unit test your prompts using LLMs to increase confidence.\n",
    "* Write tests that handle API errors or bad output (malformed, incorrect, unethical).\n",
    "* Use 'mocking' in integration tests to avoid unnecessary calls to APIs, flakiness, and unwanted charges."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9cdafd2",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Handling Bad Output\n",
    "* Develop 'retry' mechanisms when you get unwanted output.\n",
    "* Develop specific prompts for different 'retry' conditions. Include the context, what went wrong, and what needs to be fixed.\n",
    "* Consider adding logging to your app to keep track of how often your app gets bad output."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f7de0be",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Template Languages and Version Control\n",
    "* Consider writing your prompt templates in dynamic template languages like ERB, Handlebars, etc.\n",
    "* Keep prompt templates and prompts in version control in your app's repo.\n",
    "* Write tests for handling template engine errors."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3987a54c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Prompt Injection/Leakage\n",
    "* User-facing prompts should be tested against prompt injection attacks\n",
    "* Validate input at the UI and LLM level\n",
    "* Consider using an LLM to check if an output is similar to the prompt\n",
    "* Have mechanisms for anomaly detection and incident response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0e0c388",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Security\n",
    "* **Do not:** store API keys in application code as strings, encrypted or not.\n",
    "* **Do not:** store API keys in compiled binaries distributed to users.\n",
    "* **Do not:** store API keys in metadeta files bundled with your application.\n",
    "* **Do:** create an intermediate web app (or API) with authentication/authorization that delegates requests to LLMs at run-time for use in front-end applications\n",
    "* **Do:** if your front-end application does not have user accounts, consider implementing guest or anonymous accounts and expiring or rotating keys\n",
    "* **Do:** when allowing LLMs to use tools, consider designing systems to pass-through user ids to tools so that they tools operate at the same level of access as the end-user"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "181dd4ad",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}