# Software Engineering Applied to LLMs
* **Created by:** Eric Martinez
* **For:** Software Engineering 2
* **At:** University of Texas Rio Grande Valley

## Concerns with Quality and Performance Issues of LLM Integrated Apps

* Applications depend on external APIs, which can be flaky and expensive. How do we avoid hitting them in tests?
* Responses may be incorrect or inaccurate. How do we increase confidence in the results?
* Responses may be biased, unethical, or otherwise unwanted. How do we stop this type of output?
* User requests may be unethical or otherwise unwanted. How do we filter this type of input?


## Lessons from Software Engineering Applied to LLMs

#### Prototyping
* Develop prompt prototypes early when working with customers or stakeholders. It is fast and cheap to test whether the idea will work.
* Test against realistic examples early. Fail fast and iterate quickly.
* Make a plan for how you will source dynamic data. If there is no path, the project is dead in the water.

#### Testing
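* Unit test prompts using traditional methods (for example, exact-match or pattern assertions on the output) to increase confidence.
* Unit test your prompts using LLMs (for example, asking a second model to grade the output) to increase confidence.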
* Write tests that handle API errors or bad output (malformed, incorrect, unethical).
* Use 'mocking' in integration tests to avoid unnecessary calls to APIs, flakiness, and unwanted charges.
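
As a minimal sketch of the mocking point above, the tests below replace the API client with a `unittest.mock.Mock`, so no network call is made and no charges are incurred. The `summarize` function and the client's `complete` method are hypothetical names chosen for illustration, not part of any particular provider's SDK.

```python
import unittest
from unittest.mock import Mock


def summarize(text, client):
    """Hypothetical app code: build a prompt and return the cleaned-up reply."""
    prompt = f"Summarize in one sentence:\n{text}"
    return client.complete(prompt).strip()


class SummarizeTest(unittest.TestCase):
    def test_returns_stripped_completion(self):
        client = Mock()  # stands in for the real API client
        client.complete.return_value = "  A short summary.\n"
        self.assertEqual(summarize("long text", client), "A short summary.")
        # The prompt sent to the API should contain the user's text.
        sent_prompt = client.complete.call_args.args[0]
        self.assertIn("long text", sent_prompt)

    def test_api_error_propagates(self):
        client = Mock()
        # Simulate a flaky API without touching the network.
        client.complete.side_effect = TimeoutError("API timed out")
        with self.assertRaises(TimeoutError):
            summarize("long text", client)
```

Saved in a test module, these can be run with `python -m unittest`.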

#### Handling Bad Output
* Develop 'retry' mechanisms when you get unwanted output.
* Develop specific prompts for different 'retry' conditions. Include the context, what went wrong, and what needs to be fixed.
* Consider adding logging to your app to keep track of how often your app gets bad output.
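
The three points above might be combined as in the following sketch, which assumes the app expects JSON back from the model. `ask_for_json` and its `complete` parameter (any function that takes a prompt string and returns the reply text) are hypothetical names; the retry prompt wording is only an example.

```python
import json
import logging

logger = logging.getLogger("llm_app")


def ask_for_json(prompt, complete, max_attempts=3):
    """Call `complete(prompt)` until the reply parses as JSON, or give up."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        reply = complete(current_prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            # Log every bad output so its frequency can be tracked over time.
            logger.warning("bad output on attempt %d: %s", attempt, err)
            # Retry prompt: original context, what went wrong, what to fix.
            current_prompt = (
                f"{prompt}\n\n"
                f"Your previous reply was:\n{reply}\n\n"
                f"It is not valid JSON ({err.msg}). "
                "Reply again with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts")
```

Other failure conditions (wrong schema, unwanted content) would get their own checks and their own retry prompts in the same loop.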

#### Template Languages and Version Control
* Consider writing your prompt templates in dynamic template languages like ERB, Handlebars, etc.
* Keep prompt templates and prompts in version control in your app's repo.
* Write tests for handling template engine errors.
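
ERB and Handlebars are not Python tools, so the sketch below uses the standard library's `string.Template` as a stand-in to show the same idea: the prompt text lives apart from the code, and a missing variable fails loudly instead of producing a broken prompt. The template text, the `render_prompt` helper, and the file path in the comment are all hypothetical.

```python
from string import Template

# In a real app this text would live in its own file under version control,
# e.g. prompts/summarize.txt (hypothetical path).
SUMMARIZE_TEMPLATE = Template(
    "You are a helpful assistant.\n"
    "Summarize the following $doc_type in $max_words words or fewer:\n"
    "$text"
)


def render_prompt(template, **values):
    """Fill in a template, failing loudly if a placeholder has no value."""
    try:
        return template.substitute(**values)
    except KeyError as err:
        raise ValueError(f"missing template variable: {err.args[0]}") from err
```

A test for the error path then only needs to call `render_prompt` with a variable left out and assert that `ValueError` is raised.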

#### Prompt Injection/Leakage
* User-facing prompts should be tested against prompt injection attacks.
* Validate input at both the UI and the LLM level.
* Consider using an LLM to check whether an output is similar to the prompt, which can indicate prompt leakage.
* Have mechanisms for anomaly detection and incident response.
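
The list above suggests using an LLM to compare the output with the prompt. As a cheaper first-pass check that could run before such a call, the sketch below swaps in plain string matching from the standard library's `difflib`: it flags any output that repeats a long verbatim run of the system prompt. The function name and the 40-character threshold are assumptions for illustration, and paraphrased leaks would not be caught by this approach.

```python
from difflib import SequenceMatcher


def leaks_prompt(system_prompt, output, min_overlap=40):
    """Return True if `output` shares a long verbatim run with `system_prompt`."""
    a, b = system_prompt.lower(), output.lower()
    if not a:
        return False  # nothing to leak
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    # For prompts shorter than the threshold, require the whole prompt to appear.
    return match.size >= min(min_overlap, len(a))
```

A flagged response could then be blocked, logged for incident response, or passed to an LLM-based check for a second opinion.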

#### Security
* **Do not:** store API keys in application code as strings, encrypted or not.
* **Do not:** store API keys in compiled binaries distributed to users.
* **Do not:** store API keys in metadata files bundled with your application.
* **Do:** store API keys in environment variables or cloud secrets.
* **Do:** store API keys in a `.env` file that is blocked from version control. (Ideally these are encrypted with a secret that is not in version control, but that is beyond the scope of today's discussion.)
* **Do:** create an intermediate web app (or API) with authentication/authorization that delegates requests to LLMs at run-time for use in front-end applications.
* **Do:** if your front-end application does not have user accounts, consider implementing guest or anonymous accounts and expiring or rotating keys.
* **Do:** when allowing LLMs to use tools, consider designing systems to pass user IDs through to tools so that the tools operate at the same level of access as the end-user.
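
A minimal sketch of the environment-variable approach, assuming the key is stored under the hypothetical name `LLM_API_KEY` (use whatever name your provider's SDK or your deployment expects). Failing at startup when the key is missing is a design choice that surfaces configuration mistakes early.

```python
import os


def load_api_key(name="LLM_API_KEY"):
    """Read an API key from the environment; never hard-code it in source."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {name} is not set")
    return key
```

During local development the variable can be populated from a `.env` file that is excluded from version control.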

## Production Deployment Considerations

* Wrap LLM features as a web service or API. Don't give out your OpenAI keys directly in distributed software.
    - For example: Django, Flask, FastAPI, Express.js, Sinatra, Ruby on Rails

* Consider whether there are any regulations that might impact how you handle data, such as GDPR and HIPAA.
    - Regulation may require specific data handling and storage practices.
    - Cloud providers may offer compliance certifications and assessment tools.
    - On-prem deployments can provide more control of data storage and processing, but might require more resources (hardware, people, software) for management and maintenance.
    - Cloud providers like Azure offer tools such as Microsoft Defender for Cloud and Microsoft Purview for managing compliance.
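
To illustrate the first point in the list above (wrapping LLM features behind a web service so the provider key never ships to clients), here is a rough sketch using only the standard library's WSGI interface rather than any of the frameworks named. `call_llm`, the `APP_CLIENT_TOKEN` variable, and the single shared bearer token are hypothetical simplifications. A real service would use per-user authentication and handle malformed request bodies.

```python
import hmac
import json
import os


def call_llm(prompt):
    """Hypothetical placeholder for the real provider call, made server-side."""
    return f"(model reply to: {prompt})"


def app(environ, start_response):
    """WSGI app: authenticate the caller, then forward the prompt to the LLM."""
    # Clients present an app-level token; the provider key stays on the server.
    token = os.environ.get("APP_CLIENT_TOKEN", "")
    supplied = environ.get("HTTP_AUTHORIZATION", "").encode()
    expected = ("Bearer " + token).encode()
    if not token or not hmac.compare_digest(supplied, expected):
        start_response("401 Unauthorized", [("Content-Type", "application/json")])
        return [b'{"error": "unauthorized"}']

    # Read the JSON request body and delegate to the model at run-time.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = json.loads(environ["wsgi.input"].read(length) or b"{}")
    reply = json.dumps({"reply": call_llm(body.get("prompt", ""))}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [reply]
```

For a local trial, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000. The same authenticate-then-delegate shape carries over to Flask, FastAPI, and the other frameworks listed.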

- Using Cloud Services vs On-Prem
    - Cloud services offer many advantages such as scalability, flexibility, cost-effectiveness, and ease of management.
    - Easy to spin up resources and scale based on demand, without worrying about infrastructure or maintenance.
    - Wide range of tools: performance optimization, monitoring, security, reliability.

- Container-based Architecture
    - Containerization is a lightweight virtualization method that packages an application and its dependencies into a single, portable unit called a container.
    - Containers can run consistently across different environments, making it easier to develop, test, and deploy applications. 
    - Containerization is useful when you need to ensure consistent behavior across various platforms, simplify deployment and scaling, and improve resource utilization.
    - Common tools for deploying container-based architecture are Docker and Kubernetes.

- Serverless Architectures
    - Serverless architectures are a cloud computing model where the cloud provider manages the infrastructure and automatically allocates resources based on the application's needs.
    - Developers only need to focus on writing code, and the provider takes care of scaling, patching, and maintaining the underlying infrastructure. 
    - Serverless architectures can be useful when you want to reduce operational overhead, build event-driven applications, and optimize costs by paying only for the resources you actually use.
    - Common tools to build serverless applications and APIs include Azure Functions, AWS Lambda, and Google Cloud Functions.

- HuggingFace
    - Platforms like HuggingFace provide an ecosystem for sharing, collaborating, and deploying AI models, including LLMs. 
    - They offer pre-trained models, tools, and APIs that simplify the development and integration of AI-powered applications. 
    - These platforms can be useful when you want to leverage existing models, collaborate with the AI community, and streamline the deployment process for your LLM-based applications.