# Introduction to Large Language Models
* **Created by:** Eric Martinez
* **For:** Software Engineering 2
* **At:** University of Texas Rio Grande Valley

## Overview of LLMs and their capabilities

A large language model (LLM) is a machine learning model designed to understand and generate human-like text. LLMs are trained on vast amounts of text data and can perform a wide range of tasks, such as translation, summarization, and question answering.

Some capabilities of LLMs include natural language understanding, question answering, instruction-following, text and code generation, sentiment analysis, and more.

**Key Points:**

* What it is: ML model trained to understand and generate human-like text.

* Capabilities: Natural Language Understanding, Q&A, text/code generation, etc.

## LLM Components

LLMs are trained to predict the next word (more precisely, the next token) in a sequence, given the context of the preceding words. This task is known as language modeling.
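To make next-word prediction concrete, here is a minimal sketch using a bigram model, which only looks at the single previous word. This is a deliberate simplification: real LLMs use neural networks and much longer contexts. The function names and the toy corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (an assumption for illustration, not real training data).
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
```

An LLM performs the same kind of prediction, but replaces the count table with a learned model that conditions on the whole preceding context.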

Jeremy Howard and Sebastian Ruder developed ULMFiT (Universal Language Model Fine-tuning), an approach that applies transfer learning to NLP tasks and helped pave the way for the current state of the art.

ULMFiT was first introduced in a free online course from fast.ai, where Jeremy Howard demonstrated its effectiveness on various NLP tasks. The approach gained significant attention and contributed to the development of more advanced LLMs.

Transformers are a type of neural network architecture that uses self-attention mechanisms to process input data in parallel, rather than sequentially. This allows for faster training and improved performance on long-range dependencies in text.
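The following is a minimal sketch of scaled dot-product self-attention, the core operation described in "Attention Is All You Need". It is written in plain Python for readability, under simplifying assumptions: the learned query/key/value projection matrices, multiple heads, and masking are all omitted, and the token vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each argument is a list of vectors, one per token. Every output vector
    is a weighted average of all value vectors, so each token can draw on
    every other token regardless of distance.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens, tokens, tokens)
print(out)
```

Because each query's computation is independent of the others, the loop over queries can run in parallel, which is what makes transformers efficient to train on modern hardware.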

Transformers have led to breakthroughs in NLP, such as the development of BERT, GPT, and other state-of-the-art models.

Before transformers, NLP models relied mainly on recurrent neural networks (RNNs) and convolutional neural networks (CNNs), which struggled to capture long-range dependencies in text.

Transfer learning is the process of using a pre-trained model as a starting point and fine-tuning it for a specific task. In the context of LLMs, transfer learning allows models to leverage vast amounts of pre-existing knowledge, leading to improved performance and reduced training time.
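As a toy illustration of why starting from a pre-trained model helps, the sketch below fits a one-parameter linear model with gradient descent. Everything here is an assumption made for illustration: the "pre-training" and "fine-tuning" datasets, the learning rate, and the step counts are invented, and a real LLM has billions of parameters rather than one.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr, steps):
    """Run plain gradient descent on the MSE loss, starting from weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Large "general" dataset: y = 2.0 * x (hypothetical pre-training task).
pretrain_data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
# Small "task" dataset: y = 2.2 * x (hypothetical, related downstream task).
task_data = [(x, 2.2 * x) for x in (1.0, 2.0)]

pretrained_w = train(0.0, pretrain_data, lr=0.01, steps=200)

# Same small budget of 5 steps, with and without the pre-trained start.
finetuned_w = train(pretrained_w, task_data, lr=0.01, steps=5)
scratch_w = train(0.0, task_data, lr=0.01, steps=5)

print("fine-tuned loss:  ", mse(finetuned_w, task_data))
print("from-scratch loss:", mse(scratch_w, task_data))
```

With the same small training budget, the fine-tuned weight ends up much closer to the task optimum because it started near it, which mirrors the reduced training time described above.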

Advanced Reading:
* [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146)
* [Attention Is All You Need (Transformers)](https://arxiv.org/pdf/1706.03762.pdf)
* [Blog Post: The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)


**Key Points:**

* Task: Next-word prediction

* Breakthrough: Fine-tuning a Pretrained Model (ULMFiT)

* Transformer Architecture: Used in most state-of-the-art models such as BERT, GPT, LLaMA

* Transfer Learning: 'Fine-tuning' improves performance, reduces training time, and is key to techniques like RLHF

* RLHF: Key technique in improving quality of output and aligning with human values

## LLMs vs Other ML Models

LLMs are specifically designed for natural language processing tasks, whereas other machine learning models may be designed for tasks such as image recognition or reinforcement learning.

Large-scale LLMs like OpenAI's GPT models have significantly more parameters and are trained on much larger datasets, resulting in more powerful and versatile NLP capabilities than traditional NLP approaches.

Advanced Reading:
* [Language Models are Few-Shot Learners (GPT-3)](https://arxiv.org/pdf/2005.14165.pdf)

**Key Points:**

* NLP Focus

* Larger Model (Parameters)

* Larger Datasets

## LLM Advancements

OpenAI's latest GPT models have billions of parameters and are trained on massive datasets, making them some of the most powerful NLP models to date.

Access to these powerful GPT models is now available through APIs, which has democratized access to high-quality NLP tools and enabled a wide range of applications.

**Key Points:**

* GPT models

* API Access

* Wide Applications

## Use cases

* AI assistants, Chatbots

* Programming Assistance

* Healthcare

* Education

* Interfacing with Data: Analytics, Search, Recommendation

* Sales / Marketing / Ads

## Limitations & Challenges

Current LLMs are susceptible to hallucination. Hallucination refers to instances where the model generates text that appears coherent and plausible but is not grounded in reality or factual information. Hallucination can lead to misinformation, slander, and other harmful consequences.

LLMs can inherit biases from the data they are trained on, which can lead to biased outputs and potentially harmful consequences in downstream applications.

Ethical concerns surrounding LLMs include the potential for misuse, such as generating fake news or other malicious content, as well as the potential to exacerbate existing societal issues.

Training, fine-tuning, and inference with LLMs can be computationally expensive, requiring powerful hardware and potentially limiting their accessibility and scalability.

**Key Points:**

* Hallucination

* Biases

* Ethical concerns

* Computational requirements

## Steering LLMs

Prompting involves carefully crafting input text to guide the LLM's output, which can help achieve desired results and mitigate potential issues.
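As a small sketch of what crafting input text can look like in practice, the function below assembles a few-shot prompt: an instruction, some worked examples, and the new input. The function name, the "Review"/"Sentiment" layout, and the example reviews are all hypothetical; the resulting string would be sent to whichever LLM you are using.

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End on the open slot so the model's next tokens are the answer.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_prompt(
    "Classify the sentiment of each review as Positive or Negative.",
    [
        ("The food was wonderful.", "Positive"),
        ("Service was slow and rude.", "Negative"),
    ],
    "I would happily come back.",
)
print(prompt)
```

Ending the prompt at the unfilled "Sentiment:" slot takes advantage of next-word prediction: the most likely continuation is a label in the same format as the examples.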

Training your own LLM allows for greater control over the model's behavior and output, but requires significant computational resources and expertise.

Fine-tuning involves adjusting an existing LLM to better suit a specific task or domain, which can help improve performance and steer the model's output.

**Key Points:**

* Prompting

* Training

* Fine-tuning

## Alignment and Improvement

#### Alignment & Improvement: What is alignment?

**Key Points:**

* Definition: Alignment refers to the process of ensuring that an LLM's behavior and output align with human values, intentions, and expectations.

* Importance: Ensures that LLMs are useful, safe, and do not produce harmful or unintended consequences.

* Challenges: Alignment is challenging due to the diverse range of morals, ethics, and sensibilities across different countries, regions, and demographics.

#### Alignment & Improvement: Model Quality

**Key Points:**

* Definition: Quality output in terms of LLMs refers to text that is coherent, relevant, accurate, and adheres to the desired task, values, and intentions.

* Reinforcement Learning from Human Feedback (RLHF): technique used to align models and improve their quality by incorporating human feedback into the training process.

* Challenges of Human Feedback: Varying morals, ethics, and sensibilities of human raters.

* Things to Consider if using RLHF: Carefully selecting training data, incorporating diverse perspectives, and iteratively refining the model.

#### Alignment & Improvement: Evaluation Metrics

Evaluation metrics are quantitative measures used to assess the performance of LLMs on specific tasks or objectives. Common evaluation metrics for LLMs include perplexity, BLEU score, ROUGE score, F1 score, and accuracy, among others.

Metric Definitions:
* Perplexity measures how well an LLM predicts the next word in a sequence, with lower perplexity indicating better performance.
* BLEU score is used to evaluate the quality of machine-generated translations by comparing them to human-generated reference translations.
* ROUGE score is used to evaluate the quality of text summarization by comparing the generated summary to a reference summary.
* F1 score is the harmonic mean of precision and recall on a classification task, so it is high only when both are high.
* Accuracy is the proportion of correct predictions made by the model out of the total number of predictions.
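Three of the metrics above can be computed in a few lines, sketched here in plain Python. The input values are made up for illustration, and the function names are hypothetical. BLEU and ROUGE are omitted because their full definitions involve n-gram matching details that are better taken from an established library.

```python
import math


def perplexity(token_probs):
    """exp of the average negative log-probability the model gave the true tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# A model that assigns probability 0.25 to every correct token is as
# uncertain as a uniform choice among 4 options, so its perplexity is about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))  # 3 of 5 correct
print(f1_score(y_true, y_pred))  # precision and recall are both 2/3
```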

Evaluation metrics provide a quantitative way to measure the performance of LLMs, allowing developers to identify areas for improvement and track progress over time. By comparing the performance of different models or training configurations, developers can identify the most effective approaches and optimize their models accordingly. Evaluation metrics can also be used to guide the fine-tuning process, by providing feedback on the model's performance on specific tasks or domains.

In the context of reinforcement learning from human feedback (RLHF), evaluation metrics can be used to quantify how well the model aligns with human values and intentions, guiding the iterative refinement process. Metrics should be chosen carefully, because they may not capture the full range of desired qualities in LLM outputs. Developers should combine several metrics with human evaluation to get a comprehensive assessment of model performance.

**Key Points:**

* Definition: quantitative measures used to assess the performance of LLMs on specific tasks or objectives.

* Common metrics: Perplexity, BLEU, ROUGE, F1, Accuracy, and others

* Role of Evaluation Metrics: track improvement, compare, guide fine-tuning, quantify alignment

* Pitfalls of Evaluation Metrics: they may not actually represent or capture human alignment or values, should be used in combination with human evaluation

Advanced Reading:
* [Training language models to follow instructions with human feedback](https://arxiv.org/pdf/2203.02155.pdf)
* [Alignment of Language Agents](https://arxiv.org/pdf/2103.14659.pdf)
* [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/pdf/2204.05862.pdf)

## Open-Source vs Closed-Source LLMs

#### Open-Source LLMs

**Key Points:**

* Definition: models whose code, architecture, and weights are publicly available, allowing anyone to use, modify, and contribute to their development.

* Pros: increased transparency, collaboration, and accessibility.

* Cons: potential misuse and difficulty in controlling the distribution of powerful models.

* Societal Risks: potential for misuse, rogue agents, the spread of harmful content, and the exacerbation of existing biases and inequalities.

#### Closed-Source LLMs

**Key Points:**

* Definition: models whose code, architecture, and weights are proprietary and not publicly available.

* Pros: greater control over distribution and usage, as well as the potential for higher-quality models due to focused development efforts.

* Cons: cost, minimal insight into architecture and training process, minimal customization, etc.

* Societal Risks: potential for monopolistic control, reduced innovation, and limited access to powerful models.

## Ethical Considerations as LLM Engineers

**Key Points:**

* Awareness and care in handling: misinformation, harmful output, biased output

* Awareness of implications of automation solutions on job market and economy

* Awareness and care in handling: security, prompt injection, and rogue agents

* Consider benefits and risks, include diverse perspectives

* Engineer solutions that proactively address morality, ethics, and safety