{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "_ZCogkNcQhAB" }, "source": [ "# Gaussian Maximum Likelihood\n", "\n", "## MLE of a Gaussian $p_{model}(x|w)$\n", "\n", "You are given an array of data points called `data`. Your course site plots the negative log-likelihood function for several candidate hypotheses. Estimate the optimal parameters of the Gaussian $p_{model}$ by writing code that computes them (15 points) and explaining what it does (10 points). You are free to use any gradient-based optimization method you like." ] }, { "cell_type": "code", "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "np.random.seed(0)\n", "sns.set_theme(style='whitegrid', palette='pastel')\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "from sklearn.linear_model import LinearRegression" ], "metadata": { "id": "9tEZiMYncrvb" }, "execution_count": 73, "outputs": [] }, { "cell_type": "markdown", "source": [ "The log-likelihood of $N$ i.i.d. samples $Y_i \\sim \\mathcal{N}(\\mu, \\sigma^2)$ and its partial derivatives are:\n", "\n", "- $\\ln L(\\mu,\\sigma^2|Y) = -{N \\over 2} \\ln(2\\pi) - {N \\over 2} \\ln \\sigma^2 - {1 \\over 2 \\sigma^2} \\sum_{i=1}^N (Y_i - \\mu)^2$\n", "\n", "- ${\\partial \\ln L \\over \\partial \\mu} = {1 \\over \\sigma^2} \\sum_{i=1}^N (Y_i - \\mu)$\n", "\n", "- ${\\partial \\ln L \\over \\partial \\sigma^2} = {1 \\over 2 \\sigma^2} \\left(-N + {1 \\over \\sigma^2} \\sum_{i=1}^N (Y_i - \\mu)^2\\right)$\n", "\n", "Setting both derivatives to zero gives the closed-form MLE $\\hat\\mu = {1 \\over N} \\sum_{i=1}^N Y_i$ and $\\hat\\sigma^2 = {1 \\over N} \\sum_{i=1}^N (Y_i - \\hat\\mu)^2$, which is what `np.mean` and `np.var` (default `ddof=0`) compute. The code below approximates the same values iteratively by gradient ascent on $\\ln L$, which is equivalent to gradient descent on the negative log-likelihood." ], "metadata": { "id": "6y7UTqAlYqa8" } }, { "cell_type": "code", "execution_count": 74, "metadata": { "id": "_fbYCmRRQhAF" }, "outputs": [], "source": [ "data = [4, 5, 7, 8, 8, 9, 10, 5, 2, 3, 5, 4, 8, 9]\n", "\n", "# Partial derivatives of the Gaussian log-likelihood with respect to the mean and the variance\n", "def gradient(mean, var, x):\n", "    N = len(x)\n", "    mean_gradient = (1 / var) * np.sum(x - mean)\n", "    var_gradient = (1 / (2 * var)) * (-N + (1 / var) * np.sum((x - mean) ** 2))\n", "    return mean_gradient, 
var_gradient\n", "\n", "\n", "# Gradient ascent on the log-likelihood (equivalently, gradient descent on the\n", "# negative log-likelihood). Starting from theta0 = (mean, var), each iteration moves\n", "# both parameters a step of size learning_rate along the gradient, so they converge\n", "# towards the maximum-likelihood estimates (the sample mean and the biased sample variance).\n", "def gradient_descent(data, theta0, learning_rate, max_iter):\n", "    mean, var = theta0\n", "    x = np.array(data)  # NumPy array so the vectorized sums in gradient() work\n", "    for _ in range(max_iter):\n", "        g = gradient(mean, var, x)\n", "\n", "        # step along the partial derivatives to increase the log-likelihood\n", "        mean = mean + learning_rate * g[0]\n", "        var = var + learning_rate * g[1]\n", "    return mean, var" ] }, { "cell_type": "code", "source": [ "iterations = 1000  # number of gradient steps\n", "theta = (0, 1)  # initial (mean, var)\n", "alpha = 0.01  # learning rate\n", "\n", "# run the optimizer\n", "e_mean, e_var = gradient_descent(data, theta, alpha, iterations)\n", "mean = np.mean(data)  # closed-form MLE of the mean (sample mean)\n", "var = np.var(data)  # closed-form MLE of the variance (np.var divides by N by default)\n", "\n", "print(f\"Estimated params: mean={e_mean} variance={e_var}\")\n", "print(f\"Closed-form MLE: mean={mean} variance={var}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9-VFAZtKpwvM", "outputId": "77594a4f-4c7d-4201-b56a-b0fda92c93c3" }, "execution_count": 75, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Estimated params: mean=6.214285714179054 variance=5.851817293989179\n", "Closed-form MLE: mean=6.214285714285714 variance=5.882653061224489\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "SqVOZaiAQhAI" }, "source": [ "## MLE of a conditional Gaussian $p_{model}(y|x,w)$\n", "\n", "You are given a problem that involves the relationship between $x$ and $y$. Estimate the parameters of a $p_{model}$ that fits the dataset $(x, y)$ shown below. You are free to use any gradient-based optimization method you like. 
\n" ] }, { "cell_type": "markdown", "source": [ "Let $p_{model}(y|x,w) = \\mathcal{N}(y; mx + b, \\sigma^2)$ with $w = (m, b)$ and a fixed variance $\\sigma^2$. Its negative log-likelihood is ${n \\over 2} \\ln(2\\pi\\sigma^2) + {1 \\over 2\\sigma^2} \\sum_{i=1}^n (y_i - (mx_i+b))^2$, so maximizing the likelihood over $m$ and $b$ is the same as minimizing the mean squared error:\n", "\n", "$MSE = {1 \\over n} \\sum_{i=1}^n (y_i - \\hat y_i)^2$, where $\\hat y_i = mx_i + b$\n", "\n", "- $f(m, b) = {1 \\over n} \\sum_{i=1}^n (y_i - (mx_i+b))^2$\n", "- ${\\partial f \\over \\partial m} = {1 \\over n} \\sum_{i=1}^n -2x_i(y_i - (mx_i+b))$\n", "- ${\\partial f \\over \\partial b} = {1 \\over n} \\sum_{i=1}^n -2(y_i - (mx_i+b))$\n" ], "metadata": { "id": "OIdosdhMxn3D" } }, { "cell_type": "code", "execution_count": 76, "metadata": { "id": "4xoYaZCBQhAL" }, "outputs": [], "source": [ "x = np.array([8, 16, 22, 33, 50, 51])\n", "y = np.array([5, 20, 14, 32, 42, 58])\n", "\n", "# p_model(y | x, w) is a Gaussian whose mean is the line y = m * x + b.\n", "# With a fixed variance, maximizing its likelihood is equivalent to minimizing\n", "# the mean squared error, so m and b are fitted by gradient descent on the MSE.\n", "# sklearn's LinearRegression is fitted in the next cell as a closed-form reference.\n", "\n", "# Partial derivatives of the MSE with respect to the slope m and the intercept b\n", "def conditional_gradient(x, y, m, b):\n", "    n = len(x)\n", "    residual = y - (m * x + b)\n", "    m_gradient = -2 * np.sum(x * residual) / n\n", "    b_gradient = -2 * np.sum(residual) / n\n", "    return m_gradient, b_gradient\n", "\n", "# Gradient descent: move both parameters against their gradients to decrease the MSE\n", "def conditional_gradient_descent(x, y, params, learning_rate, max_iter):\n", "    m, b = params\n", "    for _ in range(max_iter):\n", "        g = conditional_gradient(x, y, m, b)\n", "\n", "        m = m - learning_rate * g[0]\n", "        b = b - learning_rate * g[1]\n", "    return m, b" ] }, { "cell_type": "code", "source": [ "# The code below estimates the parameters with the self-implemented gradient descent\n", "# and, as a reference, with `sklearn`'s LinearRegression package. 
It takes the calculated\n", "# `m` and `b` and wraps them in the `estimate` and `actual` functions. Note that\n", "# `actual` is sklearn's closed-form least-squares fit, not the true relationship.\n", "\n", "params = (0, 5)  # initial (m, b)\n", "iterations = 100\n", "alpha = 0.0001  # learning rate\n", "\n", "# With this learning rate and iteration count, b moves only slightly from its start\n", "# value of 5: the MSE is much flatter along b than along m, so the gradient-descent\n", "# line can differ noticeably from the closed-form fit unless it runs far longer.\n", "m, b = conditional_gradient_descent(x, y, params, alpha, iterations)\n", "\n", "def estimate(x):\n", "    return m * x + b\n", "\n", "model = LinearRegression()\n", "model.fit(x.reshape(-1, 1), y)\n", "\n", "def actual(x):\n", "    return model.coef_[0] * x + model.intercept_" ], "metadata": { "id": "Lu-6_UHKclJS" }, "execution_count": 79, "outputs": [] }, { "cell_type": "code", "source": [ "# Plot the data points together with the gradient-descent line (`estimate`)\n", "# and sklearn's closed-form line (`actual`) over the range of x.\n", "\n", "sns.scatterplot(x=x, y=y)\n", "\n", "xs = np.array([x.min(), x.max()])\n", "plt.plot(xs, estimate(xs), label='Estimated (gradient descent)')\n", "plt.plot(xs, actual(xs), label='sklearn (closed form)')\n", "plt.legend(loc=\"upper left\")\n", "\n", "plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 269 }, "id": "126Kj3r-gBYL", "outputId": "4081c5e0-2d67-46c0-ae52-4f5195fb89b5" }, "execution_count": 80, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} } ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "7d6993cb2f9ce9a59d5d7380609d9cb5192a9dedd2735a011418ad9e827eb538" } }, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }