{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_ZCogkNcQhAB"
},
"source": [
"# Gaussian Maximum Likelihood\n",
"\n",
"## MLE of a Gaussian $p_{model}(x|w)$\n",
"\n",
"You are given an array of data points called `data`. Your course site plots the negative log-likelihood function for several candidate hypotheses. Write code that estimates the optimal parameters of the Gaussian $p_{model}$ (15 points) and explain what it does (10 points). You are free to use any gradient-based optimization method you like."
]
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"np.random.seed(0)\n",
"sns.set_theme(style='whitegrid', palette='pastel')\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"from sklearn.linear_model import LinearRegression"
],
"metadata": {
"id": "9tEZiMYncrvb"
},
"execution_count": 73,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"\n",
"- $ln L(\\mu,\\sigma^2|Y) = {-N \\over 2} ln(2\\pi) - \\sum_{i=1}^N [{1\\over 2}ln \\sigma ^2 + {1 \\over 2 \\sigma ^2} (Y_i - \\mu)^2]$\n",
"\n",
"- ${\\partial ln L \\over \\partial \\mu} = {1 \\over \\sigma^2} \\sum_{i=1}^N(Y_i - \\mu)$\n",
"\n",
"- ${\\partial ln L \\over \\partial \\sigma ^2} = {1 \\over 2 \\sigma ^2} (-N + {1 \\over \\sigma^2} \\sum_{i=1}^N(Y_i - \\mu)^2)$"
],
"metadata": {
"id": "6y7UTqAlYqa8"
}
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"id": "_fbYCmRRQhAF"
},
"outputs": [],
"source": [
"data = [4, 5, 7, 8, 8, 9, 10, 5, 2, 3, 5, 4, 8, 9]\n",
"\n",
"# Partial derivatives of the Gaussian log-likelihood with respect to the mean and the variance.\n",
"def gradient(mean, var, x):\n",
"    N = len(x)\n",
"    mean_gradient = (1 / var) * np.sum(x - mean)\n",
"    var_gradient = (1 / (2 * var)) * (-N + (1 / var) * np.sum((x - mean) ** 2))\n",
"    return mean_gradient, var_gradient\n",
"\n",
"\n",
"# Gradient ascent on the log-likelihood (equivalently, gradient descent on the negative\n",
"# log-likelihood), given the data, starting params, learning rate, and number of iterations.\n",
"# Each iteration moves the params a small step along the gradient, scaled by the learning\n",
"# rate, so they converge towards the maximum-likelihood estimates.\n",
"def gradient_descent(data, theta0, learning_rate, max_iter):\n",
"    mean, var = theta0\n",
"    x = np.array(data)  # NumPy array so the arithmetic is element-wise\n",
"    for _ in range(max_iter):\n",
"        g = gradient(mean, var, x)\n",
"\n",
"        # step uphill: add the log-likelihood gradient to each param\n",
"        mean = mean + learning_rate * g[0]\n",
"        var = var + learning_rate * g[1]\n",
"    return mean, var"
]
},
{
"cell_type": "code",
"source": [
"iterations = 1000 # number of gradient steps\n",
"theta = (0, 1) # initial guess (mean, var)\n",
"alpha = 0.01 # learning rate\n",
"\n",
"# estimate the params iteratively\n",
"e_mean, e_var = gradient_descent(data, theta, alpha, iterations)\n",
"mean = np.mean(data) # closed-form MLE of the mean (sample mean)\n",
"var = np.var(data) # closed-form MLE of the variance (ddof=0)\n",
"\n",
"print(f\"Estimated params: mean={e_mean} variance={e_var}\")\n",
"print(f\"True params: mean={mean} variance={var}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9-VFAZtKpwvM",
"outputId": "77594a4f-4c7d-4201-b56a-b0fda92c93c3"
},
"execution_count": 75,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Estimated params: mean=6.214285714179054 variance=5.851817293989179\n",
"True params: mean=6.214285714285714 variance=5.882653061224489\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SqVOZaiAQhAI"
},
"source": [
"## MLE of a conditional Gaussian $p_{model}(y|x,w)$\n",
"\n",
"You are given a problem that involves the relationship between $x$ and $y$. Estimate the parameters of a $p_{model}$ that fits the dataset $(x, y)$ shown below. You are free to use any gradient-based optimization method you like."
]
},
{
"cell_type": "markdown",
"source": [
"With Gaussian noise of fixed variance around the line $\\hat y_i = mx_i + b$, maximizing the conditional likelihood is equivalent to minimizing the mean squared error:\n",
"\n",
"- $MSE = {1 \\over n} \\sum_{i=1}^n (y_i - \\hat y_i)^2$\n",
"\n",
"- $f(m, b) = {1 \\over n} \\sum_{i=1}^n (y_i - (mx_i+b))^2$\n",
"\n",
"- ${\\partial f \\over \\partial m} = {1 \\over n} \\sum_{i=1}^n -2x_i(y_i - (mx_i+b))$\n",
"\n",
"- ${\\partial f \\over \\partial b} = {1 \\over n} \\sum_{i=1}^n -2(y_i - (mx_i+b))$"
],
"metadata": {
"id": "OIdosdhMxn3D"
}
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"id": "4xoYaZCBQhAL"
},
"outputs": [],
"source": [
"x = np.array([8, 16, 22, 33, 50, 51])\n",
"y = np.array([5, 20, 14, 32, 42, 58])\n",
"\n",
"# The goal here is to fit a p_model whose mean is the linear function\n",
"# y = m * x + b.\n",
"# Because we are provided x and y, we can use gradient descent on the\n",
"# mean-squared error to optimize m and b. The code is attached below.\n",
"# An additional implementation using sklearn's LinearRegression is also included.\n",
"\n",
"# Partial derivatives of the MSE with respect to the slope m and the intercept b.\n",
"def conditional_gradient(x, y, m, b):\n",
"    n = len(x)\n",
"    m_gradient = -2 * np.sum(x * (y - (m * x + b))) / n\n",
"    b_gradient = -2 * np.sum(y - (m * x + b)) / n\n",
"    return m_gradient, b_gradient\n",
"\n",
"def conditional_gradient_descent(x, y, params, learning_rate, max_iter):\n",
"    m, b = params\n",
"    for _ in range(max_iter):\n",
"        g = conditional_gradient(x, y, m, b)\n",
"\n",
"        # step downhill: subtract the MSE gradient from both params\n",
"        m = m - learning_rate * g[0]\n",
"        b = b - learning_rate * g[1]\n",
"    return m, b"
]
},
{
"cell_type": "code",
"source": [
"# The code below generates parameters from the self-implemented gradient descent\n",
"# as well as through `sklearn`'s LinearRegression class. It wraps the calculated\n",
"# `m` and `b` in the respective `estimate` and `actual` functions.\n",
"\n",
"params = (0, 5)\n",
"iterations = 100\n",
"alpha = 0.0001\n",
"\n",
"m, b = conditional_gradient_descent(x, y, params, alpha, iterations)\n",
"\n",
"def estimate(x):\n",
"    return m * x + b\n",
"\n",
"model = LinearRegression()\n",
"model.fit(x.reshape(-1, 1), y)\n",
"\n",
"def actual(x):\n",
"    return model.coef_[0] * x + model.intercept_"
],
"metadata": {
"id": "Lu-6_UHKclJS"
},
"execution_count": 79,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# The code below uses the previously calculated `estimate` and `actual` functions\n",
"# to generate a graph with the original data points as well as the estimated and\n",
"# actual lines.\n",
"\n",
"sns.scatterplot(x=x, y=y)\n",
"\n",
"start = min(x)\n",
"end = max(x)\n",
"\n",
"pp1, pp2 = (start, estimate(start)), (end, estimate(end))\n",
"ap1, ap2 = (start, actual(start)), (end, actual(end))\n",
"\n",
"plt.plot([pp1[0], pp2[0]], [pp1[1], pp2[1]], label='Estimated')\n",
"plt.plot([ap1[0], ap2[0]], [ap1[1], ap2[1]], label='Actual')\n",
"plt.legend(loc=\"upper left\")\n",
"\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 269
},
"id": "126Kj3r-gBYL",
"outputId": "4081c5e0-2d67-46c0-ae52-4f5195fb89b5"
},
"execution_count": 80,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "7d6993cb2f9ce9a59d5d7380609d9cb5192a9dedd2735a011418ad9e827eb538"
}
},
"colab": {
"provenance": []
}
},
"nbformat": 4,
"nbformat_minor": 0
}