{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "SCjmX4zTCkRK" }, "source": [ "## Setup\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install pydot" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "q-YbjCkzw0yU", "outputId": "2e75e5ba-bedf-43f2-d2c3-270fd0070ca6", "tags": [] }, "outputs": [], "source": [ "# A dependency of the preprocessing for BERT inputs\n", "!pip install -q -U \"tensorflow-text==2.12.*\"" ] }, { "cell_type": "markdown", "metadata": { "id": "5w_XlxN1IsRJ" }, "source": [ "You will use the AdamW optimizer from [tensorflow/models](https://github.com/tensorflow/models)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "b-P1ZOA0FkVJ", "tags": [] }, "outputs": [], "source": [ "!pip install -q tf-models-official==2.12.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_XgTpm9ZxoN9", "tags": [] }, "outputs": [], "source": [ "import os\n", "import shutil\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import tensorflow as tf\n", "import tensorflow_hub as hub\n", "import tensorflow_text as text\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "from official.nlp import optimization # to create AdamW optimizer\n", "from sklearn.preprocessing import LabelEncoder\n", "from tensorflow.keras.utils import to_categorical\n", "\n", "tf.get_logger().setLevel('ERROR')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6IwI_2bcIeX8", "outputId": "8e2c3829-138d-4d11-ce33-38bde48b9865", "tags": [] }, "outputs": [], "source": [ "# Load the CSV file using pandas\n", "ds = pd.read_csv('../data/all_sentiment_datasets.csv')\n", "ds = ds.sample(frac=1).reset_index(drop=True)\n", "\n", "labels_columns = \"sentiment\"\n", "# Extract the features (inputs) and labels 
(outputs)\n", "features = ds[\"sentence\"]\n", "labels = ds[labels_columns]\n", "class_names = np.unique(labels)\n", "labels_tags = ['negative', 'positive']\n", "# Split the data into training and testing sets\n", "subset_range = int(len(features) * 0.01)\n", "features = features[:subset_range]\n", "labels = labels[:subset_range]\n", "split_range = int(len(features) * 0.8)\n", "train_ds, test_ds = features[:split_range], features[split_range:]\n", "train_labels, test_labels = labels[:split_range], labels[split_range:] " ] }, { "cell_type": "markdown", "metadata": { "id": "HGm10A5HRGXp" }, "source": [ "Let's take a look at a few reviews." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JuxDkcvVIoev", "outputId": "19095674-83bd-4057-d890-681abc549fb0", "tags": [] }, "outputs": [], "source": [ "for i in range(3):\n", " print(f'Review: {train_ds[i]}')\n", " label = train_labels[i]\n", " print(f'Label : {label} ({labels_tags[label]})')" ] }, { "cell_type": "markdown", "metadata": { "id": "dX8FtlpGJRE6" }, "source": [ "## Loading models from TensorFlow Hub\n", "\n", "Here you can choose which BERT model you will load from TensorFlow Hub and fine-tune. 
There are multiple BERT models available.\n", "\n", " - [BERT-Base](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3), [Uncased](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3) and [seven more models](https://tfhub.dev/google/collections/bert/1) with trained weights released by the original BERT authors.\n", " - [Small BERTs](https://tfhub.dev/google/collections/bert/1) have the same general architecture but fewer and/or smaller Transformer blocks, which lets you explore tradeoffs between speed, size and quality.\n", " - [ALBERT](https://tfhub.dev/google/collections/albert/1): four different sizes of \"A Lite BERT\" that reduces model size (but not computation time) by sharing parameters between layers.\n", " - [BERT Experts](https://tfhub.dev/google/collections/experts/bert/1): eight models that all have the BERT-base architecture but offer a choice between different pre-training domains, to align more closely with the target task.\n", " - [Electra](https://tfhub.dev/google/collections/electra/1) has the same architecture as BERT (in three different sizes), but gets pre-trained as a discriminator in a set-up that resembles a Generative Adversarial Network (GAN).\n", " - BERT with Talking-Heads Attention and Gated GELU [[base](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1), [large](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1)] has two improvements to the core of the Transformer architecture.\n", "\n", "The model documentation on TensorFlow Hub has more details and references to the\n", "research literature. Follow the links above, or click on the [`tfhub.dev`](http://tfhub.dev) URL\n", "printed after the next cell execution.\n", "\n", "The suggestion is to start with a Small BERT (with fewer parameters) since they are faster to fine-tune. If you like a small model but with higher accuracy, ALBERT might be your next option. 
If you want even better accuracy, choose\n", "one of the classic BERT sizes or their recent refinements like Electra, Talking Heads, or a BERT Expert.\n", "\n", "Aside from the models available below, there are [multiple versions](https://tfhub.dev/google/collections/transformer_encoders_text/1) of the models that are larger and can yield even better accuracy, but they are too big to be fine-tuned on a single GPU. You will be able to do that on the [Solve GLUE tasks using BERT on a TPU colab](https://www.tensorflow.org/text/tutorials/bert_glue).\n", "\n", "You'll see in the code below that switching the tfhub.dev URL is enough to try any of these models, because all the differences between them are encapsulated in the SavedModels from TF Hub." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/" }, "id": "y8_ctG55-uTX", "outputId": "b9a73071-d37a-4bc0-f632-11f430ee4796", "tags": [] }, "outputs": [], "source": [ "#@title Choose a BERT model to fine-tune\n", "\n", "bert_model_name = 'small_bert/bert_en_uncased_L-10_H-512_A-8' #@param [\"bert_en_uncased_L-12_H-768_A-12\", \"bert_en_cased_L-12_H-768_A-12\", \"bert_multi_cased_L-12_H-768_A-12\", \"small_bert/bert_en_uncased_L-2_H-128_A-2\", \"small_bert/bert_en_uncased_L-2_H-256_A-4\", \"small_bert/bert_en_uncased_L-2_H-512_A-8\", \"small_bert/bert_en_uncased_L-2_H-768_A-12\", \"small_bert/bert_en_uncased_L-4_H-128_A-2\", \"small_bert/bert_en_uncased_L-4_H-256_A-4\", \"small_bert/bert_en_uncased_L-4_H-512_A-8\", \"small_bert/bert_en_uncased_L-4_H-768_A-12\", \"small_bert/bert_en_uncased_L-6_H-128_A-2\", \"small_bert/bert_en_uncased_L-6_H-256_A-4\", \"small_bert/bert_en_uncased_L-6_H-512_A-8\", \"small_bert/bert_en_uncased_L-6_H-768_A-12\", \"small_bert/bert_en_uncased_L-8_H-128_A-2\", \"small_bert/bert_en_uncased_L-8_H-256_A-4\", \"small_bert/bert_en_uncased_L-8_H-512_A-8\", \"small_bert/bert_en_uncased_L-8_H-768_A-12\", 
\"small_bert/bert_en_uncased_L-10_H-128_A-2\", \"small_bert/bert_en_uncased_L-10_H-256_A-4\", \"small_bert/bert_en_uncased_L-10_H-512_A-8\", \"small_bert/bert_en_uncased_L-10_H-768_A-12\", \"small_bert/bert_en_uncased_L-12_H-128_A-2\", \"small_bert/bert_en_uncased_L-12_H-256_A-4\", \"small_bert/bert_en_uncased_L-12_H-512_A-8\", \"small_bert/bert_en_uncased_L-12_H-768_A-12\", \"albert_en_base\", \"electra_small\", \"electra_base\", \"experts_pubmed\", \"experts_wiki_books\", \"talking-heads_base\"]\n", "\n", "map_name_to_handle = {\n", " 'bert_en_uncased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3',\n", " 'bert_en_cased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/3',\n", " 'bert_multi_cased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/3',\n", " 'small_bert/bert_en_uncased_L-2_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-2_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-2_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-2_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-768_A-12/1',\n", " 'small_bert/bert_en_uncased_L-4_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-4_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-4_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-4_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-768_A-12/1',\n", " 'small_bert/bert_en_uncased_L-6_H-128_A-2':\n", " 
'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-6_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-6_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-6_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-6_H-768_A-12/1',\n", " 'small_bert/bert_en_uncased_L-8_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-8_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-8_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-8_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-8_H-768_A-12/1',\n", " 'small_bert/bert_en_uncased_L-10_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-10_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-10_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-10_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-768_A-12/1',\n", " 'small_bert/bert_en_uncased_L-12_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-128_A-2/1',\n", " 'small_bert/bert_en_uncased_L-12_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-256_A-4/1',\n", " 'small_bert/bert_en_uncased_L-12_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-512_A-8/1',\n", " 'small_bert/bert_en_uncased_L-12_H-768_A-12':\n", " 
'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/1',\n", " 'albert_en_base':\n", " 'https://tfhub.dev/tensorflow/albert_en_base/2',\n", " 'electra_small':\n", " 'https://tfhub.dev/google/electra_small/2',\n", " 'electra_base':\n", " 'https://tfhub.dev/google/electra_base/2',\n", " 'experts_pubmed':\n", " 'https://tfhub.dev/google/experts/bert/pubmed/2',\n", " 'experts_wiki_books':\n", " 'https://tfhub.dev/google/experts/bert/wiki_books/2',\n", " 'talking-heads_base':\n", " 'https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1',\n", "}\n", "\n", "map_model_to_preprocess = {\n", " 'bert_en_uncased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'bert_en_cased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-2_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-2_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-2_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-2_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-4_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-4_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-4_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-4_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-6_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-6_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", 
" 'small_bert/bert_en_uncased_L-6_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-6_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-8_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-8_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-8_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-8_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-10_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-10_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-10_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-10_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-12_H-128_A-2':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-12_H-256_A-4':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-12_H-512_A-8':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'small_bert/bert_en_uncased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'bert_multi_cased_L-12_H-768_A-12':\n", " 'https://tfhub.dev/tensorflow/bert_multi_cased_preprocess/3',\n", " 'albert_en_base':\n", " 'https://tfhub.dev/tensorflow/albert_en_preprocess/3',\n", " 'electra_small':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'electra_base':\n", " 
'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'experts_pubmed':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'experts_wiki_books':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", " 'talking-heads_base':\n", " 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3',\n", "}\n", "\n", "tfhub_handle_encoder = map_name_to_handle[bert_model_name]\n", "tfhub_handle_preprocess = map_model_to_preprocess[bert_model_name]\n", "\n", "print(f'BERT model selected : {tfhub_handle_encoder}')\n", "print(f'Preprocess model auto-selected: {tfhub_handle_preprocess}')" ] }, { "cell_type": "markdown", "metadata": { "id": "pDNKfAXbDnJH" }, "source": [ "## Define your model\n", "\n", "You will create a very simple fine-tuned model, with the preprocessing model, the selected BERT model, one Dense and a Dropout layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aksj743St9ga", "tags": [] }, "outputs": [], "source": [ "def build_classifier_model():\n", " text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')\n", " preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')\n", " encoder_inputs = preprocessing_layer(text_input)\n", " encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='BERT_encoder')\n", " outputs = encoder(encoder_inputs)\n", " net = outputs['pooled_output']\n", " net = tf.keras.layers.Dropout(0.1)(net)\n", " net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)\n", " return tf.keras.Model(text_input, net)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "text_test = ['this is such an amazing movie!']\n", "classifier_model = build_classifier_model()\n", "bert_raw_result = classifier_model(tf.constant(text_test))\n", "print(tf.sigmoid(bert_raw_result))" ] }, { "cell_type": "markdown", "metadata": { "id": "ZTUzNV2JE2G3" }, "source": [ 
"Let's take a look at the model's structure." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 466 }, "id": "0EmzyHZXKIpm", "outputId": "cba41030-b465-43fd-8f95-ddb1f3fb53b7", "tags": [] }, "outputs": [], "source": [ "tf.keras.utils.plot_model(classifier_model)" ] }, { "cell_type": "markdown", "metadata": { "id": "WbUWoZMwc302" }, "source": [ "## Model training\n", "\n", "You now have all the pieces to train a model, including the preprocessing module, BERT encoder, data, and classifier." ] }, { "cell_type": "markdown", "metadata": { "id": "WpJ3xcwDT56v" }, "source": [ "### Loss function\n", "\n", "Since this is a binary classification problem and the model outputs a probability (a single-unit layer), you'll use `losses.BinaryCrossentropy` loss function.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OWPOZE-L3AgE", "tags": [] }, "outputs": [], "source": [ "loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)\n", "metrics = tf.metrics.BinaryAccuracy()" ] }, { "cell_type": "markdown", "metadata": { "id": "77psrpfzbxtp" }, "source": [ "### Optimizer\n", "\n", "For fine-tuning, let's use the same optimizer that BERT was originally trained with: the \"Adaptive Moments\" (Adam). This optimizer minimizes the prediction loss and does regularization by weight decay (not using moments), which is also known as [AdamW](https://arxiv.org/abs/1711.05101).\n", "\n", "For the learning rate (`init_lr`), you will use the same schedule as BERT pre-training: linear decay of a notional initial learning rate, prefixed with a linear warm-up phase over the first 10% of training steps (`num_warmup_steps`). In line with the BERT paper, the initial learning rate is smaller for fine-tuning (best of 5e-5, 3e-5, 2e-5)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "P9eP2y9dbw32", "tags": [] }, "outputs": [], "source": [ "epochs = 5\n", "batch_size = 32 # Keras default, passed explicitly to fit() below\n", "validation_split = 0.2 # fraction of train_ds that fit() holds out for validation\n", "# The optimizer takes one step per batch, so count batches rather than examples\n", "num_train_examples = int(len(train_ds) * (1 - validation_split))\n", "steps_per_epoch = int(np.ceil(num_train_examples / batch_size))\n", "num_train_steps = steps_per_epoch * epochs\n", "num_warmup_steps = int(0.1*num_train_steps)\n", "\n", "init_lr = 3e-5\n", "optimizer = optimization.create_optimizer(init_lr=init_lr,\n", " num_train_steps=num_train_steps,\n", " num_warmup_steps=num_warmup_steps,\n", " optimizer_type='adamw')" ] }, { "cell_type": "markdown", "metadata": { "id": "SqlarlpC_v0g" }, "source": [ "### Loading the BERT model and training\n", "\n", "Using the `classifier_model` you created earlier, you can compile the model with the loss, metric and optimizer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-7GPDhR98jsD", "tags": [] }, "outputs": [], "source": [ "classifier_model.compile(optimizer=optimizer,\n", " loss=loss,\n", " metrics=metrics)" ] }, { "cell_type": "markdown", "metadata": { "id": "CpBuV5j2cS_b" }, "source": [ "Note: training time will vary depending on the complexity of the BERT model you have selected." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HtfDFAnN_Neu", "outputId": "ea39065c-7352-42c8-a0a3-3cdf121318eb", "tags": [] }, "outputs": [], "source": [ "print(f'Training model with {tfhub_handle_encoder}')\n", "history = classifier_model.fit(x=train_ds,\n", " y=train_labels,\n", " batch_size=batch_size,\n", " validation_split=validation_split,\n", " epochs=epochs)" ] }, { "cell_type": "markdown", "metadata": { "id": "uBthMlTSV8kn" }, "source": [ "### Evaluate the model\n", "\n", "Let's see how the model performs. Two values will be returned: the loss (a number that represents the error; lower values are better) and the accuracy." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "slqB-urBV9sP", "tags": [] }, "outputs": [], "source": [ "loss, accuracy = classifier_model.evaluate(test_ds, test_labels)\n", "print(f'Loss: {loss}')\n", "print(f'Accuracy: {accuracy}')" ] }, { "cell_type": "markdown", "metadata": { "id": "uttWpgmSfzq9" }, "source": [ "### Plot the accuracy and loss over time\n", "\n", "Based on the `History` object returned by `model.fit()`, you can plot the training and validation loss for comparison, as well as the training and validation accuracy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fiythcODf0xo", "tags": [] }, "outputs": [], "source": [ "history_dict = history.history\n", "print(history_dict.keys())\n", "\n", "acc = history_dict['binary_accuracy']\n", "val_acc = history_dict['val_binary_accuracy']\n", "loss = history_dict['loss']\n", "val_loss = history_dict['val_loss']\n", "\n", "epochs = range(1, len(acc) + 1)\n", "fig = plt.figure(figsize=(10, 6))\n", "\n", "plt.subplot(2, 1, 1)\n", "# r is for \"solid red line\"\n", "plt.plot(epochs, loss, 'r', label='Training loss')\n", "# b is for \"solid blue line\"\n", "plt.plot(epochs, val_loss, 'b', label='Validation loss')\n", "plt.title('Training and validation loss')\n", "# plt.xlabel('Epochs')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.plot(epochs, acc, 'r', label='Training acc')\n", "plt.plot(epochs, val_acc, 'b', label='Validation acc')\n", "plt.title('Training and validation accuracy')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Accuracy')\n", "plt.legend(loc='lower right')\n", "# Apply the layout once both subplots exist so titles and labels don't overlap\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "WzJZCo-cf-Jf" }, "source": [ "In this plot, the red lines represent the training loss and accuracy, and the blue lines are the validation loss and accuracy." 
] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Confusion matrix, precision and recall" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import seaborn as sns\n", "\n", "predictions = classifier_model.predict(test_ds)\n", "\n", "# Convert logits to probabilities, then round to binary values (0 or 1)\n", "binary_predictions = np.round(tf.sigmoid(predictions)).astype(int)\n", "# Calculate the confusion matrix (rows: true labels, columns: predicted labels)\n", "confusion_matrix = tf.math.confusion_matrix(test_labels, binary_predictions)\n", "\n", "# Plot the confusion matrix as a heatmap\n", "sns.heatmap(confusion_matrix, annot=True, fmt='d', cmap='Blues')\n", "\n", "# Add labels and title\n", "plt.xlabel('Predicted')\n", "plt.ylabel('True')\n", "plt.title('Confusion Matrix')\n", "\n", "# Show the plot\n", "plt.show()\n", "tp = confusion_matrix[1, 1].numpy().item()\n", "fp = confusion_matrix[0, 1].numpy().item()\n", "fn = confusion_matrix[1, 0].numpy().item()\n", "\n", "# Calculate precision and recall, guarding against empty denominators\n", "precision = tp / (tp + fp) if (tp + fp) else 0.0\n", "recall = tp / (tp + fn) if (tp + fn) else 0.0\n", "\n", "# Print precision and recall\n", "print(\"Precision:\", precision)\n", "print(\"Recall:\", recall)" ] }, { "cell_type": "markdown", "metadata": { "id": "Rtn7jewb6dg4" }, "source": [ "## Export for inference\n", "\n", "Now save your fine-tuned model for later use." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ShcvqJAgVera", "tags": [] }, "outputs": [], "source": [ "model_name = 'sentiments_bert_model.h5'\n", "saved_model_path = '../models/{}'.format(model_name.replace('/', '_'))\n", "\n", "classifier_model.save(saved_model_path, include_optimizer=False)" ] }, { "cell_type": "markdown", "metadata": { "id": "PbI25bS1vD7s" }, "source": [ "Let's reload the model, so you can try it side by side with the model that is still in memory." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gUEWVskZjEF0", "tags": [] }, "outputs": [], "source": [ "reloaded_model = tf.keras.models.load_model(saved_model_path, custom_objects={'KerasLayer':hub.KerasLayer})" ] }, { "cell_type": "markdown", "metadata": { "id": "oyTappHTvNCz" }, "source": [ "Here you can test your model on any sentence you want; just add it to the `examples` variable below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VBWzH6exlCPS", "tags": [] }, "outputs": [], "source": [ "def print_my_examples(inputs, results):\n", " result_for_printing = \\\n", " [f'input: {inputs[i]:<30} : score: {results[i][0]:.6f}'\n", " for i in range(len(inputs))]\n", " print(*result_for_printing, sep='\\n')\n", " print()\n", "\n", "\n", "\n", "examples = [\n", " \"I like the movie at first but then it was shit\",\n", " \"The product is quite good\",\n", " \"I have mixed feelings but i thing it is pretty well overall\",\n", " \"At the beggining it felt good but from the 30 minute ahead i hated it\",\n", " \"Such a bag of crap\",\n", " \"This is useless\"\n", "]\n", "\n", "reloaded_results = tf.sigmoid(reloaded_model(tf.constant(examples)))\n", "original_results = tf.sigmoid(classifier_model(tf.constant(examples)))\n", "\n", "print('Results from the saved model:')\n", "print_my_examples(examples, reloaded_results)\n", "print('Results from the model in memory:')\n", "print_my_examples(examples, original_results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "name": "classify_text_with_bert.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "venv-rootstrap", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 4 }