|
<! |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
|
|
|
|
|
|
|
|
This guide demonstrates how to use LoRA, a low-rank approximation technique, to fine-tune an image classification model. |
|
By using LoRA from 🤗 PEFT, we can reduce the number of trainable parameters in the model to only 0.77% of the original. |
|
|
|
LoRA achieves this reduction by adding low-rank "update matrices" to specific blocks of the model, such as the attention |
|
blocks. During fine-tuning, only these matrices are trained, while the original model parameters are left unchanged. |
|
At inference time, the update matrices are merged with the original model parameters to produce the final classification result. |
|
|
|
For more information on LoRA, please refer to the [original LoRA paper](https://arxiv.org/abs/2106.09685). |
|
|
|
|
|
|
|
Install the libraries required for model training: |
|
|
|
```bash |
|
!pip install transformers accelerate evaluate datasets loralib peft -q |
|
``` |
|
|
|
Check the versions of all required libraries to make sure you are up to date: |
|
|
|
```python |
|
import transformers |
|
import accelerate |
|
import peft |
|
|
|
print(f"Transformers version: {transformers.__version__}") |
|
print(f"Accelerate version: {accelerate.__version__}") |
|
print(f"PEFT version: {peft.__version__}") |
|
"Transformers version: 4.27.4" |
|
"Accelerate version: 0.18.0" |
|
"PEFT version: 0.2.0" |
|
``` |
|
|
|
|
|
|
|
To share the fine-tuned model at the end of the training with the community, authenticate using your 🤗 token. |
|
You can obtain your token from your [account settings](https://huggingface.co/settings/token). |
|
|
|
```python |
|
from huggingface_hub import notebook_login |
|
|
|
notebook_login() |
|
``` |
|
|
|
|
|
|
|
Choose a model checkpoint from any of the model architectures supported for [image classification](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads). When in doubt, refer to |
|
the [image classification task guide](https://huggingface.co/docs/transformers/v4.27.2/en/tasks/image_classification) in |
|
🤗 Transformers documentation. |
|
|
|
```python |
|
model_checkpoint = "google/vit-base-patch16-224-in21k" |
|
``` |
|
|
|
|
|
|
|
To keep this example's runtime short, let's only load the first 5000 instances from the training set of the [Food-101 dataset](https://huggingface.co/datasets/food101): |
|
|
|
```python |
|
from datasets import load_dataset |
|
|
|
dataset = load_dataset("food101", split="train[:5000]") |
|
``` |
|
|
|
|
|
|
|
To prepare the dataset for training and evaluation, create `label2id` and `id2label` dictionaries. These will come in |
|
handy when performing inference and for metadata information: |
|
|
|
```python |
|
labels = dataset.features["label"].names |
|
label2id, id2label = dict(), dict() |
|
for i, label in enumerate(labels): |
|
label2id[label] = i |
|
id2label[i] = label |
|
|
|
id2label[2] |
|
"baklava" |
|
``` |
|
|
|
Next, load the image processor of the model you're fine-tuning: |
|
|
|
```python |
|
from transformers import AutoImageProcessor |
|
|
|
image_processor = AutoImageProcessor.from_pretrained(model_checkpoint) |
|
``` |
|
|
|
The `image_processor` contains useful information on which size the training and evaluation images should be resized |
|
to, as well as values that should be used to normalize the pixel values. Using the `image_processor`, prepare transformation |
|
functions for the datasets. These functions will include data augmentation and pixel scaling: |
|
|
|
```python |
|
from torchvision.transforms import ( |
|
CenterCrop, |
|
Compose, |
|
Normalize, |
|
RandomHorizontalFlip, |
|
RandomResizedCrop, |
|
Resize, |
|
ToTensor, |
|
) |
|
|
|
normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std) |
|
train_transforms = Compose( |
|
[ |
|
RandomResizedCrop(image_processor.size["height"]), |
|
RandomHorizontalFlip(), |
|
ToTensor(), |
|
normalize, |
|
] |
|
) |
|
|
|
val_transforms = Compose( |
|
[ |
|
Resize(image_processor.size["height"]), |
|
CenterCrop(image_processor.size["height"]), |
|
ToTensor(), |
|
normalize, |
|
] |
|
) |
|
|
|
|
|
def preprocess_train(example_batch): |
|
"""Apply train_transforms across a batch.""" |
|
example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]] |
|
return example_batch |
|
|
|
|
|
def preprocess_val(example_batch): |
|
"""Apply val_transforms across a batch.""" |
|
example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]] |
|
return example_batch |
|
``` |
|
|
|
Split the dataset into training and validation sets: |
|
|
|
```python |
|
splits = dataset.train_test_split(test_size=0.1) |
|
train_ds = splits["train"] |
|
val_ds = splits["test"] |
|
``` |
|
|
|
Finally, set the transformation functions for the datasets accordingly: |
|
|
|
```python |
|
train_ds.set_transform(preprocess_train) |
|
val_ds.set_transform(preprocess_val) |
|
``` |
|
|
|
## Load and prepare a model |
|
|
|
Before loading the model, let's define a helper function to check the total number of parameters a model has, as well |
|
as how many of them are trainable. |
|
|
|
```python |
|
def print_trainable_parameters(model): |
|
trainable_params = 0 |
|
all_param = 0 |
|
for _, param in model.named_parameters(): |
|
all_param += param.numel() |
|
if param.requires_grad: |
|
trainable_params += param.numel() |
|
print( |
|
f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}" |
|
) |
|
``` |
|
|
|
It's important to initialize the original model correctly as it will be used as a base to create the `PeftModel` you'll |
|
actually fine-tune. Specify the `label2id` and `id2label` so that [`~transformers.AutoModelForImageClassification`] can append a classification |
|
head to the underlying model, adapted for this dataset. You should see the following output: |
|
|
|
``` |
|
Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias'] |
|
``` |
|
|
|
```python |
|
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer |
|
|
|
model = AutoModelForImageClassification.from_pretrained( |
|
model_checkpoint, |
|
label2id=label2id, |
|
id2label=id2label, |
|
ignore_mismatched_sizes=True, |
|
) |
|
``` |
|
|
|
Before creating a `PeftModel`, you can check the number of trainable parameters in the original model: |
|
|
|
```python |
|
print_trainable_parameters(model) |
|
"trainable params: 85876325 || all params: 85876325 || trainable%: 100.00" |
|
``` |
|
|
|
Next, use `get_peft_model` to wrap the base model so that "update" matrices are added to the respective places. |
|
|
|
```python |
|
from peft import LoraConfig, get_peft_model |
|
|
|
config = LoraConfig( |
|
r=16, |
|
lora_alpha=16, |
|
target_modules=["query", "value"], |
|
lora_dropout=0.1, |
|
bias="none", |
|
modules_to_save=["classifier"], |
|
) |
|
lora_model = get_peft_model(model, config) |
|
print_trainable_parameters(lora_model) |
|
"trainable params: 667493 || all params: 86466149 || trainable%: 0.77" |
|
``` |
|
|
|
Let's unpack what's going on here. |
|
To use LoRA, you need to specify the target modules in `LoraConfig` so that `get_peft_model()` knows which modules |
|
inside our model need to be amended with LoRA matrices. In this example, we're only interested in targeting the query and |
|
value matrices of the attention blocks of the base model. Since the parameters corresponding to these matrices are "named" |
|
"query" and "value" respectively, we specify them accordingly in the `target_modules` argument of `LoraConfig`. |
|
|
|
We also specify `modules_to_save`. After wrapping the base model with `get_peft_model()` along with the `config`, we get |
|
a new model where only the LoRA parameters are trainable (so-called "update matrices") while the pre-trained parameters |
|
are kept frozen. However, we want the classifier parameters to be trained too when fine-tuning the base model on our |
|
custom dataset. To ensure that the classifier parameters are also trained, we specify `modules_to_save`. This also |
|
ensures that these modules are serialized alongside the LoRA trainable parameters when using utilities like `save_pretrained()` |
|
and `push_to_hub()`. |
|
|
|
Here's what the other parameters mean: |
|
|
|
- `r`: The dimension used by the LoRA update matrices. |
|
- `alpha`: Scaling factor. |
|
- `bias`: Specifies if the `bias` parameters should be trained. `None` denotes none of the `bias` parameters will be trained. |
|
|
|
`r` and `alpha` together control the total number of final trainable parameters when using LoRA, giving you the flexibility |
|
to balance a trade-off between end performance and compute efficiency. |
|
|
|
By looking at the number of trainable parameters, you can see how many parameters we're actually training. Since the goal is |
|
to achieve parameter-efficient fine-tuning, you should expect to see fewer trainable parameters in the `lora_model` |
|
in comparison to the original model, which is indeed the case here. |
|
|
|
## Define training arguments |
|
|
|
For model fine-tuning, use [`~transformers.Trainer`]. It accepts |
|
several arguments which you can wrap using [`~transformers.TrainingArguments`]. |
|
|
|
```python |
|
from transformers import TrainingArguments, Trainer |
|
|
|
|
|
model_name = model_checkpoint.split("/")[-1] |
|
batch_size = 128 |
|
|
|
args = TrainingArguments( |
|
f"{model_name}-finetuned-lora-food101", |
|
remove_unused_columns=False, |
|
evaluation_strategy="epoch", |
|
save_strategy="epoch", |
|
learning_rate=5e-3, |
|
per_device_train_batch_size=batch_size, |
|
gradient_accumulation_steps=4, |
|
per_device_eval_batch_size=batch_size, |
|
fp16=True, |
|
num_train_epochs=5, |
|
logging_steps=10, |
|
load_best_model_at_end=True, |
|
metric_for_best_model="accuracy", |
|
push_to_hub=True, |
|
label_names=["labels"], |
|
) |
|
``` |
|
|
|
Compared to non-PEFT methods, you can use a larger batch size since there are fewer parameters to train. |
|
You can also set a larger learning rate than the normal (1e-5 for example). |
|
|
|
This can potentially also reduce the need to conduct expensive hyperparameter tuning experiments. |
|
|
|
## Prepare evaluation metric |
|
|
|
```python |
|
import numpy as np |
|
import evaluate |
|
|
|
metric = evaluate.load("accuracy") |
|
|
|
|
|
def compute_metrics(eval_pred): |
|
"""Computes accuracy on a batch of predictions""" |
|
predictions = np.argmax(eval_pred.predictions, axis=1) |
|
return metric.compute(predictions=predictions, references=eval_pred.label_ids) |
|
``` |
|
|
|
The `compute_metrics` function takes a named tuple as input: `predictions`, which are the logits of the model as Numpy arrays, |
|
and `label_ids`, which are the ground-truth labels as Numpy arrays. |
|
|
|
## Define collation function |
|
|
|
A collation function is used by [`~transformers.Trainer`] to gather a batch of training and evaluation examples and prepare them in a |
|
format that is acceptable by the underlying model. |
|
|
|
```python |
|
import torch |
|
|
|
|
|
def collate_fn(examples): |
|
pixel_values = torch.stack([example["pixel_values"] for example in examples]) |
|
labels = torch.tensor([example["label"] for example in examples]) |
|
return {"pixel_values": pixel_values, "labels": labels} |
|
``` |
|
|
|
## Train and evaluate |
|
|
|
Bring everything together - model, training arguments, data, collation function, etc. Then, start the training! |
|
|
|
```python |
|
trainer = Trainer( |
|
model, |
|
args, |
|
train_dataset=train_ds, |
|
eval_dataset=val_ds, |
|
tokenizer=image_processor, |
|
compute_metrics=compute_metrics, |
|
data_collator=collate_fn, |
|
) |
|
train_results = trainer.train() |
|
``` |
|
|
|
In just a few minutes, the fine-tuned model shows 96% validation accuracy even on this small |
|
subset of the training dataset. |
|
|
|
```python |
|
trainer.evaluate(val_ds) |
|
{ |
|
"eval_loss": 0.14475855231285095, |
|
"eval_accuracy": 0.96, |
|
"eval_runtime": 3.5725, |
|
"eval_samples_per_second": 139.958, |
|
"eval_steps_per_second": 1.12, |
|
"epoch": 5.0, |
|
} |
|
``` |
|
|
|
## Share your model and run inference |
|
|
|
Once the fine-tuning is done, share the LoRA parameters with the community like so: |
|
|
|
```python |
|
repo_name = f"sayakpaul/{model_name}-finetuned-lora-food101" |
|
lora_model.push_to_hub(repo_name) |
|
``` |
|
|
|
When calling [`~transformers.PreTrainedModel.push_to_hub`] on the `lora_model`, only the LoRA parameters along with any modules specified in `modules_to_save` |
|
are saved. Take a look at the [trained LoRA parameters](https://huggingface.co/sayakpaul/vit-base-patch16-224-in21k-finetuned-lora-food101/blob/main/adapter_model.bin). |
|
You'll see that it's only 2.6 MB! This greatly helps with portability, especially when using a very large model to fine-tune (such as [BLOOM](https://huggingface.co/bigscience/bloom)). |
|
|
|
Next, let's see how to load the LoRA updated parameters along with our base model for inference. When you wrap a base model |
|
with `PeftModel`, modifications are done *in-place*. To mitigate any concerns that might stem from in-place modifications, |
|
initialize the base model just like you did earlier and construct the inference model. |
|
|
|
```python |
|
from peft import PeftConfig, PeftModel |
|
|
|
|
|
config = PeftConfig.from_pretrained(repo_name) |
|
model = AutoModelForImageClassification.from_pretrained( |
|
config.base_model_name_or_path, |
|
label2id=label2id, |
|
id2label=id2label, |
|
ignore_mismatched_sizes=True, |
|
) |
|
|
|
inference_model = PeftModel.from_pretrained(model, repo_name) |
|
``` |
|
|
|
Let's now fetch an example image for inference. |
|
|
|
```python |
|
from PIL import Image |
|
import requests |
|
|
|
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg" |
|
image = Image.open(requests.get(url, stream=True).raw) |
|
image |
|
``` |
|
|
|
<div class="flex justify-center"> |
|
<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg" alt="image of beignets"/> |
|
</div> |
|
|
|
First, instantiate an `image_processor` from the underlying model repo. |
|
|
|
```python |
|
image_processor = AutoImageProcessor.from_pretrained(repo_name) |
|
``` |
|
|
|
Then, prepare the example for inference. |
|
|
|
```python |
|
encoding = image_processor(image.convert("RGB"), return_tensors="pt") |
|
``` |
|
|
|
Finally, run inference! |
|
|
|
```python |
|
with torch.no_grad(): |
|
outputs = inference_model(**encoding) |
|
logits = outputs.logits |
|
|
|
predicted_class_idx = logits.argmax(-1).item() |
|
print("Predicted class:", inference_model.config.id2label[predicted_class_idx]) |
|
"Predicted class: beignets" |
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|