---
license: mit
library_name: transformers
tags:
- camera level
- camera feature
- movie analysis
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for camera-level classification

ConvNeXt V2 base-size model fine-tuned for the classification of camera levels. The [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset was used to fine-tune the model for 20 epochs.

The model classifies an image into one of six classes: *aerial, eye, ground, hip, knee, shoulder*.
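
The class names are stored in the model config, so the label set can be checked without downloading the full model weights. A minimal sketch; the exact index order shown in the comment is an assumption:

```python
from transformers import AutoConfig

# Inspect the label mapping from the config alone
config = AutoConfig.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
print(config.id2label)  # e.g. {0: "aerial", 1: "eye", ...}; index order is an assumption
```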

## Evaluation

On the test set (`test.csv`), the model achieves an accuracy of 90.20% and a macro-F1 of 82.28%.
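
These numbers can be reproduced roughly as sketched below; the `path` and `label` column names for `test.csv` are assumptions, since the card does not document the file's layout:

```python
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, f1_score
from torchvision.io import read_image, ImageReadMode
from torchvision.transforms import v2
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
model.eval()

transform = v2.Compose([v2.Resize((224, 224), antialias=True),
                        v2.ToDtype(torch.float32, scale=True),
                        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

df = pd.read_csv("test.csv")  # assumed columns: "path" (image file) and "label" (class name)
y_true, y_pred = [], []
for path, label in zip(df["path"], df["label"]):
    image = read_image(path, mode=ImageReadMode.RGB)
    with torch.no_grad():
        logits = model(pixel_values=transform(image).unsqueeze(0)).logits
    y_pred.append(model.config.id2label[logits.argmax().item()])
    y_true.append(label)

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"macro-F1: {f1_score(y_true, y_pred, average='macro'):.4f}")
```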

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
im_size = 224

# Demo image: https://www.pexels.com/photo/aerial-view-of-city-buildings-8783146/
image = read_image("demo/level_demo.jpg", mode=ImageReadMode.RGB)

# Resize to the model's input size, scale pixel values to [0, 1],
# and normalize with ImageNet statistics
transform = v2.Compose([v2.Resize((im_size, im_size), antialias=True),
                        v2.ToDtype(torch.float32, scale=True),
                        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

inputs = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    outputs = model(pixel_values=inputs)

# Map the highest-scoring logit to its class name
predicted_label = model.config.id2label[torch.argmax(outputs.logits).item()]
print(predicted_label)
# --> aerial
```
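
If per-class confidences are needed rather than a single label, the logits from the snippet above can be turned into probabilities with a softmax:

```python
# Continues from the snippet above
probs = torch.softmax(outputs.logits, dim=-1).squeeze(0)
for idx, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[idx]}: {p:.3f}")
```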

## Training Details

```python
from transformers import TrainingArguments

## Training transforms: augmentations applied in random order
randomorder = v2.RandomOrder([
    v2.RandomHorizontalFlip(),
    v2.GaussianBlur(5),
    v2.RandomAdjustSharpness(2),
    v2.RandomGrayscale(p=0.2),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)])

train_transform = v2.Compose([v2.Resize((im_size, im_size), antialias=True),
                              randomorder,
                              v2.ToDtype(torch.float32, scale=True),
                              v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

## Training Arguments
training_args = TrainingArguments(
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=128,  # effective train batch size 512 with accumulation
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=128,
    num_train_epochs=30,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,      # reload the best checkpoint (by F1) at the end
    metric_for_best_model="f1",
    dataloader_num_workers=32,
    torch_compile=True
)
```
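
The card stops at the arguments; a plausible sketch of the surrounding `Trainer` wiring follows. The dataset objects are hypothetical placeholders, and the `compute_metrics` function is an assumption, inferred from `metric_for_best_model="f1"` and the macro-F1 reported above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from transformers import Trainer

def compute_metrics(eval_pred):
    # metric_for_best_model="f1" implies compute_metrics returns an "f1" key;
    # macro averaging is assumed, to match the macro-F1 reported under Evaluation
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # hypothetical: the card does not show the dataset objects
    eval_dataset=val_dataset,     # hypothetical
    compute_metrics=compute_metrics,
)
trainer.train()
```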