---
license: mit
library_name: transformers
tags:
- camera level
- camera feature
- movie analysis
metrics:
- accuracy
- f1
pipeline_tag: image-classification
---

# ConvNeXt V2 fine-tuned for camera-level classification

ConvNeXt V2 base-size model fine-tuned for the classification of camera levels. The [Cinescale](https://cinescale.github.io/camera_al/#dataset) dataset was used to fine-tune the model for 20 epochs.

The model classifies an image into one of six classes: *aerial, eye, ground, hip, knee, shoulder*.
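
The class names are stored in the model config, so the label set can be checked without downloading the full model weights. A minimal sketch; the exact index order shown in the comment is an assumption:

```python
from transformers import AutoConfig

# Inspect the label mapping from the config alone
config = AutoConfig.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
print(config.id2label)  # e.g. {0: "aerial", 1: "eye", ...}; index order is an assumption
```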

## Evaluation

On the test set (`test.csv`), the model achieves an accuracy of 90.20% and a macro-F1 of 82.28%.
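
These numbers can be reproduced roughly as sketched below; the `path` and `label` column names for `test.csv` are assumptions, since the card does not document the file's layout:

```python
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, f1_score
from torchvision.io import read_image, ImageReadMode
from torchvision.transforms import v2
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
model.eval()

transform = v2.Compose([v2.Resize((224, 224), antialias=True),
                        v2.ToDtype(torch.float32, scale=True),
                        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

df = pd.read_csv("test.csv")  # assumed columns: "path" (image file) and "label" (class name)
y_true, y_pred = [], []
for path, label in zip(df["path"], df["label"]):
    image = read_image(path, mode=ImageReadMode.RGB)
    with torch.no_grad():
        logits = model(pixel_values=transform(image).unsqueeze(0)).logits
    y_pred.append(model.config.id2label[logits.argmax().item()])
    y_true.append(label)

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"macro-F1: {f1_score(y_true, y_pred, average='macro'):.4f}")
```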

## How to use

```python
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-224-cinescale-level")
im_size = 224

# Demo image: https://www.pexels.com/photo/aerial-view-of-city-buildings-8783146/
image = read_image("demo/level_demo.jpg", mode=ImageReadMode.RGB)

# Resize to the model's input size, scale pixel values to [0, 1],
# and normalize with ImageNet statistics
transform = v2.Compose([v2.Resize((im_size, im_size), antialias=True),
                        v2.ToDtype(torch.float32, scale=True),
                        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

inputs = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    outputs = model(pixel_values=inputs)

# Map the highest-scoring logit to its class name
predicted_label = model.config.id2label[torch.argmax(outputs.logits).item()]
print(predicted_label)
# --> aerial
```
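
If per-class confidences are needed rather than a single label, the logits from the snippet above can be turned into probabilities with a softmax:

```python
# Continues from the snippet above
probs = torch.softmax(outputs.logits, dim=-1).squeeze(0)
for idx, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[idx]}: {p:.3f}")
```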

## Training Details

```python
from transformers import TrainingArguments

## Training transforms: augmentations applied in random order
randomorder = v2.RandomOrder([
    v2.RandomHorizontalFlip(),
    v2.GaussianBlur(5),
    v2.RandomAdjustSharpness(2),
    v2.RandomGrayscale(p=0.2),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)])

train_transform = v2.Compose([v2.Resize((im_size, im_size), antialias=True),
                              randomorder,
                              v2.ToDtype(torch.float32, scale=True),
                              v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

## Training Arguments
training_args = TrainingArguments(
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=128,  # effective train batch size 512 with accumulation
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=128,
    num_train_epochs=30,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,      # reload the best checkpoint (by F1) at the end
    metric_for_best_model="f1",
    dataloader_num_workers=32,
    torch_compile=True
)
```
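
The card stops at the arguments; a plausible sketch of the surrounding `Trainer` wiring follows. The dataset objects are hypothetical placeholders, and the `compute_metrics` function is an assumption, inferred from `metric_for_best_model="f1"` and the macro-F1 reported above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from transformers import Trainer

def compute_metrics(eval_pred):
    # metric_for_best_model="f1" implies compute_metrics returns an "f1" key;
    # macro averaging is assumed, to match the macro-F1 reported under Evaluation
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # hypothetical: the card does not show the dataset objects
    eval_dataset=val_dataset,     # hypothetical
    compute_metrics=compute_metrics,
)
trainer.train()
```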