|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- 0jl/NYUv2 |
|
language: |
|
- en |
|
metrics: |
|
- r2 |
|
- mae |
|
- mse |
|
pipeline_tag: depth-estimation |
|
tags: |
|
- xgboost |
|
- python |
|
- depth-estimation |
|
- resnet50 |
|
--- |
|
|
|
# Depth Estimation Using ResNet50 and XGBoost |
|
## Author |
|
- **Vishal Adithya.A** |
|
## Overview |
|
This project demonstrates a depth estimation model that predicts the average depth of an input image using an XGBoost regressor trained on features extracted by a pre-trained ResNet50. The model was trained on the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)). The trained regressor is saved with Python's `pickle` library for easy deployment and reuse.
|
|
|
### Loading the Model |
|
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows: |
|
|
|
```python
import pickle

# Load the trained XGBoost regressor
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Extract ResNet50 features for a single image and predict its average depth
features = extract_features("path/to/image.jpg")
predicted_depth = model.predict([features])
print(predicted_depth[0])
```
|
**NOTE:** `extract_features()` is a helper function defined in the original code that uses ResNet50 to extract a feature vector from the image; see step 2 of the Training Pipeline below for its definition.
|
|
|
## Key Features |
|
- **Model Architecture** (see the sketch after this list):
|
- Feature extraction: ResNet50 (pre-trained on ImageNet, with the top layers removed and global average pooling). |
|
- Regression: XGBoost, optimized for structured data prediction. |
|
- **Training GPU**: NVIDIA RTX 4060 Ti.
|
- **Target**: Predict the average depth of images based on the depth maps from the dataset. |
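As a rough sketch of the first stage (names here are illustrative): with the classification head removed and global average pooling, ResNet50 maps each 224x224 RGB image to a 2048-dimensional feature vector, which is what the XGBoost regressor consumes.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50

# ImageNet-pretrained backbone, no classification head, global average pooling
feature_extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Each 224x224 RGB image becomes a single 2048-dimensional feature vector
dummy_batch = np.random.rand(1, 224, 224, 3).astype("float32")
print(feature_extractor.predict(dummy_batch).shape)  # -> (1, 2048)
```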
|
|
|
## Dataset |
|
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)) |
|
- Format: The dataset includes RGB images and corresponding depth maps. |
|
- Preprocessing (sketched after this list):
|
- Images were resized to 224x224 pixels to match the input requirements of ResNet50. |
|
- Depth maps were converted into single average depth values. |
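A minimal sketch of this preprocessing step, assuming each dataset sample exposes a PIL `image` and a `depth` array (the field names are assumptions and may differ from the original code):

```python
import numpy as np

def preprocess_sample(sample):
    """Resize the RGB image to 224x224 and reduce the depth map to one value."""
    # Input for ResNet50: 224x224 RGB image as a float array
    image = sample["image"].convert("RGB").resize((224, 224))
    image_array = np.asarray(image, dtype="float32")

    # Target: the mean of the depth map, i.e. a single average depth per image
    average_depth = float(np.asarray(sample["depth"], dtype="float32").mean())

    return image_array, average_depth
```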
|
|
|
## Model Training |
|
1. **Feature Extraction**: |
|
- ResNet50 was used to extract a fixed-length feature vector from each image. |
|
- Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module. |
|
2. **Regression**: |
|
- XGBoost regressor was trained on the extracted features to predict average depth values. |
|
- Hyperparameters were tuned using cross-validation for optimal performance (an illustrative sketch follows this list).
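The exact search space is not recorded in this card; the sketch below shows one way cross-validated tuning could be set up with scikit-learn's `GridSearchCV`. The parameter values, and the `X_train`/`y_train` names, are illustrative assumptions, not the settings used for the released model.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Illustrative search space; the original model's grid was not recorded here
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    param_grid=param_grid,
    scoring="r2",  # matches the R² metric reported below
    cv=5,          # 5-fold cross-validation
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train: ResNet50 features, y_train: average depths
print(search.best_params_, search.best_score_)
```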
|
|
|
## Results |
|
- **R² Score**: 0.841 |
|
- Performance is reasonable for an initial implementation and can be improved with further hyperparameter tuning or better feature extraction (see the evaluation sketch below).
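The R², MAE, and MSE metrics listed in the card metadata can be computed on a held-out split with scikit-learn; a minimal sketch, assuming `model`, `X_test`, and `y_test` are available from the training pipeline below:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Evaluate the trained regressor on held-out features and targets
y_pred = model.predict(X_test)

print("R² :", r2_score(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
```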
|
|
|
## How to Use |
|
### Requirements |
|
1. Python 3.10+ |
|
2. Required libraries: |
|
- `numpy` |
|
- `pickle` (Python standard library; no installation needed)
|
- `xgboost` |
|
- `datasets` |
|
- `tensorflow` |
|
- `scikit-learn`
|
|
|
Install the dependencies using pip: |
|
```bash |
|
pip install numpy tensorflow xgboost datasets scikit-learn |
|
``` |
|
|
|
### Training Pipeline |
|
If you want to retrain the model, follow these steps: |
|
|
|
1. Download the **NYUv2 dataset** from Hugging Face: |
|
```python |
|
from datasets import load_dataset |
|
dataset = load_dataset("0jl/NYUv2") |
|
``` |
|
2. Extract features using ResNet50: |
|
```python
import numpy as np
from PIL import Image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ImageNet-pretrained backbone, no classification head, global average pooling
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    # Resize to the 224x224 input expected by ResNet50 and add a batch dimension
    image = Image.open(image_path).convert("RGB").resize((224, 224))
    image_array = np.expand_dims(np.asarray(image, dtype="float32"), axis=0)
    image_array = preprocess_input(image_array)
    # Returns a flat 2048-dimensional feature vector
    features = model.predict(image_array)
    return features.flatten()
```
|
3. Train the XGBoost regressor on the extracted features and save the model (`X_train` and `y_train` are assembled as shown in the sketch after this list):
|
```python
import pickle
from xgboost import XGBRegressor

# X_train: ResNet50 feature vectors, y_train: average depth targets
# (see the sketch after this list for one way to assemble them)
regressor = XGBRegressor()
regressor.fit(X_train, y_train)

with open("model.pkl", "wb") as f:
    pickle.dump(regressor, f)
```
|
**NOTE:** This pipeline shows only the essential code; additional hyperparameter tuning and preprocessing steps were carried out when training the original model.
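For completeness, here is a hedged sketch of how `X_train` and `y_train` could be assembled, reusing the ResNet50 `model` and `preprocess_input` from step 2 and the hypothetical `preprocess_sample()` helper from the Dataset section; the original training code may differ in split handling and column names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Build feature vectors (X) and average-depth targets (y) from the training split
X, y = [], []
for sample in dataset["train"]:
    image_array, average_depth = preprocess_sample(sample)   # 224x224 RGB + mean depth
    batch = preprocess_input(np.expand_dims(image_array, axis=0))
    X.append(model.predict(batch, verbose=0).flatten())      # 2048-d ResNet50 features
    y.append(average_depth)

# Hold out a test split for the metrics reported above
X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=42
)
```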
|
|
|
|
|
## License |
|
This project is licensed under the Apache License 2.0. |
|
|
|
## Acknowledgments |
|
- Hugging Face for hosting the NYUv2 dataset. |
|
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration. |
|
- TensorFlow and XGBoost for robust machine learning frameworks. |