depth-estimator / README.md
vishal-adithya's picture
Update README.md
bc1b969 verified
---
license: apache-2.0
datasets:
- 0jl/NYUv2
language:
- en
metrics:
- r2
- mae
- mse
pipeline_tag: depth-estimation
tags:
- xgboost
- python
- depth-estimation
- resnet50
---
# Depth Estimation Using ResNet50 and XGBoost
## Author
- **Vishal Adithya.A**
## Overview
This project demonstrates a depth estimation XgBoost Regressor model that predicts the average depth of images provided using features extracted from a pre-trained ResNet50 model.The model was trained upon the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)). The trained model is saved using Python's `pickle` library for easy deployment and reuse.
### Loading the Model
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows:
```python
with open("model.pkl", "rb") as f:
model = pickle.load(f)
features = extract_features("path/to/image.jpg")
predicted_depth = model.predict([features])
print(predicted_depth[0])
```
**NOTE:** extract_features() is a predefined function in the original code which uses ResNet50 to extract features out of the image.
## Key Features
- **Model Architecture**:
- Feature extraction: ResNet50 (pre-trained on ImageNet, with the top layers removed and global average pooling).
- Regression: XGBoost, optimized for structured data prediction.
- **Training GPU**: NVIDIA RTX 4060 Ti, ensuring efficient computation.
- **Target**: Predict the average depth of images based on the depth maps from the dataset.
## Dataset
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2))
- Format: The dataset includes RGB images and corresponding depth maps.
- Preprocessing:
- Images were resized to 224x224 pixels to match the input requirements of ResNet50.
- Depth maps were converted into single average depth values.
## Model Training
1. **Feature Extraction**:
- ResNet50 was used to extract a fixed-length feature vector from each image.
- Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module.
2. **Regression**:
- XGBoost regressor was trained on the extracted features to predict average depth values.
- Hyperparameters were tuned using cross-validation techniques for optimal performance.
## Results
- **R² Score**: 0.841
- Performance is reasonable for a first few implementation and can be further improved with additional tuning or by improving feature extraction methods.
## How to Use
### Requirements
1. Python 3.10+
2. Required libraries:
- `numpy`
- `pickle`
- `xgboost`
- `datasets`
- `tensorflow`
- `scikitlearn`
Install the dependencies using pip:
```bash
pip install numpy tensorflow xgboost datasets scikit-learn
```
### Training Pipeline
If you want to retrain the model, follow these steps:
1. Download the **NYUv2 dataset** from Hugging Face:
```python
from datasets import load_dataset
dataset = load_dataset("0jl/NYUv2")
```
2. Extract features using ResNet50:
```python
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")
from PIL import Image
def extract_features(image_path):
image_array = preprocess_input(image_array)
features = model.predict(image_array)
return features.flatten()
```
3. Train the XGBoost regressor on the extracted features and save the model:
```python
regressor = XGBRegressor()
regressor.fit(X_train, y_train)
with open("model.pkl", "wb") as f:
pickle.dump(regressor, f)
```
**NOTE:** This pipeline has just the base fundamental code more additional parameter tunings and preprocessing steps were being conducted during the training of the original model.
## License
This project is licensed under the Apache License 2.0.
## Acknowledgments
- Hugging Face for hosting the NYUv2 dataset.
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration.
- TensorFlow and XGBoost for robust machine learning frameworks.