---
license: apache-2.0
datasets:
  - 0jl/NYUv2
language:
  - en
metrics:
  - r2
  - mae
  - mse
pipeline_tag: depth-estimation
tags:
  - xgboost
  - python
  - depth-estimation
  - resnet50
---

# Depth Estimation Using ResNet50 and XGBoost
## Author
 - **Vishal Adithya.A**
## Overview
This project demonstrates an XGBoost regression model that predicts the average depth of an image from features extracted by a pre-trained ResNet50 model. The model was trained on the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)). The trained model is saved with Python's `pickle` library for easy deployment and reuse.

### Loading the Model
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows:

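```python
import pickle

import numpy as np

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# extract_features() returns a 1-D ResNet50 feature vector for the image
features = extract_features("path/to/image.jpg")
# XGBoost expects a 2-D array of shape (n_samples, n_features)
predicted_depth = model.predict(np.array([features]))
print(predicted_depth[0])
```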
**NOTE:** `extract_features()` is defined in the original training code and uses ResNet50 to extract a feature vector from the image; a version of it is shown in the Training Pipeline section below.

## Key Features
- **Model Architecture**:
  - Feature extraction: ResNet50 (pre-trained on ImageNet, with the classification head removed and global average pooling applied), which yields a 2048-dimensional feature vector per image.
  - Regression: XGBoost, a gradient-boosted tree model well suited to fixed-length tabular features.
- **Training GPU**: NVIDIA RTX 4060 Ti.
- **Target**: The average depth of each image, computed from its depth map in the dataset.

## Dataset
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2))
- Format: The dataset includes RGB images and corresponding depth maps.
- Preprocessing:
  - Images were resized to 224x224 pixels to match the input requirements of ResNet50.
  - Depth maps were converted into single average depth values.
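The depth-map-to-target conversion can be sketched in numpy as follows. This is a minimal sketch, assuming each depth map arrives as a 2-D array of per-pixel depths and that the target is the plain arithmetic mean over all pixels; the original code may handle invalid pixels differently, and both helper names are hypothetical.

```python
import numpy as np


def depth_map_to_target(depth_map):
    """Collapse a 2-D depth map into a single average-depth regression target.

    Hypothetical helper: assumes a plain mean over every pixel.
    """
    depth = np.asarray(depth_map, dtype=np.float64)
    if depth.ndim != 2:
        raise ValueError(f"expected a 2-D depth map, got shape {depth.shape}")
    return float(depth.mean())


def build_targets(depth_maps):
    """Convert a sequence of depth maps into the 1-D target vector y."""
    return np.array([depth_map_to_target(d) for d in depth_maps], dtype=np.float64)


if __name__ == "__main__":
    example = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(depth_map_to_target(example))  # 2.5
```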

## Model Training
1. **Feature Extraction**:
   - ResNet50 was used to extract a fixed-length feature vector from each image.
   - Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module.
2. **Regression**:
   - An XGBoost regressor was trained on the extracted features to predict average depth values.
   - Hyperparameters were tuned with cross-validation.

## Results
- **R² Score**: 0.841
- Performance is reasonable for an early implementation and could be improved with further hyperparameter tuning or better feature extraction.
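The R² score above, together with the MAE and MSE listed in this card's metadata, can be computed on held-out predictions with `sklearn.metrics`. The arrays here are made-up numbers for illustration only, not results from the NYUv2 model; in practice `y_pred` would come from `model.predict` on a test split, and `regression_report` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return the three regression metrics listed in this model card."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
    }


# Illustrative values only: one prediction is off by 1.0.
report = regression_report([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
for name, value in report.items():
    print(f"{name}: {value:.3f}")  # r2: 0.800, mae: 0.250, mse: 0.250
```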

## How to Use
### Requirements
1. Python 3.10+
2. Required libraries:
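   - `numpy`
   - `pickle` (part of the Python standard library; no install needed)
   - `xgboost`
   - `datasets`
   - `tensorflow`
   - `scikit-learn`
   - `pillow` (used to load and resize images)

Install the dependencies using pip:
```bash
pip install numpy tensorflow xgboost datasets scikit-learn pillow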
```

### Training Pipeline
If you want to retrain the model, follow these steps:
 
1. Download the **NYUv2 dataset** from Hugging Face:
   ```python
   from datasets import load_dataset
   dataset = load_dataset("0jl/NYUv2")
   ```
2. Extract features using ResNet50:
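   ```python
   import numpy as np
   from PIL import Image
   from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

   # ResNet50 without its classification head; global average pooling gives a 2048-d vector
   feature_extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

   def extract_features(image_path):
       # Load the image as RGB and resize it to ResNet50's 224x224 input size
       image = Image.open(image_path).convert("RGB").resize((224, 224))
       image_array = np.asarray(image, dtype=np.float32)
       # Add a batch dimension and apply ResNet50's ImageNet preprocessing
       image_array = preprocess_input(np.expand_dims(image_array, axis=0))
       features = feature_extractor.predict(image_array, verbose=0)
       return features.flatten()
   ```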
3. Train the XGBoost regressor on the extracted features and save the model:
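   ```python
   import pickle
   from xgboost import XGBRegressor

   # X_train: feature vectors from extract_features(), shape (n_samples, 2048)
   # y_train: average depth of each corresponding depth map, shape (n_samples,)
   regressor = XGBRegressor()
   regressor.fit(X_train, y_train)

   # Serialize the trained regressor so it can be reloaded with pickle.load()
   with open("model.pkl", "wb") as f:
       pickle.dump(regressor, f)
   ```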
**NOTE:** This pipeline shows only the basic steps; the original model was trained with additional hyperparameter tuning and preprocessing.


## License
This project is licensed under the Apache License 2.0.

## Acknowledgments
- Hugging Face for hosting the NYUv2 dataset.
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration.
- TensorFlow and XGBoost for robust machine learning frameworks.