|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- 0jl/NYUv2 |
|
language: |
|
- en |
|
metrics: |
|
- r2 |
|
- mae |
|
- mse |
|
pipeline_tag: depth-estimation |
|
tags: |
|
- xgboost |
|
- python |
|
- depth-estimation |
|
- resnet50 |
|
--- |
|
|
|
# Depth Estimation Using ResNet50 and XGBoost |
|
## Author |
|
- **Vishal Adithya.A** |
|
## Overview |
|
This project demonstrates a depth estimation model that predicts the average depth of an input image using an XGBoost regressor trained on features extracted by a pre-trained ResNet50. The model was trained on the **NYUv2 dataset** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)). The trained regressor is saved with Python's `pickle` library for easy deployment and reuse.
|
|
|
### Loading the Model |
|
The model is saved as `model.pkl` using `pickle`. You can load and use it as follows: |
|
|
|
```python
import pickle

# Load the trained XGBoost regressor
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Extract ResNet50 features for a single image and predict its average depth
features = extract_features("path/to/image.jpg")
predicted_depth = model.predict([features])
print(predicted_depth[0])
```
|
**NOTE:** `extract_features()` is a helper function defined in the original code that uses ResNet50 to extract a feature vector from the image; see step 2 of the Training Pipeline below for its definition.
|
|
|
## Key Features |
|
- **Model Architecture** (see the sketch after this list):
|
- Feature extraction: ResNet50 (pre-trained on ImageNet, with the top layers removed and global average pooling). |
|
- Regression: XGBoost, optimized for structured data prediction. |
|
- **Training GPU**: NVIDIA RTX 4060 Ti.
|
- **Target**: Predict the average depth of images based on the depth maps from the dataset. |
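As a rough sketch of the first stage (names here are illustrative): with the classification head removed and global average pooling, ResNet50 maps each 224x224 RGB image to a 2048-dimensional feature vector, which is what the XGBoost regressor consumes.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50

# ImageNet-pretrained backbone, no classification head, global average pooling
feature_extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Each 224x224 RGB image becomes a single 2048-dimensional feature vector
dummy_batch = np.random.rand(1, 224, 224, 3).astype("float32")
print(feature_extractor.predict(dummy_batch).shape)  # -> (1, 2048)
```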
|
|
|
## Dataset |
|
- Dataset: **NYUv2** ([0jl/NYUv2](https://huggingface.co/datasets/0jl/NYUv2)) |
|
- Format: The dataset includes RGB images and corresponding depth maps. |
|
- Preprocessing (sketched after this list):
|
- Images were resized to 224x224 pixels to match the input requirements of ResNet50. |
|
- Depth maps were converted into single average depth values. |
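A minimal sketch of this preprocessing step, assuming each dataset sample exposes a PIL `image` and a `depth` array (the field names are assumptions and may differ from the original code):

```python
import numpy as np

def preprocess_sample(sample):
    """Resize the RGB image to 224x224 and reduce the depth map to one value."""
    # Input for ResNet50: 224x224 RGB image as a float array
    image = sample["image"].convert("RGB").resize((224, 224))
    image_array = np.asarray(image, dtype="float32")

    # Target: the mean of the depth map, i.e. a single average depth per image
    average_depth = float(np.asarray(sample["depth"], dtype="float32").mean())

    return image_array, average_depth
```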
|
|
|
## Model Training |
|
1. **Feature Extraction**: |
|
- ResNet50 was used to extract a fixed-length feature vector from each image. |
|
- Preprocessing: Images were normalized using the `preprocess_input` function from TensorFlow's ResNet50 module. |
|
2. **Regression**: |
|
- XGBoost regressor was trained on the extracted features to predict average depth values. |
|
- Hyperparameters were tuned using cross-validation for optimal performance (an illustrative sketch follows this list).
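The exact search space is not recorded in this card; the sketch below shows one way cross-validated tuning could be set up with scikit-learn's `GridSearchCV`. The parameter values, and the `X_train`/`y_train` names, are illustrative assumptions, not the settings used for the released model.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Illustrative search space; the original model's grid was not recorded here
param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    param_grid=param_grid,
    scoring="r2",  # matches the R² metric reported below
    cv=5,          # 5-fold cross-validation
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train: ResNet50 features, y_train: average depths
print(search.best_params_, search.best_score_)
```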
|
|
|
## Results |
|
- **R² Score**: 0.841 |
|
- Performance is reasonable for an initial implementation and can be improved with further hyperparameter tuning or better feature extraction (see the evaluation sketch below).
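The R², MAE, and MSE metrics listed in the card metadata can be computed on a held-out split with scikit-learn; a minimal sketch, assuming `model`, `X_test`, and `y_test` are available from the training pipeline below:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Evaluate the trained regressor on held-out features and targets
y_pred = model.predict(X_test)

print("R² :", r2_score(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
```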
|
|
|
## How to Use |
|
### Requirements |
|
1. Python 3.10+ |
|
2. Required libraries: |
|
- `numpy` |
|
- `pickle` (Python standard library; no installation needed)
|
- `xgboost` |
|
- `datasets` |
|
- `tensorflow` |
|
- `scikit-learn`
|
|
|
Install the dependencies using pip: |
|
```bash |
|
pip install numpy tensorflow xgboost datasets scikit-learn |
|
``` |
|
|
|
### Training Pipeline |
|
If you want to retrain the model, follow these steps: |
|
|
|
1. Download the **NYUv2 dataset** from Hugging Face: |
|
```python |
|
from datasets import load_dataset |
|
dataset = load_dataset("0jl/NYUv2") |
|
``` |
|
2. Extract features using ResNet50: |
|
```python
import numpy as np
from PIL import Image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ImageNet-pretrained backbone, no classification head, global average pooling
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    # Resize to the 224x224 input expected by ResNet50 and add a batch dimension
    image = Image.open(image_path).convert("RGB").resize((224, 224))
    image_array = np.expand_dims(np.asarray(image, dtype="float32"), axis=0)
    image_array = preprocess_input(image_array)
    # Returns a flat 2048-dimensional feature vector
    features = model.predict(image_array)
    return features.flatten()
```
|
3. Train the XGBoost regressor on the extracted features and save the model (`X_train` and `y_train` are assembled as shown in the sketch after this list):
|
```python
import pickle
from xgboost import XGBRegressor

# X_train: ResNet50 feature vectors, y_train: average depth targets
# (see the sketch after this list for one way to assemble them)
regressor = XGBRegressor()
regressor.fit(X_train, y_train)

with open("model.pkl", "wb") as f:
    pickle.dump(regressor, f)
```
|
**NOTE:** This pipeline shows only the essential code; additional hyperparameter tuning and preprocessing steps were carried out when training the original model.
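For completeness, here is a hedged sketch of how `X_train` and `y_train` could be assembled, reusing the ResNet50 `model` and `preprocess_input` from step 2 and the hypothetical `preprocess_sample()` helper from the Dataset section; the original training code may differ in split handling and column names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Build feature vectors (X) and average-depth targets (y) from the training split
X, y = [], []
for sample in dataset["train"]:
    image_array, average_depth = preprocess_sample(sample)   # 224x224 RGB + mean depth
    batch = preprocess_input(np.expand_dims(image_array, axis=0))
    X.append(model.predict(batch, verbose=0).flatten())      # 2048-d ResNet50 features
    y.append(average_depth)

# Hold out a test split for the metrics reported above
X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=42
)
```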
|
|
|
|
|
## License |
|
This project is licensed under the Apache License 2.0. |
|
|
|
## Acknowledgments |
|
- Hugging Face for hosting the NYUv2 dataset. |
|
- NVIDIA RTX 4060 Ti for providing efficient GPU acceleration. |
|
- TensorFlow and XGBoost for robust machine learning frameworks. |