---
library_name: sklearn
tags:
- sklearn
- skops
- tabular-regression
model_format: skops
model_file: model.skops
widget:
  structuredData:
    AveBedrms:
    - 0.9290780141843972
    - 0.9458483754512635
    - 1.087360594795539
    AveOccup:
    - 3.1134751773049647
    - 3.0613718411552346
    - 3.2657992565055762
    AveRooms:
    - 6.304964539007092
    - 6.945848375451264
    - 3.8884758364312266
    HouseAge:
    - 17.0
    - 15.0
    - 24.0
    Latitude:
    - 34.23
    - 36.84
    - 34.04
    Longitude:
    - -117.41
    - -119.77
    - -118.3
    MedInc:
    - 6.1426
    - 5.3886
    - 1.7109
    Population:
    - 439.0
    - 848.0
    - 1757.0
---
# Model description

Gradient boosting regressor trained on the California Housing dataset.

The model is a gradient boosting regressor from sklearn. In addition to the standard
features, it uses the predictions of a KNN model that sees only longitude and latitude.
These predictions are calculated out of fold, then appended to the existing features.
The extra feature helps decision tree-based models, which cannot easily learn from
geospatial data.
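
The setup described above can be sketched with scikit-learn's `StackingRegressor`, which computes the out-of-fold predictions of its base estimators and, with `passthrough=True`, appends them to the original features. The estimator names and settings below mirror the hyperparameter table in this card. The synthetic data and target are assumptions made only so the sketch runs without downloading anything; this is an illustration, not the exact training script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (an assumption for illustration only); the real
# model was trained on the California Housing dataset.
rng = np.random.default_rng(0)
n_samples = 200
columns = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
           "Population", "AveOccup", "Latitude", "Longitude"]
X = pd.DataFrame(rng.normal(size=(n_samples, len(columns))), columns=columns)
y = (np.sin(X["Latitude"]) * np.cos(X["Longitude"]) + 0.5 * X["MedInc"]).to_numpy()

# Base estimator: a KNN regressor that only sees longitude and latitude.
knn = Pipeline([
    ("select_cols", ColumnTransformer(
        [("long_and_lat", "passthrough", ["Longitude", "Latitude"])]
    )),
    ("knn", KNeighborsRegressor()),
])

# The stacking regressor fits the final estimator on the out-of-fold KNN
# predictions; passthrough=True adds the original features alongside them.
model = StackingRegressor(
    estimators=[("knn@5", knn)],
    final_estimator=GradientBoostingRegressor(n_estimators=500, random_state=0),
    passthrough=True,
)
model.fit(X, y)
predictions = model.predict(X)

# The final estimator sees the 8 original features plus 1 KNN prediction.
n_final_features = model.final_estimator_.n_features_in_
```

Because the KNN predictions are produced by cross-validation during `fit`, the final estimator never trains on a KNN prediction that was made for a sample the KNN had already seen.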
## Intended uses & limitations

This model is meant for demonstration purposes only.

## Training Procedure

### Hyperparameters

The model was trained with the hyperparameters listed below.

<details>
<summary> Click to expand </summary>

| Hyperparameter | Value |
|-----------------------------------------------|--------------------------------------------------------------|
| cv | |
| estimators | [('knn@5', Pipeline(steps=[('select_cols',<br /> ColumnTransformer(transformers=[('long_and_lat', 'passthrough',<br /> ['Longitude', 'Latitude'])])),<br /> ('knn', KNeighborsRegressor())]))] |
| final_estimator__alpha | 0.9 |
| final_estimator__ccp_alpha | 0.0 |
| final_estimator__criterion | friedman_mse |
| final_estimator__init | |
| final_estimator__learning_rate | 0.1 |
| final_estimator__loss | squared_error |
| final_estimator__max_depth | 3 |
| final_estimator__max_features | |
| final_estimator__max_leaf_nodes | |
| final_estimator__min_impurity_decrease | 0.0 |
| final_estimator__min_samples_leaf | 1 |
| final_estimator__min_samples_split | 2 |
| final_estimator__min_weight_fraction_leaf | 0.0 |
| final_estimator__n_estimators | 500 |
| final_estimator__n_iter_no_change | |
| final_estimator__random_state | 0 |
| final_estimator__subsample | 1.0 |
| final_estimator__tol | 0.0001 |
| final_estimator__validation_fraction | 0.1 |
| final_estimator__verbose | 0 |
| final_estimator__warm_start | False |
| final_estimator | GradientBoostingRegressor(n_estimators=500, random_state=0) |
| n_jobs | |
| passthrough | True |
| verbose | 0 |
| knn@5 | Pipeline(steps=[('select_cols',<br /> ColumnTransformer(transformers=[('long_and_lat', 'passthrough',<br /> ['Longitude', 'Latitude'])])),<br /> ('knn', KNeighborsRegressor())]) |
| knn@5__memory | |
| knn@5__steps | [('select_cols', ColumnTransformer(transformers=[('long_and_lat', 'passthrough',<br /> ['Longitude', 'Latitude'])])), ('knn', KNeighborsRegressor())] |
| knn@5__verbose | False |
| knn@5__select_cols | ColumnTransformer(transformers=[('long_and_lat', 'passthrough',<br /> ['Longitude', 'Latitude'])]) |
| knn@5__knn | KNeighborsRegressor() |
| knn@5__select_cols__n_jobs | |
| knn@5__select_cols__remainder | drop |
| knn@5__select_cols__sparse_threshold | 0.3 |
| knn@5__select_cols__transformer_weights | |
| knn@5__select_cols__transformers | [('long_and_lat', 'passthrough', ['Longitude', 'Latitude'])] |
| knn@5__select_cols__verbose | False |
| knn@5__select_cols__verbose_feature_names_out | True |
| knn@5__select_cols__long_and_lat | passthrough |
| knn@5__knn__algorithm | auto |
| knn@5__knn__leaf_size | 30 |
| knn@5__knn__metric | minkowski |
| knn@5__knn__metric_params | |
| knn@5__knn__n_jobs | |
| knn@5__knn__n_neighbors | 5 |
| knn@5__knn__p | 2 |
| knn@5__knn__weights | uniform |

</details>

### Model Plot

The model plot is below.

## Evaluation Results

Metrics are calculated on the test set.

| Metric | Value |
|-------------------------|--------------|
| Root mean squared error | 44273.5 |
| Mean absolute error | 30079.9 |
| R² | 0.805954 |
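
As a rough illustration of how the three metrics in the table can be computed, here is a minimal sketch using `sklearn.metrics`. The helper name `regression_report` and the toy arrays are hypothetical and are not the values or code used for this card.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R² for a set of predictions (hypothetical helper)."""
    return {
        # RMSE is the square root of the mean squared error.
        "Root mean squared error": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "Mean absolute error": float(mean_absolute_error(y_true, y_pred)),
        "R²": float(r2_score(y_true, y_pred)),
    }


# Toy values for illustration only, not the card's test set.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
report = regression_report(y_true, y_pred)
```

In practice, `y_true` would be the held-out test targets and `y_pred` the output of `model.predict` on the test features.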

## Dataset description

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:

- MedInc: median income in block group
- HouseAge: median house age in block group
- AveRooms: average number of rooms per household
- AveBedrms: average number of bedrooms per household
- Population: block group population
- AveOccup: average number of household members
- Latitude: block group latitude
- Longitude: block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).

A household is a group of people residing within a home. Since the average
number of rooms and bedrooms in this dataset are provided per household, these
columns may take surprisingly large values for block groups with few households
and many empty houses, such as vacation resorts.

It can be downloaded/loaded using the
:func:`sklearn.datasets.fetch_california_housing` function.

.. topic:: References

    - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
      Statistics and Probability Letters, 33 (1997) 291-297

### Data distribution

<details>
<summary> Click to expand </summary>

</details>

# How to Get Started with the Model

Run the code below to load the model and predict on the example input stored in `config.json`:

```python
import json

import pandas as pd
import skops.io as sio

# Load the trained model from the skops file.
model = sio.load("model.skops")

# Read the example input shipped with the model and predict on it.
with open("config.json") as f:
    config = json.load(f)
model.predict(pd.DataFrame.from_dict(config["sklearn"]["example_input"]))
```

# Model Card Authors

Benjamin Bossan

# Model Card Contact

[email protected]
