In [1]:
import h2o
from h2o.automl import H2OAutoML
In [2]:
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
Warning: Version mismatch. H2O is version 3.37.0.99999, but the h2o-python package is version 0.0.local. This is a developer build, please contact your developer.
H2O_cluster_uptime: 21 mins 33 secs
H2O_cluster_timezone: Europe/Prague
H2O_data_parsing_timezone: UTC
H2O_cluster_version: 3.37.0.99999
H2O_cluster_version_age: 25 minutes
H2O_cluster_name: tomasfryda
H2O_cluster_total_nodes: 1
H2O_cluster_free_memory: 1.309 Gb
H2O_cluster_total_cores: 16
H2O_cluster_allowed_cores: 8
H2O_cluster_status: locked, healthy
H2O_connection_url: http://localhost:54321
H2O_connection_proxy: {"http": null, "https": null}
H2O_internal_security: False
Python_version: 3.9.4 final
In [3]:
df = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv")

response = "quality"

predictors = [
  "fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
  "total sulfur dioxide", "density", "pH", "sulphates", "alcohol",  "type"
]


train, test = df.split_frame(seed=1)
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
In [4]:
aml = H2OAutoML(max_runtime_secs=120, seed=1)
aml.train(x=predictors, y=response, training_frame=train)
AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%
Model Details
=============
H2OStackedEnsembleEstimator :  Stacked Ensemble
Model Key:  StackedEnsemble_AllModels_3_AutoML_3_20220408_140442

No model summary for this model

ModelMetricsRegressionGLM: stackedensemble
** Reported on train data. **

MSE: 0.023283411162563286
RMSE: 0.15258902700575586
MAE: 0.11065966448963627
RMSLE: 0.023320327709643703
R^2: 0.9696935442322193
Mean Residual Deviance: 0.023283411162563286
Null degrees of freedom: 4864
Residual degrees of freedom: 4856
Null deviance: 3737.612744090442
Residual deviance: 113.2737953058704
AIC: -4466.196859377104

ModelMetricsRegressionGLM: stackedensemble
** Reported on cross-validation data. **

MSE: 0.3759629223874193
RMSE: 0.6131581544654032
MAE: 0.42868816136684146
RMSLE: 0.0929837223844714
R^2: 0.5106342623893481
Mean Residual Deviance: 0.3759629223874193
Null degrees of freedom: 4864
Residual degrees of freedom: 4855
Null deviance: 3739.837050670749
Residual deviance: 1829.059617414795
AIC: 9069.013913851764

Cross-Validation Metrics Summary: 
metric                  mean        sd         cv_1_valid  cv_2_valid  cv_3_valid  cv_4_valid  cv_5_valid
mae                     0.429053    0.013194   0.418140    0.414085    0.432899    0.433157    0.446985
mean_residual_deviance  0.376736    0.026199   0.351823    0.353463    0.392439    0.372864    0.413090
mse                     0.376736    0.026199   0.351823    0.353463    0.392439    0.372864    0.413090
null_deviance           747.967400  44.680550  758.488650  712.955260  794.441160  783.059700  690.892200
r2                      0.508815    0.032326   0.530349    0.505313    0.523121    0.531222    0.454068
residual_deviance       365.811920  12.459985  356.044860  350.635000  378.703370  366.525540  377.150850
rmse                    0.613494    0.021237   0.593147    0.594527    0.626449    0.610626    0.642721
rmsle                   0.093029    0.004232   0.088082    0.090218    0.096522    0.092163    0.098159
Out[4]:

In [5]:
aml.explain(test);

Leaderboard

The leaderboard shows models with their metrics. When provided with an H2OAutoML object, the leaderboard shows 5-fold cross-validated metrics by default (depending on the H2OAutoML settings); otherwise it shows metrics computed on the given frame. At most 20 models are shown by default.
model_id                                                 rmse      mse       mae       rmsle      mean_residual_deviance  training_time_ms  predict_time_per_row_ms  algo
StackedEnsemble_AllModels_3_AutoML_3_20220408_140442     0.613158  0.375963  0.428688  0.0929837  0.375963                1334              0.072912                 StackedEnsemble
StackedEnsemble_BestOfFamily_4_AutoML_3_20220408_140442  0.615253  0.378536  0.431953  0.093312   0.378536                423               0.027955                 StackedEnsemble
StackedEnsemble_AllModels_2_AutoML_3_20220408_140442     0.61886   0.382988  0.439722  0.0937725  0.382988                146               0.033774                 StackedEnsemble
StackedEnsemble_AllModels_1_AutoML_3_20220408_140442     0.619165  0.383365  0.440646  0.0937997  0.383365                146               0.02613                  StackedEnsemble
StackedEnsemble_BestOfFamily_3_AutoML_3_20220408_140442  0.619706  0.384036  0.441241  0.0938909  0.384036                148               0.025569                 StackedEnsemble
StackedEnsemble_BestOfFamily_2_AutoML_3_20220408_140442  0.620427  0.38493   0.442876  0.0939805  0.38493                 138               0.023944                 StackedEnsemble
DRF_1_AutoML_3_20220408_140442                           0.622971  0.388092  0.449611  0.094527   0.388092                607               0.012088                 DRF
GBM_grid_1_AutoML_3_20220408_140442_model_8              0.624575  0.390094  0.422475  0.0949575  0.390094                1812              0.013222                 GBM
XRT_1_AutoML_3_20220408_140442                           0.626012  0.391891  0.452358  0.0950673  0.391891                836               0.012583                 DRF
GBM_grid_1_AutoML_3_20220408_140442_model_5              0.641773  0.411872  0.469973  0.0970923  0.411872                941               0.015306                 GBM
GBM_4_AutoML_3_20220408_140442                           0.64486   0.415845  0.484437  0.0975089  0.415845                329               0.009188                 GBM
XGBoost_grid_1_AutoML_3_20220408_140442_model_19         0.646811  0.418365  0.479962  0.0976242  0.418365                785               0.005213                 XGBoost
XGBoost_grid_1_AutoML_3_20220408_140442_model_2          0.648299  0.420292  0.44705   0.0980709  0.420292                580               0.005253                 XGBoost
GBM_grid_1_AutoML_3_20220408_140442_model_1              0.648965  0.421155  0.478709  0.0982079  0.421155                553               0.014966                 GBM
GBM_grid_1_AutoML_3_20220408_140442_model_10             0.65061   0.423293  0.478976  0.0985366  0.423293                1450              0.016868                 GBM
StackedEnsemble_BestOfFamily_1_AutoML_3_20220408_140442  0.65196   0.425052  0.484097  0.0984392  0.425052                130               0.017711                 StackedEnsemble
XGBoost_grid_1_AutoML_3_20220408_140442_model_28         0.652114  0.425252  0.48586   0.0983887  0.425252                1965              0.005989                 XGBoost
GBM_3_AutoML_3_20220408_140442                           0.654346  0.428168  0.497377  0.0989008  0.428168                267               0.010421                 GBM
XGBoost_grid_1_AutoML_3_20220408_140442_model_24         0.655067  0.429113  0.455267  0.0990104  0.429113                1124              0.004858                 XGBoost
XGBoost_grid_1_AutoML_3_20220408_140442_model_33         0.656874  0.431484  0.46009   0.099322   0.431484                560               0.002308                 XGBoost
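
The same leaderboard can also be pulled programmatically as an H2OFrame; a minimal sketch using the get_leaderboard helper (extra_columns="ALL" adds the timing columns shown above):

from h2o.automl import get_leaderboard

# Leaderboard of this AutoML run as an H2OFrame, including
# training_time_ms and predict_time_per_row_ms.
lb = get_leaderboard(aml, extra_columns="ALL")
lb.head(rows=20)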

Residual Analysis

Residual analysis plots the fitted values vs. the residuals on a test dataset. Ideally, residuals should be randomly distributed. Patterns in this plot can indicate potential problems with the model selection, e.g., using a simpler model than necessary, or not accounting for heteroscedasticity, autocorrelation, etc. Note that if you see "striped" lines of residuals, that is an artifact of having an integer-valued (vs. real-valued) response variable.
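
The fitted-values-vs-residuals view can also be reproduced for a single model with basic frame operations; a minimal sketch for the AutoML leader (it only assumes predict(), H2OFrame arithmetic, and pandas/matplotlib, not any dedicated plotting helper):

# Fitted values and residuals of the leader on the test frame
pred = aml.leader.predict(test)                               # H2OFrame with a "predict" column
resid = (test["quality"] - pred["predict"]).set_names(["residual"])
fitted_vs_resid = pred.cbind(resid).as_data_frame()           # pandas DataFrame: predict, residual
fitted_vs_resid.plot.scatter(x="predict", y="residual")       # stripes reflect the integer response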

Variable Importance

The variable importance plot shows the relative importance of the most important variables in the model.
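
The Stacked Ensemble leader does not expose a native variable importance, so a plot like the one above comes from a model that does. A minimal sketch for plotting it directly, using a GBM id taken from the leaderboard above:

# Variable importance of one of the tree-based models from this run
gbm = h2o.get_model("GBM_grid_1_AutoML_3_20220408_140442_model_8")
gbm.varimp_plot()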

Variable Importance Heatmap

The variable importance heatmap shows variable importance across multiple models. Some models in H2O return variable importance for one-hot (binary indicator) encoded versions of categorical columns (e.g., Deep Learning, XGBoost). In order for the variable importance of categorical columns to be compared across all model types, we compute a summarization of the variable importance across all one-hot encoded features and return a single variable importance for the original categorical feature. By default, the models and variables are ordered by their similarity.
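
A sketch of producing this heatmap on its own, assuming the module-level varimp_heatmap helper available in recent h2o-3 builds (it accepts an H2OAutoML object or a list of models):

# Variable importance across the AutoML models
h2o.varimp_heatmap(aml)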

Model Correlation

This plot shows the correlation between the predictions of the models. For classification, the frequency of identical predictions is used. By default, models are ordered by their similarity (as computed by hierarchical clustering). Interpretable models, such as GAM, GLM, and RuleFit, are highlighted using red-colored text.
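
A sketch of producing the model correlation heatmap directly, assuming the module-level model_correlation_heatmap helper available in recent h2o-3 builds:

# Correlation of model predictions on the test frame
h2o.model_correlation_heatmap(aml, test)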

SHAP Summary

The SHAP summary plot shows the contribution of the features for each instance (row of data). The sum of the feature contributions and the bias term is equal to the raw prediction of the model, i.e., the prediction before applying the inverse link function.
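
SHAP contributions in H2O come from TreeSHAP, so the summary plot applies to tree-based models rather than the Stacked Ensemble leader; a minimal sketch using one of the XGBoost models from the leaderboard above, assuming the shap_summary_plot model method:

# SHAP summary of a tree-based model from this run
xgb = h2o.get_model("XGBoost_grid_1_AutoML_3_20220408_140442_model_19")
xgb.shap_summary_plot(test)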

Partial Dependence Plots

A partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured as the change in the mean response. PDP assumes independence between the feature for which the PDP is computed and the rest of the features.
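
A sketch of plotting a PDP for one feature, assuming the pd_plot model method and the module-level pd_multi_plot helper from the explain API in recent h2o-3 builds; "alcohol" is just an example column from the predictors above:

# Partial dependence of the response on "alcohol" for the leader
aml.leader.pd_plot(test, column="alcohol")

# The same feature compared across all AutoML models in one figure
h2o.pd_multi_plot(aml, test, column="alcohol")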





Individual Conditional Expectation

An Individual Conditional Expectation (ICE) plot gives a graphical depiction of the marginal effect of a variable on the response. ICE plots are similar to partial dependence plots (PDP); a PDP shows the average effect of a feature, while an ICE plot shows the effect for a single instance. This function plots the effect for each decile. In contrast to the PDP, ICE plots can provide more insight, especially when there are strong feature interactions.
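
A sketch of plotting ICE curves for a single feature of a single model, assuming the ice_plot model method; again, "alcohol" is just an example column:

# ICE curves (one per decile) for "alcohol" using the AutoML leader
aml.leader.ice_plot(test, column="alcohol")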





In [6]:
aml.explain_row(test, row_index=42);

Leaderboard

The leaderboard shows models with their metrics and their predictions for the given row. When provided with an H2OAutoML object, the leaderboard shows 5-fold cross-validated metrics by default (depending on the H2OAutoML settings); otherwise it shows metrics computed on the given frame. At most 20 models are shown by default.
model_id                                                 rmse      mse       mae       rmsle      mean_residual_deviance  training_time_ms  predict_time_per_row_ms  algo             predict
StackedEnsemble_AllModels_3_AutoML_3_20220408_140442     0.613158  0.375963  0.428688  0.0929837  0.375963                1334              0.072912                 StackedEnsemble  5.27411
StackedEnsemble_BestOfFamily_4_AutoML_3_20220408_140442  0.615253  0.378536  0.431953  0.093312   0.378536                423               0.027955                 StackedEnsemble  5.29165
StackedEnsemble_AllModels_2_AutoML_3_20220408_140442     0.61886   0.382988  0.439722  0.0937725  0.382988                146               0.033774                 StackedEnsemble  5.36343
StackedEnsemble_AllModels_1_AutoML_3_20220408_140442     0.619165  0.383365  0.440646  0.0937997  0.383365                146               0.02613                  StackedEnsemble  5.3616
StackedEnsemble_BestOfFamily_3_AutoML_3_20220408_140442  0.619706  0.384036  0.441241  0.0938909  0.384036                148               0.025569                 StackedEnsemble  5.36032
StackedEnsemble_BestOfFamily_2_AutoML_3_20220408_140442  0.620427  0.38493   0.442876  0.0939805  0.38493                 138               0.023944                 StackedEnsemble  5.35617
DRF_1_AutoML_3_20220408_140442                           0.622971  0.388092  0.449611  0.094527   0.388092                607               0.012088                 DRF              5.37536
GBM_grid_1_AutoML_3_20220408_140442_model_8              0.624575  0.390094  0.422475  0.0949575  0.390094                1812              0.013222                 GBM              5.26905
XRT_1_AutoML_3_20220408_140442                           0.626012  0.391891  0.452358  0.0950673  0.391891                836               0.012583                 DRF              5.50439
GBM_grid_1_AutoML_3_20220408_140442_model_5              0.641773  0.411872  0.469973  0.0970923  0.411872                941               0.015306                 GBM              5.45243
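
The per-row predictions in the last column can also be obtained directly from any of the models; a minimal sketch for the leader and the row being explained:

# Prediction of the leader for row 42 of the test frame
aml.leader.predict(test[42, :])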

SHAP Explanation

SHAP explanation shows the contribution of the features for a given instance. The sum of the feature contributions and the bias term is equal to the raw prediction of the model, i.e., the prediction before applying the inverse link function. H2O implements TreeSHAP which, when the features are correlated, can increase the contribution of a feature that had no influence on the prediction.
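
A sketch of producing the row-level SHAP explanation yourself for a tree-based model from this run, assuming the shap_explain_row_plot model method; predict_contributions is the underlying TreeSHAP call and returns the per-feature contributions plus a BiasTerm column:

# Row-level SHAP explanation from one of the XGBoost models
xgb = h2o.get_model("XGBoost_grid_1_AutoML_3_20220408_140442_model_19")
xgb.shap_explain_row_plot(test, row_index=42)

# Raw TreeSHAP contributions (per feature, plus BiasTerm) for the same row
xgb.predict_contributions(test[42, :])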