auc_type
Available in: Random Forest, GBM, Deep Learning, XGBoost, GLM, GAM, Naive Bayes, Stacked Ensembles
Hyperparameter: no
Description
To calculate AUC or AUCPR for multinomial classification, this parameter must be set. The option is disabled by default because the calculation is expensive in CPU time and memory, and it is available only for multinomial classification problems with at most 50 domains (classes).
This parameter is important when early stopping uses AUC as the stopping metric, and when sorting models by an AUC metric in grid search using the sort_by parameter.
More information about the multinomial AUC/AUCPR metrics is available on the Performance and Prediction page.
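As a minimal, hedged sketch of those two uses (AUC-driven early stopping and AUC-based sorting in grid search), the Python snippet below sets auc_type on a multinomial GBM. The dataset is the same public cars frame used in the example below; the max_depth grid and the stopping settings are illustrative assumptions, not recommended values.

import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
# Same public cars dataset as the example below; "cylinders" is the multinomial response:
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
cars["cylinders"] = cars["cylinders"].asfactor()
train, valid = cars.split_frame(ratios=[.8], seed=1234)
predictors = ["displacement", "power", "weight", "acceleration", "year"]

# auc_type must be set so that an AUC value exists for the multinomial model:
gbm = H2OGradientBoostingEstimator(distribution="multinomial",
                                   auc_type="WEIGHTED_OVR",
                                   stopping_metric="AUC",   # early stopping driven by AUC
                                   stopping_rounds=3,       # illustrative setting
                                   seed=1234)

# Illustrative grid; sorting by "auc" works for a multinomial model only because auc_type is set:
grid = H2OGridSearch(model=gbm, hyper_params={"max_depth": [3, 5, 7]})
grid.train(x=predictors, y="cylinders", training_frame=train, validation_frame=valid)
print(grid.get_grid(sort_by="auc", decreasing=True))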
- If the AUC type is WEIGHTED_OVR, the weighted average One vs. Rest AUC or AUCPR is used as the default value for the AUC and AUCPR metrics.
- If the AUC type is WEIGHTED_OVO, the weighted average One vs. One AUC or AUCPR is used as the default value for the AUC and AUCPR metrics.
- If the AUC type is MACRO_OVR, the macro average One vs. Rest AUC or AUCPR is used as the default value for the AUC and AUCPR metrics.
- If the AUC type is MACRO_OVO, the macro average One vs. One AUC or AUCPR is used as the default value for the AUC and AUCPR metrics.
- If the AUC type is NONE, the metric is not calculated and the None value is returned instead.
- If the AUC type is AUTO, it resolves to NONE.
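To make the difference between the macro and weighted averages concrete, here is a small Python illustration of the averaging itself (the per-class One vs. Rest AUCs and class counts below are made-up values, not H2O output):

# Assumed per-class One vs. Rest AUCs and class frequencies (illustrative values only):
ovr_auc = {"A": 0.95, "B": 0.80, "C": 0.70}
class_counts = {"A": 100, "B": 50, "C": 10}

# MACRO_OVR: plain (unweighted) mean of the per-class One vs. Rest AUCs.
macro_ovr = sum(ovr_auc.values()) / len(ovr_auc)

# WEIGHTED_OVR: per-class One vs. Rest AUCs weighted by class frequency.
total = sum(class_counts.values())
weighted_ovr = sum(ovr_auc[c] * class_counts[c] / total for c in ovr_auc)

print(macro_ovr)     # 0.816... -- every class counts equally
print(weighted_ovr)  # 0.887... -- frequent classes dominate

The One vs. One variants average pairwise class AUCs in the same two ways.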
NOTE: auc_type is available ONLY for the multinomial distribution and is set to NONE by default.
Example
library(h2o)
h2o.init()
# Import the cars dataset:
# (this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made)
cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# Set the predictor and response column names:
predictors <- c("displacement", "power", "weight", "acceleration", "year")
response <- "cylinders"
cars[response] <- h2o.asfactor(cars[response])
# Split into train and validation sets:
cars_splits <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
train <- cars_splits[[1]]
valid <- cars_splits[[2]]
# Try using the auc_type parameter & train a GBM:
car_gbm <- h2o.gbm(x = predictors,
y = response,
training_frame = train,
validation_frame = valid,
distribution = "multinomial",
auc_type = "MACRO_OVR",
seed = 1234)
# Print the AUC for your validation data:
print(h2o.auc(car_gbm, valid = TRUE))
# Print the AUCPR for your validation data:
print(h2o.aucpr(car_gbm, valid = TRUE))
# Print the whole AUC table:
print(h2o.multinomial_auc_table(car_gbm, valid = TRUE))
# Print the whole AUCPR table:
print(h2o.multinomial_aucpr_table(car_gbm, valid = TRUE))
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
h2o.init()
# Import the cars dataset:
# (this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made)
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# Set the predictor and response column names:
predictors = ["displacement", "power", "weight", "acceleration", "year"]
response = "cylinders"
cars[response] = cars[response].asfactor()
# Split into train and validation sets:
train, valid = cars.split_frame(ratios = [.8], seed = 1234)
# Try using the auc_type parameter & train a GBM:
cars_gbm = H2OGradientBoostingEstimator(distribution="multinomial",
seed=1234,
auc_type="MACRO_OVR")
cars_gbm.train(x=predictors,
y=response,
training_frame=train,
validation_frame=valid)
# Print the AUC for the validation data:
print(cars_gbm.auc(valid=True))
# Print the AUCPR for the validation data:
print(cars_gbm.aucpr(valid=True))
# Print the whole AUC table:
print(cars_gbm.multinomial_auc_table())
# Print the whole AUCPR table:
print(cars_gbm.multinomial_aucpr_table())