``metalearner_algorithm``
-------------------------

- Available in: Stacked Ensembles
- Hyperparameter: no

Description
~~~~~~~~~~~

H2O's Stacked Ensemble method is a supervised ensemble machine learning algorithm that finds the optimal combination of a collection of prediction algorithms using a process called stacking (or Super Learning). The algorithm that learns the optimal combination of the base learners is called the metalearning algorithm, or metalearner.

By default, the Stacked Ensemble metalearner is a default H2O GLM with non-negative weights. The ``metalearner_algorithm`` option allows you to specify a different metalearner algorithm. Options include:

- ``"AUTO"``: GLM with non-negative weights and standardization turned off; if a ``validation_frame`` is present, then ``lambda_search`` is set to True. This behavior may change over time. This is the default.
- ``"glm"``: GLM with default parameters
- ``"gbm"``: GBM with default parameters
- ``"drf"``: Random Forest with default parameters
- ``"deeplearning"``: Deep Learning with default parameters
- ``"naivebayes"``: Naive Bayes with default parameters
- ``"xgboost"``: XGBoost (if available) with default parameters

The metalearner will use default hyperparameter values for the specified algorithm unless you also specify ``metalearner_params`` to customize the hyperparameter values. The default parameters for the metalearning algorithms may not be the best choice, so it's a good idea to experiment with different parameters using ``metalearner_params``. In the next release of H2O, there will be an option to do a grid search over metalearner parameters using the standard H2O Grid interface, which will make tuning the metalearner easier.

Related Parameters
~~~~~~~~~~~~~~~~~~

- ``metalearner_params``

Example
~~~~~~~

.. tabs::
   .. code-tab:: r R

      library(h2o)
      h2o.init()

      # Import the higgs_train_5k train and test datasets
      train <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
      test <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")

      # Identify predictors and response
      y <- "response"
      x <- setdiff(names(train), y)

      # Convert the response column in train and test datasets to a factor
      train[, y] <- as.factor(train[, y])
      test[, y] <- as.factor(test[, y])

      # Set number of folds for base learners
      nfolds <- 3

      # Train & cross-validate a GBM model
      my_gbm <- h2o.gbm(x = x, y = y,
                        training_frame = train,
                        distribution = "bernoulli",
                        ntrees = 10,
                        nfolds = nfolds,
                        keep_cross_validation_predictions = TRUE,
                        seed = 1)

      # Train & cross-validate an RF model
      my_rf <- h2o.randomForest(x = x, y = y,
                                training_frame = train,
                                ntrees = 10,
                                nfolds = nfolds,
                                keep_cross_validation_predictions = TRUE,
                                seed = 1)

      # Next we can train a few different ensembles using different metalearners

      # Train a stacked ensemble using the default metalearner algorithm
      stack <- h2o.stackedEnsemble(x = x, y = y,
                                   training_frame = train,
                                   base_models = list(my_gbm, my_rf))
      h2o.auc(h2o.performance(stack, test))
      # 0.7570171

      # Train a stacked ensemble using GBM as the metalearner algorithm
      # The metalearner will use GBM default values
      stack_gbm <- h2o.stackedEnsemble(x = x, y = y,
                                       training_frame = train,
                                       base_models = list(my_gbm, my_rf),
                                       metalearner_algorithm = "gbm")
      h2o.auc(h2o.performance(stack_gbm, test))
      # 0.7511055

      # Train a stacked ensemble using RF as the metalearner algorithm
      # The metalearner will use RF default values
      stack_rf <- h2o.stackedEnsemble(x = x, y = y,
                                      training_frame = train,
                                      base_models = list(my_gbm, my_rf),
                                      metalearner_algorithm = "drf")
      h2o.auc(h2o.performance(stack_rf, test))
      # 0.7232461

      # Train a stacked ensemble using Deep Learning as the metalearner algorithm
      # The metalearner will use Deep Learning default values
      stack_dl <- h2o.stackedEnsemble(x = x, y = y,
                                      training_frame = train,
                                      base_models = list(my_gbm, my_rf),
                                      metalearner_algorithm = "deeplearning")
      h2o.auc(h2o.performance(stack_dl, test))
      # 0.7571556

   .. code-tab:: python

      import h2o
      from h2o.estimators.random_forest import H2ORandomForestEstimator
      from h2o.estimators.gbm import H2OGradientBoostingEstimator
      from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
      h2o.init()

      # Import the higgs_train_5k train and test datasets
      train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")
      test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv")

      # Identify predictors and response
      x = train.columns
      y = "response"
      x.remove(y)

      # Convert the response column in train and test datasets to a factor
      train[y] = train[y].asfactor()
      test[y] = test[y].asfactor()

      # Set number of folds for base learners
      nfolds = 3

      # Train and cross-validate a GBM model
      my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                            ntrees=10,
                                            nfolds=nfolds,
                                            fold_assignment="Modulo",
                                            keep_cross_validation_predictions=True,
                                            seed=1)
      my_gbm.train(x=x, y=y, training_frame=train)

      # Train and cross-validate an RF model
      my_rf = H2ORandomForestEstimator(ntrees=50,
                                       nfolds=nfolds,
                                       fold_assignment="Modulo",
                                       keep_cross_validation_predictions=True,
                                       seed=1)
      my_rf.train(x=x, y=y, training_frame=train)

      # Next we can train a few different ensembles using different metalearners

      # Train a stacked ensemble using the default metalearner algorithm
      stack = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf])
      stack.train(x=x, y=y, training_frame=train)
      stack.model_performance(test).auc()
      # 0.7522591310013634

      # Train a stacked ensemble with a GBM metalearner algorithm
      # The metalearner will use GBM default values
      stack_gbm = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                              metalearner_algorithm="gbm")
      stack_gbm.train(x=x, y=y, training_frame=train)
      stack_gbm.model_performance(test).auc()
      # 0.7522591310013634

      # Train a stacked ensemble with an RF metalearner algorithm
      # The metalearner will use RF default values
      stack_rf = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                             metalearner_algorithm="drf")
      stack_rf.train(x=x, y=y, training_frame=train)
      stack_rf.model_performance(test).auc()
      # 0.7016302070136065

      # Train a stacked ensemble with a Deep Learning metalearner algorithm
      # The metalearner will use Deep Learning default values
      stack_dl = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                             metalearner_algorithm="deeplearning")
      stack_dl.train(x=x, y=y, training_frame=train)
      stack_dl.model_performance(test).auc()
      # 0.7634122856763638