``blending_frame``
------------------

- Available in: Stacked Ensembles, AutoML
- Hyperparameter: no

Description
~~~~~~~~~~~

H2O's Stacked Ensemble method is a supervised ensemble machine learning algorithm that finds the optimal combination of a collection of prediction algorithms using a process called stacking (or Super Learning). The algorithm that learns the optimal combination of the base learners is called the metalearning algorithm, or metalearner.

The optional ``blending_frame`` parameter specifies a frame on which the base models' predictions are computed; those predictions then serve as the training frame for the metalearner. Providing this parameter triggers blending mode. Blending mode is faster than cross-validating the base learners, though the resulting ensemble may not perform as well as the Super Learner ensemble. In addition, a blending frame makes it possible to train stacked ensembles on time-series data, where the holdout data is "future" data relative to the "past" data in the training set. Both of these points are illustrated in the sketches that follow the example below.

Related Parameters
~~~~~~~~~~~~~~~~~~

- None

Example
~~~~~~~

.. tabs::
   .. code-tab:: r R

        library(h2o)
        h2o.init()

        # Import the higgs_train_5k dataset
        higgs <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")

        # Split the dataset into training and blending frames
        higgs_splits <- h2o.splitFrame(data = higgs, ratios = 0.8, seed = 1234)
        train <- higgs_splits[[1]]
        blend <- higgs_splits[[2]]

        # Identify predictors and response
        y <- "response"
        x <- setdiff(names(train), y)

        # Convert the response column in the training and blending frames to a factor
        train[, y] <- as.factor(train[, y])
        blend[, y] <- as.factor(blend[, y])

        # Set the number of CV folds for the base learners
        nfolds <- 3

        # Train & cross-validate a GBM model
        my_gbm <- h2o.gbm(x = x, y = y,
                          training_frame = train,
                          distribution = "bernoulli",
                          ntrees = 10,
                          nfolds = nfolds,
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)

        # Train & cross-validate an RF model
        my_rf <- h2o.randomForest(x = x, y = y,
                                  training_frame = train,
                                  ntrees = 10,
                                  nfolds = nfolds,
                                  keep_cross_validation_predictions = TRUE,
                                  seed = 1)

        # Train a stacked ensemble using a blending frame
        stack <- h2o.stackedEnsemble(x = x, y = y,
                                     base_models = list(my_gbm, my_rf),
                                     training_frame = train,
                                     blending_frame = blend,
                                     seed = 1)
        h2o.auc(h2o.performance(stack, blend))
        # [1] 0.7576039
   .. code-tab:: python

        import h2o
        from h2o.estimators.random_forest import H2ORandomForestEstimator
        from h2o.estimators.gbm import H2OGradientBoostingEstimator
        from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
        h2o.init()

        # Import the higgs_train_5k dataset
        higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv")

        # Split the dataset into training and blending frames
        train, blend = higgs.split_frame(ratios = [.8], seed = 1234)

        # Identify predictors and response
        x = train.columns
        y = "response"
        x.remove(y)

        # Convert the response column in the training and blending frames to a factor
        train[y] = train[y].asfactor()
        blend[y] = blend[y].asfactor()

        # Set the number of CV folds for the base learners
        nfolds = 3

        # Train and cross-validate a GBM model
        my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli",
                                              ntrees=10,
                                              nfolds=nfolds,
                                              fold_assignment="Modulo",
                                              keep_cross_validation_predictions=True,
                                              seed=1)
        my_gbm.train(x=x, y=y, training_frame=train)

        # Train and cross-validate an RF model
        my_rf = H2ORandomForestEstimator(ntrees=50,
                                         nfolds=nfolds,
                                         fold_assignment="Modulo",
                                         keep_cross_validation_predictions=True,
                                         seed=1)
        my_rf.train(x=x, y=y, training_frame=train)

        # Train a stacked ensemble using a blending frame
        stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                                  seed=1)
        stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend)
        stack_blend.model_performance(blend).auc()
        # 0.7736312597328088
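
Because the metalearner in blending mode is trained on the base models' predictions over the held-out blending frame, cross-validating the base learners is not required, which is where the speedup comes from. Below is a minimal sketch of that faster workflow, assuming the ``train``, ``blend``, ``x``, and ``y`` objects from the Python example above; the ``fast_*`` names are hypothetical.

.. code-block:: python

    # Minimal sketch: blending without cross-validating the base learners.
    # Assumes train, blend, x, and y from the Python example above.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.estimators.random_forest import H2ORandomForestEstimator
    from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

    # Base models are trained on the full training frame; no CV folds and no
    # keep_cross_validation_predictions are needed in blending mode
    fast_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ntrees=10, seed=1)
    fast_gbm.train(x=x, y=y, training_frame=train)

    fast_rf = H2ORandomForestEstimator(ntrees=50, seed=1)
    fast_rf.train(x=x, y=y, training_frame=train)

    # The metalearner is fit on the base models' predictions over the blending frame
    fast_stack = H2OStackedEnsembleEstimator(base_models=[fast_gbm, fast_rf], seed=1)
    fast_stack.train(x=x, y=y, training_frame=train, blending_frame=blend)
    fast_stack.model_performance(blend).auc()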
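
For time-series data, the random ``split_frame`` used above would leak "future" rows into the training set; a time-ordered split keeps the blending frame strictly in the future of the training frame. The sketch below is purely illustrative: the higgs data is not actually a time series, the rows are simply assumed to be sorted chronologically, and ``past`` and ``future`` are hypothetical names.

.. code-block:: python

    # Illustrative only: treat the row order as chronological
    split_idx = int(higgs.nrow * 0.8)
    past = higgs[:split_idx, :]     # earlier rows -> training frame for the base models
    future = higgs[split_idx:, :]   # later rows -> blending frame ("future" holdout)
    past[y] = past[y].asfactor()
    future[y] = future[y].asfactor()

    # The base models would then be trained on `past` and the ensemble blended
    # on `future`, exactly as in the sketch above.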