Ensemble Learning in Driverless AI
This section describes Driverless AI’s ensemble learning capabilities.
An ensemble is a hierarchical composition of multiple models, where every level in the hierarchy uses the output of the previous level as input. The simplest ensemble is a 2-layer architecture with a single linear model (the meta model or meta learner) combining the predictions from several first layer models (base models). This is the default ensemble model in Driverless AI due to its robustness and linear properties that allow Shapley contributions to be fully interpretable even for ensembles.
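The interpretability claim above rests on a general property of Shapley values: they are linear in the model. If the ensemble prediction is a weighted sum of base model predictions, each feature's ensemble contribution is the same weighted sum of its base model contributions. The sketch below illustrates only that property. The function name, feature names, weights, and contribution values are all made up for illustration, and this is not Driverless AI's internal implementation.

```python
def ensemble_contributions(base_contribs, weights):
    """Combine per-feature contributions of base models into ensemble contributions.

    base_contribs: {model_name: {feature_name: contribution}} for a single row
    weights: {model_name: weight of that model in the linear meta learner}
    """
    combined = {}
    for model, contribs in base_contribs.items():
        for feature, value in contribs.items():
            # Linearity: each base contribution is scaled by its model weight.
            combined[feature] = combined.get(feature, 0.0) + weights[model] * value
    return combined


# Hypothetical contributions for one row (illustrative numbers only).
base_contribs = {
    "lightgbm": {"age": 0.20, "income": -0.10},
    "xgboost": {"age": 0.30, "income": -0.05},
}
weights = {"lightgbm": 0.4, "xgboost": 0.6}

ensemble = ensemble_contributions(base_contribs, weights)
print(ensemble)
```

With a non-linear meta learner this simple weighted sum would no longer hold, which is one reason a linear blender keeps ensemble explanations straightforward.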
By default, the meta learner is a linear blender that assigns non-negative weights (that sum to 1) to the base models. The weights are assigned at the model level and are obtained using cross-validation, which avoids overfitting the meta learner. When making predictions on a test set, the predictions from all cross-validation models of a base model are averaged. For example, if 2 models are ensembled (e.g., a LightGBM model and an XGBoost model, each using 4-fold cross-validation), then the linear blender finds one weight shared by all 4 LightGBM fold models (e.g., 0.37) and one weight shared by all 4 XGBoost fold models (e.g., 0.63). Because the fold models are averaged, each XGBoost fold model has an effective weight of 0.63/4 = 0.1575 under 4-fold CV. When Driverless AI ensembles a single model (level 1), it simply averages the CV model predictions (the model itself is assigned a weight of 1).
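The arithmetic of the blending step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Driverless AI code: the function names `effective_fold_weights` and `blend` are hypothetical, the prediction values are made up, and the weights (0.37 and 0.63) are simply taken from the example above rather than fitted.

```python
def effective_fold_weights(model_weights, n_folds):
    """Weight carried by each individual fold model: model weight / number of folds."""
    return {name: w / n_folds for name, w in model_weights.items()}


def blend(fold_predictions, model_weights):
    """Blend test-set predictions from cross-validation fold models.

    fold_predictions: {model_name: [preds_fold_1, preds_fold_2, ...]},
        where each preds_fold_k holds one prediction per test row.
    model_weights: {model_name: non-negative weight}, summing to 1.
    """
    if any(w < 0 for w in model_weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(model_weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    n_rows = len(next(iter(fold_predictions.values()))[0])
    blended = [0.0] * n_rows
    for name, folds in fold_predictions.items():
        # Averaging the folds is the same as giving each fold weight / n_folds.
        per_fold_weight = model_weights[name] / len(folds)
        for preds in folds:
            for i, p in enumerate(preds):
                blended[i] += per_fold_weight * p
    return blended


# Weights from the example in the text; predictions are made up (2 test rows).
model_weights = {"lightgbm": 0.37, "xgboost": 0.63}
fold_predictions = {
    "lightgbm": [[0.2, 0.8], [0.4, 0.6], [0.2, 0.8], [0.4, 0.6]],
    "xgboost": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
}

per_fold = effective_fold_weights(model_weights, n_folds=4)
blended = blend(fold_predictions, model_weights)
print(per_fold)
print(blended)
```

With a single model and a weight of 1, `blend` reduces to a plain average of the fold predictions, which matches the level 1 behavior described above.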
Driverless AI has multiple ensemble levels that are tied to the accuracy knob. As accuracy increases, the ensemble level increases.
The ensemble level can also be set with the Ensemble Level for Final Modeling Pipeline option in the Model settings of the Expert Settings panel. Each ensemble level is described below:
level 0: No ensemble, only a final single model. Cross validation is only used to determine the model validation performance. The final model is trained on the whole dataset.
level 1: Cross validation is performed for 1 model and the CV model predictions are ensembled.
level 2: Cross validation is performed for 2 models and the CV model predictions are ensembled. For example, Driverless AI may choose to ensemble an XGBoost model and a LightGBM model. The ensembling is done by blending the predictions from the cross validation XGBoost models and the cross validation LightGBM models. If Driverless AI has decided on 5-fold cross validation, then 10 models will be ensembled (5 CV models from the XGBoost model and 5 CV models from the LightGBM model).
level 3: Same as level 2 but with 3 models.
level 4: Same as level 2 but with 4 models.
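The number of fitted models behind the final predictions follows directly from the level descriptions above. The helper below is a hypothetical sketch, not part of any Driverless AI API. It assumes every base model uses the same number of folds, as in the 5-fold example, and covers only levels 0 through 4 as listed.

```python
def final_model_count(ensemble_level, n_folds):
    """Number of fitted models that contribute to the final predictions.

    Level 0 uses a single model trained on the whole dataset; levels 1-4
    ensemble the CV fold models of that many base models.
    """
    if not 0 <= ensemble_level <= 4:
        raise ValueError("this sketch only covers ensemble levels 0 to 4")
    if n_folds < 2:
        raise ValueError("cross validation needs at least 2 folds")
    if ensemble_level == 0:
        return 1  # no ensemble: one final model
    return ensemble_level * n_folds


for level in range(5):
    print(level, final_model_count(level, n_folds=5))
```

For level 2 with 5-fold cross validation this gives 10, matching the XGBoost and LightGBM example in the list.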
A description of the ensemble for your final model is available in the experiment log under Ensemble Base Model Fold Scores.
You can set the ensemble level manually in the Expert Settings panel with the Ensemble Level for Final Modeling Pipeline setting.