Experiment Settings

This section describes the settings that are available when running an experiment.

Display Name

Optional: Specify a display name for the new experiment. There are no character or length restrictions for naming. If this field is left blank, Driverless AI will automatically generate a name for the experiment.

Dropped Columns

Dropped columns are columns that you do not want to be used as predictors in the experiment. Note that Driverless AI will automatically drop ID columns and columns that contain a significant number of unique values (above max_relative_cardinality in the config.toml file or Max. allowed fraction of uniques for integer and categorical cols in Expert settings).
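The cardinality check can be sketched with a few lines of pandas. This is an illustration of the concept only, not Driverless AI's internal code; the threshold value and column names below are made up for the example.

```python
import pandas as pd

# Illustrative threshold mirroring max_relative_cardinality from config.toml
# (the value here is made up, not the product default).
MAX_RELATIVE_CARDINALITY = 0.95

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],            # unique per row -> candidate for dropping
    "state": ["CA", "CA", "NY", "NY", "TX"],
    "target": [0, 1, 0, 1, 1],
})

# Fraction of unique values per column, relative to the row count.
rel_card = df.nunique() / len(df)
to_drop = [c for c in df.columns
           if c != "target" and rel_card[c] > MAX_RELATIVE_CARDINALITY]
print(to_drop)  # ['id']
```

The `id` column is unique in every row (relative cardinality 1.0), so it exceeds the threshold and is dropped, while `state` (0.6) is kept.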

Validation Dataset

The validation dataset is used for tuning the modeling pipeline. If provided, the entire training data will be used for training, and validation of the modeling pipeline is performed with only this validation dataset. This is not generally recommended, but can make sense if the data are non-stationary. In such a case, the validation dataset can help to improve the generalization performance on shifting data distributions.

This dataset must have the same number of columns (and column types) as the training dataset. Also note that if provided, the validation set is not sampled down, so it can lead to large memory usage, even if accuracy=1 (which reduces the train size).

Test Dataset

The test dataset is used for testing the modeling pipeline and creating test predictions. The test set is never used during training of the modeling pipeline. (Results are the same whether a test set is provided or not.) If a test dataset is provided, then test set predictions will be available at the end of the experiment.

Weight Column

Optional: Column that indicates the observation weight (a.k.a. sample or row weight), if applicable. This column must be numeric with values >= 0. Rows with higher weights have higher importance. The weight affects model training through a weighted loss function and affects model scoring through weighted metrics. The weight column is not used when making test set predictions, but a weight column (if specified) is used when computing the test score.
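The effect of a weight column can be illustrated with scikit-learn's `sample_weight` parameter, which plays the same role in both training (weighted loss) and scoring (weighted metrics). This is a generic sketch of the concept, not Driverless AI's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 1.0, 5.0, 5.0])  # the last two rows count five times as much

model = LogisticRegression()
model.fit(X, y, sample_weight=w)                   # weighted loss during training
pred = model.predict(X)
score = accuracy_score(y, pred, sample_weight=w)   # weighted metric during scoring
```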

Fold Column

Optional: Column to use to create stratification folds during (cross-)validation, if applicable. Must be of integer or categorical type. Rows with the same value in the fold column represent cohorts, and each cohort is assigned to exactly one fold. This can help to build better models when the data is grouped naturally. If left empty, the data is assumed to be i.i.d. (independently and identically distributed). For example, when viewing data for a pneumonia dataset, person_id would be a good Fold Column. This is because the data may include multiple diagnostic snapshots per person, and we want the same person's characteristics to appear in either the training or the validation frame, but not in both, to avoid data leakage. Note that a fold column cannot be specified if a validation set is used or if a Time Column is specified.
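The cohort behavior described above can be sketched with scikit-learn's `GroupKFold`, using `person_id` as the grouping column. This is an illustration of the idea, not Driverless AI's internal fold logic.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# person_id plays the role of the fold column: every row for a given person
# (one cohort) lands in exactly one fold, so no person appears in both the
# training and validation sides of a split.
person_id = np.array([101, 101, 102, 102, 103, 103])
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

splits = list(GroupKFold(n_splits=3).split(X, y, groups=person_id))
for train_idx, valid_idx in splits:
    # No person's rows ever straddle the train/validation boundary.
    assert set(person_id[train_idx]).isdisjoint(set(person_id[valid_idx]))
```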

Time Column

Optional: Specify a column that provides a time order (time stamps for observations), if applicable. This can improve model performance and model validation accuracy for problems where the target values are auto-correlated with respect to the ordering (per time-series group).

The values in this column must be a datetime format understood by pandas.to_datetime(), like “2017-11-29 00:30:35” or “2017/11/29”, or integer values. If [AUTO] is selected, all string columns are tested for potential date/datetime content and considered as potential time columns. If a time column is found, feature engineering and model validation will respect the causality of time. If [OFF] is selected, no time order is used for modeling and data may be shuffled randomly (any potential temporal causality will be ignored).
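Both example formats quoted above parse directly with pandas:

```python
import pandas as pd

# The two datetime formats quoted in the text, parsed by pandas.to_datetime().
ts1 = pd.to_datetime("2017-11-29 00:30:35")
ts2 = pd.to_datetime("2017/11/29")
print(ts1.year, ts2.month)  # 2017 11
```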

When your data has a date column, then in most cases, specifying [AUTO] for the Time Column will be sufficient. However, if you select a specific date column, then Driverless AI will provide you with an additional side menu. From this side menu, you can specify Time Group columns or specify [AUTO] to let Driverless AI determine the best time group columns. You can also specify the Forecast Horizon in weeks and the Gap between the train and test periods.

Refer to Time Series in Driverless AI for more information about time series experiments in Driverless AI and to see a time series example.

Time column settings


  • Engineered features will be used for MLI when a time series experiment is built. This is because munged (engineered) time series features are more useful for MLI than raw time series features.
  • A Time Column cannot be specified if a Fold Column is specified. This is because both fold and time columns are only used to split training datasets into training/validation, so once you split by time, you cannot also split with the fold column. If a Time Column is specified, then the time group columns play the role of the fold column for time series.
  • A Time Column cannot be specified if a validation dataset is used.

Accuracy, Time, and Interpretability Knobs

The experiment preview describes what the Accuracy, Time, and Interpretability settings mean for your specific experiment. This preview will automatically update if any of the knob values change. The following sections describe in more detail how these values affect an experiment.

Experiment preview


As accuracy increases (as indicated by the tournament_* toml settings), Driverless AI gradually adjusts how it performs evolution and ensembling. At low accuracy, Driverless AI varies features and models, but all of them compete evenly against each other. At higher accuracy, each independent main model evolves on its own and joins the final ensemble as an ensemble over different main models. At still higher accuracies, Driverless AI also evolves feature types such as Target Encoding on and off, each evolving independently. Finally, at the highest accuracies, Driverless AI tracks both model and feature variations and ensembles all of them.

The following table describes how the Accuracy value affects a Driverless AI experiment.

Accuracy | Max Rows x Cols | Ensemble Level | Target Transformation | Parameter Tuning Level | Num Individuals | Num Folds | Only First Fold Model | Distribution Check
1        | 100K            | 0              | False                 | 0                      | Auto            | 3         | True                  | No
2        | 1M              | 0              | False                 | 0                      | Auto            | 3         | True                  | No
3        | 50M             | 0              | True                  | 1                      | Auto            | 3         | True                  | No
4        | 100M            | 0              | True                  | 1                      | Auto            | 3-4       | True                  | No
5        | 200M            | 1              | True                  | 1                      | Auto            | 3-4       | True                  | Yes
6        | 500M            | 2              | True                  | 1                      | Auto            | 3-5       | True                  | Yes
7        | 750M            | <= 3           | True                  | 2                      | Auto            | 3-10      | Auto                  | Yes
8        | 1B              | <= 3           | True                  | 2                      | Auto            | 4-10      | Auto                  | Yes
9        | 2B              | <= 3           | True                  | 3                      | Auto            | 4-10      | Auto                  | Yes
10       | 10B             | <= 4           | True                  | 3                      | Auto            | 4-10      | Auto                  | Yes

Note: A check for a shift in the distribution between train and test is done for accuracy >= 5.
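Driverless AI's internal shift detection is not documented in detail here, but the idea can be sketched with a per-column two-sample Kolmogorov-Smirnov test; the dataset and threshold below are made up for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_col = rng.normal(loc=0.0, scale=1.0, size=1000)
test_col = rng.normal(loc=0.5, scale=1.0, size=1000)  # mean shifted by 0.5

# Two-sample KS test: a small p-value suggests the two samples were not
# drawn from the same distribution.
stat, p_value = ks_2samp(train_col, test_col)
if p_value < 0.01:
    print("possible train/test distribution shift")
```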

The list below includes more information about the parameters that are used when calculating accuracy.

  • Max Rows x Cols: The maximum number of rows x columns to use in model training

    • For classification, stratified random row sampling is performed (by target)
    • For regression, random row sampling is performed
  • Ensemble Level: The level of ensembling done for the final model (if no time column is selected)

    • 0: single model
    • 1: 1x 4-fold models ensembled together
    • 2: 2x 5-fold models ensembled together
    • 3: 5x 5-fold models ensembled together
    • 4: 8x 5-fold models ensembled together
    • If ensemble level > 0, then the final model score shows an error estimate that includes the data generalization error (standard deviation of scores over folds) and the error in the estimate of the score (bootstrap score’s standard deviation with sample size same as data size).
    • For accuracy >= 8, the estimate of the error in the validation score reduces, and the error in the score is dominated by the data generalization error.
    • The estimate of the error in the test score is estimated by the maximum of the bootstrap with sample size equal to the test set size and the validation score’s error.
  • Target Transformation: Try target transformations and choose the transformation(s) that have the best score(s).

    Possible transformations: identity, unit_box, log, square, square root, double square root, inverse, Anscombe, logit, sigmoid

  • Parameter Tuning Level: The level of parameter tuning done

    • 0: no parameter tuning
    • 1: 8 different parameter settings
    • 2: 16 different parameter settings
    • 3: 32 different parameter settings
    • 4: 64 different parameter settings
    • Optimal model parameters are chosen based on a combination of the model’s accuracy, training speed, and complexity.
  • Num Individuals: The number of individuals in the population for the genetic algorithm

    • Each individual is a gene. The more genes, the more combinations of features are tried.
    • The number of individuals is automatically determined and can depend on the number of GPUs. Typical values are between 4 and 16.
  • Num Folds: The number of internal validation splits done for each pipeline

    • If the problem is a classification problem, then stratified folds are created.
  • Only First Fold Model: Whether to only use the first fold split for internal validation to save time

    • Example: Setting Num Folds to 3 and Only First Fold Model = True means you are splitting the data into 67% training and 33% validation.
    • If “Only First Fold Model” is False, then errors on the score shown during feature engineering include the data generalization error (standard deviation of scores over folds) and the error in the estimate of the score (bootstrap score’s standard deviation with a sample size the same as the data size).
    • If “Only First Fold Model” is True, then errors on the score shown during feature engineering include only the error in the estimate of the score (bootstrap score’s standard deviation with a sample size same as the data size).
    • For accuracy >= 8, the estimate of the error in the score reduces, and the error in the score is dominated by the data generalization error. This provides the most accurate generalization error.
  • Early Stopping Rounds: Time-based; determined by the Time setting (see the Time table below).

  • Distribution Check: Checks whether validation or test data are drawn from the same distribution as the training data. Note that this is purely informative to the user. Driverless AI does not take information from the test set into consideration during training.

  • Strategy: Feature selection strategy (to prune away features that do not clearly improve the model score). Feature selection is triggered by interpretability: Strategy = "FS" if interpretability >= 6; otherwise Strategy is None.
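The bootstrap error estimate described above (the score's standard deviation from resampling with a sample size equal to the data size) can be sketched as follows; the per-row errors here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic per-row squared errors from a hypothetical validation set.
errors = rng.normal(loc=1.0, scale=0.3, size=500) ** 2
score = errors.mean()  # e.g. MSE

# Bootstrap estimate of the error in the score: resample rows with
# replacement, with a sample size equal to the data size, and take the
# standard deviation of the resampled scores.
boot_scores = [rng.choice(errors, size=errors.size, replace=True).mean()
               for _ in range(200)]
score_error = np.std(boot_scores)
```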


Time

This specifies the relative time for completing the experiment (i.e., higher settings take longer). Early stopping takes place if the experiment doesn't improve the score for the specified number of iterations.

Time | Iterations | Early Stopping Rounds
1    | 1-5        | None
2    | 10         | 5
3    | 30         | 5
4    | 40         | 5
5    | 50         | 10
6    | 100        | 10
7    | 150        | 15
8    | 200        | 20
9    | 300        | 30
10   | 500        | 50

Note: See the Accuracy table above for the cases where early stopping is not time-based.
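The early-stopping behavior in the table can be sketched as a simple patience loop; the score sequence and function name below are made up for illustration.

```python
# Minimal early-stopping sketch: stop when the score has not improved for
# `early_stopping_rounds` consecutive iterations (lower score is better here).
def run_iterations(scores, early_stopping_rounds):
    best, rounds_since_best, stopped_at = float("inf"), 0, len(scores)
    for i, s in enumerate(scores):
        if s < best:
            best, rounds_since_best = s, 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= early_stopping_rounds:
                stopped_at = i + 1   # stop after this iteration
                break
    return stopped_at

# Score improves, then plateaus; with 5 rounds of patience we stop at
# iteration 8 instead of running all 10.
scores = [0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
print(run_iterations(scores, early_stopping_rounds=5))  # 8
```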


Interpretability

In the following tables, Ensemble Level is the level of ensembling done for the final model (if no time column is selected).

  • 0: single model
  • 1: 1x 4-fold models ensembled together
  • 2: 2x 5-fold models ensembled together
  • 3: 5x 5-fold models ensembled together

If Monotonicity Constraints are enabled, the model will satisfy knowledge about monotonicity in the data and monotone relationships between the predictors and the target variable. For example, in house price prediction, the house price should increase with lot size and number of rooms, and should decrease with crime rate in the area. If enabled, Driverless AI will automatically determine if monotonicity is present and enforce it in its modeling pipelines. Depending on the correlation, Driverless AI will assign positive, negative, or no monotonicity constraints. Monotonicity is enforced if the absolute correlation is greater than 0.1. All other predictors will not have monotonicity enforced.

Interpretability | Ensemble Level | Monotonicity Constraints
<= 5             | <= 3           | Disabled
>= 6             | <= 2           | Disabled
>= 7             | <= 2           | Enabled
>= 8             | <= 1           | Enabled
10               | 0              | Enabled

Interpretability | Transformers**
<= 5             | All
0-5              | Interpretability#5 - [TruncSvdNum, ClusterDist]
0-6              | Interpretability#6 - [ClusterTE, ClusterID, IsolationForestAnomaly]
0-7              | Interpretability#7 - [NumToCatTE]
0-8              | Interpretability#8 - [NumCatTE, NumToCatWoE]
0-9              | Interpretability#9 - [BulkInteractions, WeightOfEvidence, CvCatNumEncode, NumToCatWeightOfEvidenceMonotonic]
0-10             | Interpretability#10 - [CVTargetEncodeFit, CVCatNumericEncodeF, Frequent]

** Interpretability# - [lost transformers] lists the transformers that are lost when the interpretability setting is raised by 1 to that value.

** Exception - NumToCatWeightOfEvidenceMonotonic removed for interpretability<=6.

** For interpretability = 10, only [Filter for numeric, Frequent for categorical, DateTime for Date+Time, Date for dates, and Text for text] are used.

  • Target Transformers:

    For regression, applied on target before any other transformations.

    Interpretability | Target Transformer
    <= 10            | TargetTransformer_identity
    <= 10            | TargetTransformer_unit_box
    <= 10            | TargetTransformer_log
    <= 9             | TargetTransformer_square
    <= 9             | TargetTransformer_sqrt
    <= 8             | TargetTransformer_double_sqrt
    <= 6             | TargetTransformer_logit
    <= 6             | TargetTransformer_sigmoid
    <= 5             | TargetTransformer_Anscombe
    <= 4             | TargetTransformer_inverse
  • Data Types Detected:

    • categorical
    • date
    • datetime
    • numeric
    • text
  • Transformers used on raw features to generate new features:

    Interpretability | Transformer
    <= 10            | Filter
    <= 10            | DateTime
    <= 10            | Date
    <= 10            | Text
    <= 10            | TextLin
    <= 10            | CvTargetEncodeMulti
    <= 10            | CvTargetEncodeSingle
    <= 9             | CvCatNumEncode
    <= 9             | WeightOfEvidence
    <= 9 and >= 7    | NumToCatWeightOfEvidenceMonotonic
    <= 9             | BulkInteractions
    <= 8             | NumToCatWeightOfEvidence
    <= 8             | NumCatTargetEncodeMulti
    <= 8             | NumCatTargetEncodeSingle
    <= 7             | Frequent
    <= 7             | NumToCatTargetEncodeMulti
    <= 7             | NumToCatTargetEncodeSingle
    <= 6             | ClusterIDTargetEncodeMulti
    <= 6             | ClusterIDTargetEncodeSingle
    <= 5             | TruncSvdNum
    <= 5             | ClusterDist
    ** Default N-way interactions are up to 8-way except:
    • BulkInteractions are always 2-way.
    • Interactions are minimal-way (e.g. 1-way for CvTargetEncode) if interpretability=10.
  • Feature importance threshold below which features are removed

    Interpretability | Threshold
    10               | config.toml varimp_threshold_at_interpretability_10
    9                | varimp_threshold_at_interpretability_10/5.0
    8                | varimp_threshold_at_interpretability_10/7.0
    7                | varimp_threshold_at_interpretability_10/10.0
    6                | varimp_threshold_at_interpretability_10/20.0
    5                | varimp_threshold_at_interpretability_10/30.0
    4                | varimp_threshold_at_interpretability_10/50.0
    3                | varimp_threshold_at_interpretability_10/500.0
    2                | varimp_threshold_at_interpretability_10/5000.0
    1                | 1E-30

    ** Also used for strategy=FS dropping of features, but the threshold is the above value multiplied by config.varimp_fspermute_factor.

  • Base model used for scoring features and building final model

    Interpretability | Allowed Base Model
    10               | GLM only if glm_enable_more==True or glm_enable_exclusive==True; GBM+GLM if glm_enable==True; otherwise GBM only
    9                | GBM unless glm_enable_exclusive==True; GBM+GLM if glm_enable_more==True
    8                | GBM unless glm_enable_exclusive==True; GBM+GLM if glm_enable_more==True
    7                | GBM unless glm_enable_exclusive==True; GBM+GLM if glm_enable_more==True
    6                | GBM unless glm_enable_exclusive==True; GBM+GLM if glm_enable_more==True
    5                | GBM unless glm_enable_exclusive==True
    4                | GBM unless glm_enable_exclusive==True
    3                | GBM unless glm_enable_exclusive==True
    2                | GBM unless glm_enable_exclusive==True
    1                | GBM unless glm_enable_exclusive==True

    ** When mixing GBM and GLM in parameter tuning, the search space is split 50%/50% between GBM and GLM.
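The log target transformation from the Target Transformers table above can be sketched as follows: fit on the log of the target and invert the transform at prediction time. The names in the table (e.g. TargetTransformer_log) are from the documentation; the wrapper code itself is illustrative, not Driverless AI's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
# A positively skewed target, where a log transform helps a linear model.
y = np.exp(3 * X[:, 0]) * rng.lognormal(0, 0.1, size=200)

# TargetTransformer_log-style pipeline: fit on log(target)...
model = LinearRegression().fit(X, np.log(y))
# ...and back-transform predictions to the original target scale.
pred = np.exp(model.predict(X))
```

A convenient side effect of the back-transform is that predictions are always positive, which is often desirable for targets like prices or counts.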

Classification, Reproducible, and Enable GPUs Buttons

  • Classification or Regression button. Driverless AI automatically determines the problem type based on the response column. Though not recommended, you can override this setting by clicking this button.
  • Reproducible: This button allows you to build an experiment with a random seed and get reproducible results. If this is disabled (default), then results will vary between runs.
  • Enable GPUs: Specify whether to enable GPUs. (Note that this option is ignored on CPU-only systems.)