Expert configuration
time_abort_format
Time string format for time_abort. (String) (Expert Setting)
Default value '%Y-%m-%d %H:%M:%S'
Any format is allowed as accepted by datetime.strptime.
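For instance, a timestamp in the default format parses like this (a minimal Python sketch; the timestamp value is illustrative):

```python
from datetime import datetime

# Parse an abort timestamp using the default time_abort_format.
fmt = "%Y-%m-%d %H:%M:%S"
abort_at = datetime.strptime("2024-06-30 23:59:59", fmt)
print(abort_at.isoformat())  # 2024-06-30T23:59:59
```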
time_abort_timezone
Time zone for time_abort. (String) (Expert Setting)
Default value 'UTC'
Any standard time zone name, e.g. 'UTC'.
inject_mojo_for_predictions
inject_mojo_for_predictions (Boolean) (Expert Setting)
Default value True
Inject the MOJO into the fitted Python state if the mini acceptance test passes, so the C++ MOJO runtime can be used when calling predict(enable_mojo=True, IS_SCORER=True, …). Prerequisite for mojo_for_predictions='on' or 'auto'.
mojo_acceptance_test_rtol
Relative tolerance for mini MOJO acceptance test. (Float) (Expert Setting)
Default value 0.0
Relative tolerance for the mini MOJO acceptance test. If the C++ MOJO predictions differ from the Python predictions by more than this relative tolerance, the MOJO won't be used inside Python for later scoring. Only applicable if mojo_for_predictions=True. Disabled if <= 0.
mojo_acceptance_test_atol
Absolute tolerance for mini MOJO acceptance test. (Float) (Expert Setting)
Default value 0.0
Absolute tolerance for the mini MOJO acceptance test (for regression/Shapley, scaled by max(abs(preds))). If the C++ MOJO predictions differ from the Python predictions by more than this absolute tolerance, the MOJO won't be used inside Python for later scoring. Only applicable if mojo_for_predictions=True. Disabled if <= 0.
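A minimal sketch of what such a tolerance check could look like (assumed semantics, not DAI's actual implementation):

```python
import numpy as np

def mojo_acceptance_ok(python_preds, mojo_preds, rtol=0.0, atol=0.0):
    """Hypothetical check: accept the MOJO if its predictions match the
    Python pipeline within rtol/atol; atol is scaled by max(abs(preds))
    as described above, and a tolerance <= 0 disables that component."""
    python_preds = np.asarray(python_preds, dtype=float)
    mojo_preds = np.asarray(mojo_preds, dtype=float)
    scaled_atol = atol * np.max(np.abs(python_preds)) if atol > 0 else 0.0
    return bool(np.allclose(mojo_preds, python_preds,
                            rtol=max(rtol, 0.0), atol=scaled_atol))
```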
max_cols_make_autoreport_automatically
Number of columns beyond which will not automatically build autoreport at end of experiment. (Number) (Expert Setting)
Default value 1000
max_cols_make_pipeline_visualization_automatically
Number of columns beyond which will not automatically build pipeline visualization at end of experiment. (Number) (Expert Setting)
Default value 5000
transformer_description_line_length
Line length for autoreport descriptions of transformers. -1 means use autodoc_keras_summary_line_length (Number) (Expert Setting)
Default value -1
benchmark_mojo_latency_auto_size_limit
Max size of pipeline.mojo file (in MB) for when benchmark_mojo_latency is set to 'auto' (Number) (Expert Setting)
Default value 2048
Max size of pipeline.mojo file (in MB) for automatic mode of MOJO scoring latency measurement
max_dt_threads_do_timeseries_split_suggestion
max_dt_threads_do_timeseries_split_suggestion (Number) (Expert Setting)
Default value 1
Maximum number of threads for datatable during TS properties preview panel computations.
kaggle_keep_submission
Whether to keep Kaggle submission file in experiment directory (Boolean) (Expert Setting)
Default value False
kaggle_competitions
Custom Kaggle competitions to make automatic test set submissions for. (String) (Expert Setting)
Default value ''
If provided, this can extend the list to arbitrary (and potentially future) Kaggle competitions to make submissions for. Only used if kaggle_key and kaggle_username are provided. Provide a quoted comma-separated list of tuples (target column name, number of test rows, competition, metric) like this: kaggle_competitions='("target", 200000, "santander-customer-transaction-prediction", "AUC"), ("TARGET", 75818, "santander-customer-satisfaction", "AUC")'
ping_period
ping_period (Number) (Expert Setting)
Default value 60
Period (in seconds) of pings from the Driverless AI server to each experiment (in order to get logger info like disk space and memory usage). 0 means don't print anything.
ping_autodl
Whether to enable ping of system status during DAI experiments. (Boolean) (Expert Setting)
Default value True
Whether to enable ping of system status during DAI experiments.
stall_disk_limit_gb
stall_disk_limit_gb (Number) (Expert Setting)
Default value 1
Minimum amount of disk space in GB needed before stalling the forking of new processes during an experiment.
min_rows_per_class
min_rows_per_class (Number) (Expert Setting)
Default value 5
Minimum required number of rows (in the training data) for each class label for classification problems.
min_rows_per_split
min_rows_per_split (Number) (Expert Setting)
Default value 5
Minimum required number of rows for each split when generating validation samples.
tf_nan_impute_value
tf_nan_impute_value (Number) (Expert Setting)
Default value -5
For TensorFlow, the numerical value to assign to missing values, given that numeric values are standardized: 0 is the center of the distribution, and for a Normal distribution +-5 is 5 standard deviations away from the center. In many cases an out-of-bounds value is a good way to represent missings, but in some cases the mean (0) may be better.
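As an illustration of the idea (not DAI's code), imputing missings with an out-of-bounds constant after standardization might look like:

```python
import numpy as np

def standardize_and_impute(col, impute_value=-5.0):
    """Standardize a numeric column, then fill missing values with an
    out-of-bounds constant (cf. tf_nan_impute_value)."""
    col = np.asarray(col, dtype=float)
    mask = np.isnan(col)
    mean, std = np.nanmean(col), np.nanstd(col)
    out = (col - mean) / (std if std > 0 else 1.0)
    out[mask] = impute_value  # e.g. 5 standard deviations below center
    return out
```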
statistical_threshold_data_size_small
statistical_threshold_data_size_small (Number) (Expert Setting)
Default value 100000
Internal threshold for number of rows x number of columns that triggers certain statistical techniques (a small-data recipe, e.g. one-hot encoding for all model types and a smaller learning rate) to increase model accuracy.
statistical_threshold_data_size_large
statistical_threshold_data_size_large (Number) (Expert Setting)
Default value 500000000
Internal threshold for number of rows x number of columns that triggers certain statistical techniques (fewer genes created, removal of high max_depth for tree models, etc.) that can speed up modeling. Also controls the maximum number of rows used in training the final model, by sampling statistical_threshold_data_size_large / (number of columns) rows.
aux_threshold_data_size_large
aux_threshold_data_size_large (Number) (Expert Setting)
Default value 10000000
Internal threshold for number of rows x number of columns to trigger sampling for auxiliary data uses, like imbalanced data set detection and bootstrap scoring sample size and iterations
set_method_sampling_row_limit
set_method_sampling_row_limit (Number) (Expert Setting)
Default value 5000000
Internal threshold for the set-based method for sampling without replacement. Can be 10x faster than the internally optimized np_random_choice method, and up to 30x faster than np.random.choice, e.g. when sampling 250k rows from 1B rows.
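The set-based idea amounts to rejection-sampling indices into a set, which avoids materializing all candidate rows; a minimal sketch:

```python
import random

def set_based_sample(n_rows, k):
    """Sample k distinct row indices from range(n_rows) without replacement.

    For k much smaller than n_rows this can be far faster than
    np.random.choice(n_rows, k, replace=False), which has to consider
    all n_rows candidates.
    """
    chosen = set()
    while len(chosen) < k:
        chosen.add(random.randrange(n_rows))
    return sorted(chosen)
```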
performance_threshold_data_size_small
performance_threshold_data_size_small (Number) (Expert Setting)
Default value 100000
Internal threshold for number of rows x number of columns that triggers certain performance changes: fewer threads are used beyond the large value (to help avoid OOM or unnecessary slowdowns) and below the small value (to avoid excess forking of tasks).
performance_threshold_data_size_large
performance_threshold_data_size_large (Number) (Expert Setting)
Default value 100000000
Internal threshold for number of rows x number of columns that triggers certain performance changes: fewer threads are used beyond the large value (to help avoid OOM or unnecessary slowdowns) and below the small value (to avoid excess forking of tasks).
gpu_default_threshold_data_size_large
gpu_default_threshold_data_size_large (Number) (Expert Setting)
Default value 1000000
Threshold for number of rows x number of columns to trigger GPU to be default for models like XGBoost GBM.
max_relative_cols_mismatch_allowed
max_relative_cols_mismatch_allowed (Float) (Expert Setting)
Default value 0.5
Maximum fraction of mismatched columns to allow between train and either valid or test. Beyond this value the experiment will fail with invalid data error.
max_rows_final_blender
max_rows_final_blender (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for the final ensemble blender for regression and binary problems (scaled down linearly by the number of classes for multiclass with >= 10 classes); otherwise sample randomly.
min_rows_final_blender
min_rows_final_blender (Number) (Expert Setting)
Default value 10000
Smallest number of rows (or number of rows if less than this) to use for final ensemble blender.
max_rows_final_train_score
max_rows_final_train_score (Number) (Expert Setting)
Default value 5000000
Largest number of rows to use for final training score (no holdout), otherwise sample randomly
max_rows_final_roccmconf
max_rows_final_roccmconf (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for final ROC, lift-gains, confusion matrix, residual, and actual vs. predicted. Otherwise sample randomly
max_rows_final_holdout_score
max_rows_final_holdout_score (Number) (Expert Setting)
Default value 5000000
Largest number of rows to use for final holdout scores, otherwise sample randomly
max_rows_final_holdout_bootstrap_score
max_rows_final_holdout_bootstrap_score (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for final holdout bootstrap scores, otherwise sample randomly
max_rows_leak
Max. rows for leakage detection if wide rules used on wide data (Number) (Expert Setting)
Default value 100000
max_workers_fs
Num. simultaneous predictions for feature selection (0 = auto) (Number) (Expert Setting)
Default value 0
How many workers to use for feature selection by permutation for predict phase. (0 = auto, > 0: min of DAI value and this value, < 0: exactly negative of this value)
max_workers_shift_leak
Num. simultaneous fits for shift and leak checks if using LightGBM on CPU (0 = auto) (Number) (Expert Setting)
Default value 0
How many workers to use for shift and leakage checks if using LightGBM on CPU. (0 = auto, > 0: min of DAI value and this value, < 0: exactly negative of this value)
max_orig_nonnumeric_cols_selected_default
max_orig_nonnumeric_cols_selected_default (Number) (Expert Setting)
Default value 300
full_cv_accuracy_switch
full_cv_accuracy_switch (Number) (Expert Setting)
Default value 9
Accuracy setting equal and above which enables full cross-validation (multiple folds) during feature evolution as opposed to only a single holdout split (e.g. 2/3 train and 1/3 validation holdout)
ensemble_accuracy_switch
ensemble_accuracy_switch (Number) (Expert Setting)
Default value 5
Accuracy setting equal and above which enables a stacked ensemble as the final model. Stacking commences at the end of the feature evolution process. It quite often leads to better model performance, but it does increase the complexity and execution time of the final model.
num_ensemble_folds
num_ensemble_folds (Number) (Expert Setting)
Default value 4
Number of fold splits to use for ensemble_level >= 2. Ensemble modelling may require predictions to be made on out-of-fold samples, hence the data needs to be split into different folds to generate these predictions. Fewer folds (like 2 or 3) normally create more stable models, but may be less accurate. More folds can reach higher accuracy at the expense of more time, but the performance may be less stable when there is not enough training data (i.e. higher chance of overfitting). Actual value will vary for small or big data cases.
fold_reps
fold_reps (Number) (Expert Setting)
Default value 1
Number of repeats for each fold for all validation (modified slightly for small or big data cases)
max_num_classes_hard_limit
max_num_classes_hard_limit (Number) (Expert Setting)
Default value 10000
min_roc_sample_size
min_roc_sample_size (Number) (Expert Setting)
Default value 1
enable_strict_confict_key_check_for_brain
enable_strict_confict_key_check_for_brain (Boolean) (Expert Setting)
Default value True
allow_change_layer_count_brain
For feature brain or restart/refit, whether to allow brain ingest to use different feature engineering layer count. (Boolean) (Expert Setting)
Default value False
brain_maximum_diff_score
brain_maximum_diff_score (Float) (Expert Setting)
Default value 0.1
Relative number of columns that must match between the current reference individual and a brain individual. 0.0: perfect match; 1.0: all columns are different (worst match). E.g. 0.1 implies no more than 10% of columns may mismatch between the reference set of columns and the brain individual.
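Illustratively (assumed semantics), the mismatch fraction between two column sets could be computed as:

```python
def column_mismatch_fraction(reference_cols, brain_cols):
    """Fraction of columns that differ between the reference individual
    and a brain individual (0.0 = perfect match, 1.0 = fully different)."""
    ref, brain = set(reference_cols), set(brain_cols)
    union = ref | brain
    return len(ref ^ brain) / len(union) if union else 0.0
```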
brain_max_size_GB
brain_max_size_GB (Number) (Expert Setting)
Default value 20
Maximum size in GB the brain will store. We reserve this space to save data in order to ensure we can retrieve an experiment if for any reason it gets interrupted. -1: unlimited; >= 0: number of GB to limit the brain to.
early_stopping
early_stopping (Boolean) (Expert Setting)
Default value True
Whether to enable early stopping. Early stopping refers to stopping the feature evolution/engineering process when there is no performance uplift after a certain number of iterations. After early stopping has been triggered, Driverless AI will initiate the ensemble process if selected.
early_stopping_per_individual
early_stopping_per_individual (Boolean) (Expert Setting)
Default value True
Whether to enable early stopping per individual. Each individual in the genetic algorithm will stop early if there is no improvement, and it will no longer be mutated. Instead, the best individual will be additionally mutated.
text_dominated_limit_tuning
text_dominated_limit_tuning (Boolean) (Expert Setting)
Default value True
Whether to reduce options for text-dominated models to reduce expense, e.g. disable ensemble, disable genetic algorithm, single identity target encoder for classification, etc.
image_dominated_limit_tuning
image_dominated_limit_tuning (Boolean) (Expert Setting)
Default value True
Whether to reduce options for image-dominated models to reduce expense, e.g. disable ensemble, disable genetic algorithm, single identity target encoder for classification, etc.
supported_image_types
supported_image_types (List) (Expert Setting)
Default value ['jpg', 'jpeg', 'png', 'bmp', 'ppm', 'tif', 'tiff', 'JPG', 'JPEG', 'PNG', 'BMP', 'PPM', 'TIF', 'TIFF']
Supported image types. URIs with these endings will be considered as image paths (local or remote).
image_paths_absolute
image_paths_absolute (Boolean) (Expert Setting)
Default value False
Whether to create absolute paths for images when importing datasets containing images. Can facilitate testing or re-use of frames for scoring.
text_dl_token_pad_percentile
text_dl_token_pad_percentile (Number) (Expert Setting)
Default value 99
Percentile value cutoff of input text token lengths for NLP deep learning models.
text_dl_token_pad_max
text_dl_token_pad_max (Number) (Expert Setting)
Default value 512
Maximum token length of input text to be used in NLP deep learning models.
tune_parameters_accuracy_switch
tune_parameters_accuracy_switch (Number) (Expert Setting)
Default value 3
Accuracy setting equal and above which enables tuning of model parameters. Only applicable if parameter_tuning_num_models=-1 (auto).
tune_target_transform_accuracy_switch
tune_target_transform_accuracy_switch (Number) (Expert Setting)
Default value 5
Accuracy setting equal and above which enables tuning of target transform for regression. This is useful for time series when instead of predicting the actual target value, it might be better to predict a transformed target variable like sqrt(target) or log(target) as a means to control for outliers.
tournament_uniform_style_interpretability_switch
tournament_uniform_style_interpretability_switch (Number) (Expert Setting)
Default value 8
Interpretability above which will use 'uniform' tournament style
tournament_uniform_style_accuracy_switch
tournament_uniform_style_accuracy_switch (Number) (Expert Setting)
Default value 6
Accuracy below which will use uniform style if tournament_style = 'auto' (regardless of other accuracy tournament style switch values)
tournament_model_style_accuracy_switch
tournament_model_style_accuracy_switch (Number) (Expert Setting)
Default value 6
Accuracy equal and above which uses model style if tournament_style = 'auto'
tournament_feature_style_accuracy_switch
tournament_feature_style_accuracy_switch (Number) (Expert Setting)
Default value 13
Accuracy equal and above which uses feature style if tournament_style = 'auto'
tournament_fullstack_style_accuracy_switch
tournament_fullstack_style_accuracy_switch (Number) (Expert Setting)
Default value 13
Accuracy equal and above which uses fullstack style if tournament_style = 'auto'
tournament_use_feature_penalized_score
tournament_use_feature_penalized_score (Boolean) (Expert Setting)
Default value True
Whether to use penalized score for GA tournament or actual score
tournament_keep_poor_scores_for_small_data
tournament_keep_poor_scores_for_small_data (Boolean) (Expert Setting)
Default value True
Whether to keep poor scores for small data (< 10k rows) in case exploration will find a good model. Sets tournament_remove_poor_scores_before_evolution_model_factor=1.1, tournament_remove_worse_than_constant_before_evolution=false, tournament_keep_absolute_ok_scores_before_evolution_model_factor=1.1, tournament_remove_poor_scores_before_final_model_factor=1.1, and tournament_remove_worse_than_constant_before_final_model=true.
tournament_remove_poor_scores_before_evolution_model_factor
tournament_remove_poor_scores_before_evolution_model_factor (Float) (Expert Setting)
Default value 0.7
Factor (compared to best score plus each score) beyond which to drop poorly scoring models before evolution. This is useful in cases when poorly scoring models take a long time to train.
tournament_remove_worse_than_constant_before_evolution
tournament_remove_worse_than_constant_before_evolution (Boolean) (Expert Setting)
Default value True
Before evolution (after tuning), whether to remove models that are worse than a constant-prediction model (optimized to the scorer).
tournament_keep_absolute_ok_scores_before_evolution_model_factor
tournament_keep_absolute_ok_scores_before_evolution_model_factor (Float) (Expert Setting)
Default value 0.2
Before evolution (after tuning): where on the scale of 0 (perfect) to 1 (constant model) to keep OK scores by absolute value.
tournament_remove_poor_scores_before_final_model_factor
tournament_remove_poor_scores_before_final_model_factor (Float) (Expert Setting)
Default value 0.3
Factor (compared to best score) beyond which to drop poorly scoring models before building final ensemble. This is useful in cases when poorly scoring models take a long time to train.
tournament_remove_worse_than_constant_before_final_model
tournament_remove_worse_than_constant_before_final_model (Boolean) (Expert Setting)
Default value True
Before the final model (after evolution), whether to remove models that are worse than a constant-prediction model (optimized to the scorer).
num_individuals
num_individuals (Number) (Expert Setting)
Default value 2
Driverless AI uses a genetic algorithm (GA) to find the best features, best models, and best hyperparameters for these models. The GA facilitates getting good results without requiring every possible model/feature/parameter to be run/tried. This version of GA has reinforcement learning elements: it uses a form of exploration-exploitation to reach optimal solutions, meaning it will capitalize on models/features/parameters that seem to be working well and continue to exploit them even more, while allowing some room for trying new (and semi-random) models/features/parameters to avoid settling on a local minimum. The models/features/parameters tried are what we call individuals of a population. More individuals mean more models/features/parameters are tried and compete to find the best ones.
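A toy sketch of one exploration-exploitation step over such a population (purely illustrative; score and mutate stand in for DAI's internals):

```python
import random

def evolve_step(population, score, mutate):
    """One toy generation: keep the better half (exploitation), refill
    with mutations of the survivors, and add one random challenger
    (exploration) so the search does not settle on a local optimum."""
    ranked = sorted(population, key=score, reverse=True)
    survivors = ranked[: max(1, len(population) // 2)]
    children = [mutate(ind) for ind in survivors]
    next_gen = (survivors + children)[: len(population) - 1]
    challenger = mutate(random.choice(population))
    return next_gen + [challenger]
```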
cv_in_cv_overconfidence_protection_factor
cv_in_cv_overconfidence_protection_factor (Float) (Expert Setting)
Default value 3.0
excluded_transformers
Exclude specific transformers (List) (Expert Setting)
Default value []
Auxiliary to included_transformers, e.g. to disable all Target Encoding: excluded_transformers = '['NumCatTETransformer', 'CVTargetEncodeF', 'NumToCatTETransformer', 'ClusterTETransformer']'. Does not affect transformers used for preprocessing with included_pretransformers.
excluded_genes
Exclude specific genes (List) (Expert Setting)
Default value []
Exclude list of genes (i.e. genes, built on top of transformers, to not use, independent of the interpretability setting). Some transformers are used by multiple genes, so this allows finer control over feature engineering.
For multi-class: '['InteractionsGene', 'WeightOfEvidenceGene', 'NumToCatTargetEncodeSingleGene', 'OriginalGene', 'TextGene', 'FrequentGene', 'NumToCatWeightOfEvidenceGene', 'NumToCatWeightOfEvidenceMonotonicGene', 'CvTargetEncodeSingleGene', 'DateGene', 'NumToCatTargetEncodeMultiGene', 'DateTimeGene', 'TextLinRegressorGene', 'ClusterIDTargetEncodeSingleGene', 'CvCatNumEncodeGene', 'TruncSvdNumGene', 'ClusterIDTargetEncodeMultiGene', 'NumCatTargetEncodeMultiGene', 'CvTargetEncodeMultiGene', 'TextLinClassifierGene', 'NumCatTargetEncodeSingleGene', 'ClusterDistGene']'
For regression/binary: '['CvTargetEncodeSingleGene', 'NumToCatTargetEncodeSingleGene', 'CvCatNumEncodeGene', 'ClusterIDTargetEncodeSingleGene', 'TextLinRegressorGene', 'CvTargetEncodeMultiGene', 'ClusterDistGene', 'OriginalGene', 'DateGene', 'ClusterIDTargetEncodeMultiGene', 'NumToCatTargetEncodeMultiGene', 'NumCatTargetEncodeMultiGene', 'TextLinClassifierGene', 'WeightOfEvidenceGene', 'FrequentGene', 'TruncSvdNumGene', 'InteractionsGene', 'TextGene', 'DateTimeGene', 'NumToCatWeightOfEvidenceGene', 'NumToCatWeightOfEvidenceMonotonicGene', 'NumCatTargetEncodeSingleGene']'
This list appears in the experiment logs (search for 'Genes used'), e.g. to disable the interaction gene, use: excluded_genes = '['InteractionsGene']'. Does not affect transformers used for preprocessing with included_pretransformers.
excluded_models
Exclude specific models (List) (Expert Setting)
Default value []
Auxiliary to included_models
excluded_pretransformers
Exclude specific pretransformers (List) (Expert Setting)
Default value []
Auxiliary to included_pretransformers
excluded_datas
Exclude specific data recipes (List) (Expert Setting)
Default value []
Auxiliary to included_datas
excluded_individuals
Exclude specific individual recipes (List) (Expert Setting)
Default value []
Auxiliary to included_individuals
excluded_scorers
Exclude specific scorers (List) (Expert Setting)
Default value []
Auxiliary to included_scorers
use_dask_for_1_gpu
use_dask_for_1_gpu (Boolean) (Expert Setting)
Default value False
Whether to use dask_cudf even for 1 GPU. If False, will use plain cudf.
optuna_pruner_kwargs
Set Optuna pruner constructor args. (Dict) (Expert Setting)
Default value {'n_startup_trials': 5, 'n_warmup_steps': 20, 'interval_steps': 20, 'percentile': 25.0, 'min_resource': 'auto', 'max_resource': 'auto', 'reduction_factor': 4, 'min_early_stopping_rate': 0, 'n_brackets': 4, 'min_early_stopping_rate_low': 0, 'upper': 1.0, 'lower': 0.0}
Set Optuna constructor arguments for particular applicable pruners. https://optuna.readthedocs.io/en/stable/reference/pruners.html
optuna_sampler_kwargs
Set Optuna sampler constructor args. (Dict) (Expert Setting)
Default value {}
Set Optuna constructor arguments for particular applicable samplers. https://optuna.readthedocs.io/en/stable/reference/samplers.html
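For reference, these kwargs feed the constructors of the open-source Optuna pruners and samplers; for example:

```python
import optuna

# A few of the optuna_pruner_kwargs defaults apply to MedianPruner:
pruner = optuna.pruners.MedianPruner(
    n_startup_trials=5,  # trials completed before pruning is considered
    n_warmup_steps=20,   # steps a trial runs before it can be pruned
    interval_steps=20,   # how often pruning is checked
)
sampler = optuna.samplers.TPESampler()  # kwargs would come from optuna_sampler_kwargs
study = optuna.create_study(pruner=pruner, sampler=sampler)
```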
drop_constant_model_final_ensemble
drop_constant_model_final_ensemble (Boolean) (Expert Setting)
Default value True
xgboost_rf_exact_threshold_num_rows_x_cols
xgboost_rf_exact_threshold_num_rows_x_cols (Number) (Expert Setting)
Default value 10000
lossguide_drop_factor
Factor by which to drop max_leaves from effective max_depth value when doing loss_guide. E.g. if max_depth is normally 12, this makes leaves 2**11 not 2**12 (Float) (Expert Setting)
Default value 4.0
lossguide_max_depth_extend_factor
Factor by which to extend max_depth mutations when doing loss_guide. E.g. if max_leaves ends up as x let max_depth be factor * x. (Float) (Expert Setting)
Default value 8.0
params_tune_grow_policy_simple_trees
params_tune_grow_policy_simple_trees (Boolean) (Expert Setting)
Default value True
Whether to force max_leaves and max_depth to be 0 if grow_policy is depthwise and lossguide, respectively.
max_epochs_tf_big_data
max_epochs_tf_big_data (Number) (Expert Setting)
Default value 5
Number of epochs for TensorFlow when larger data size.
default_max_bin
default_max_bin (Number) (Expert Setting)
Default value 256
Default max_bin for tree methods
default_lightgbm_max_bin
default_lightgbm_max_bin (Number) (Expert Setting)
Default value 249
Default max_bin for LightGBM (64 recommended for GPU LightGBM for speed)
min_max_bin
min_max_bin (Number) (Expert Setting)
Default value 32
Minimum max_bin for any tree
tensorflow_use_all_cores
tensorflow_use_all_cores (Boolean) (Expert Setting)
Default value True
Whether TensorFlow will use all CPU cores, or if it will split among all transformers. Only for transformers, not TensorFlow model.
tensorflow_use_all_cores_even_if_reproducible_true
tensorflow_use_all_cores_even_if_reproducible_true (Boolean) (Expert Setting)
Default value False
Whether TensorFlow will use all CPU cores if reproducible is set, or if it will split among all transformers
tensorflow_disable_memory_optimization
tensorflow_disable_memory_optimization (Boolean) (Expert Setting)
Default value True
Whether to disable TensorFlow memory optimizations. Can help fix tensorflow.python.framework.errors_impl.AlreadyExistsError
tensorflow_cores
tensorflow_cores (Number) (Expert Setting)
Default value 0
How many cores to use for each TensorFlow model, regardless if GPU or CPU based (0 = auto mode)
tensorflow_model_max_cores
tensorflow_model_max_cores (Number) (Expert Setting)
Default value 4
For TensorFlow models, maximum number of cores to use if tensorflow_cores=0 (auto mode), because TensorFlow model is inefficient at using many cores. See also max_fit_cores for all models.
bert_cores
bert_cores (Number) (Expert Setting)
Default value 0
How many cores to use for each Bert Model and Transformer, regardless if GPU or CPU based (0 = auto mode)
bert_use_all_cores
bert_use_all_cores (Boolean) (Expert Setting)
Default value True
Whether Bert will use all CPU cores, or if it will split among all transformers. Only for transformers, not Bert model.
bert_model_max_cores
bert_model_max_cores (Number) (Expert Setting)
Default value 8
For Bert models, maximum number of cores to use if bert_cores=0 (auto mode), because Bert model is inefficient at using many cores. See also max_fit_cores for all models.
one_hot_encoding_show_actual_levels_in_features
Whether to show real levels in One Hot Encoding feature names. Leads to feature aggregation problems when switching between binning and not binning across fold splits. Feature description will still contain levels in each bin whether True or False. (Boolean) (Expert Setting)
Default value False
validate_meta_learner
Enable basic logging and notifications for ensemble meta learner (Boolean) (Expert Setting)
Default value True
validate_meta_learner_extra
Enable extra logging for ensemble meta learner: ensemble must be at least as good as each base model (Boolean) (Expert Setting)
Default value False
num_fold_ids_show
Maximum number of fold IDs to show in logs (Number) (Expert Setting)
Default value 10
fold_scores_instability_warning_threshold
Declare positive fold scores as unstable if stddev / mean is larger than this value (Float) (Expert Setting)
Default value 0.25
imbalance_ratio_multiclass_threshold
Ratio of most frequent to least frequent class for imbalanced multiclass classification problems equal and above which to trigger special handling due to class imbalance (Number) (Expert Setting)
Default value 5
Special handling can include special models, special scorers, special feature engineering.
heavy_imbalance_ratio_multiclass_threshold
Ratio of most frequent to least frequent class for imbalanced multiclass classification problems equal and above which to trigger special handling due to heavy class imbalance (Number) (Expert Setting)
Default value 25
Special handling can include special models, special scorers, special feature engineering.
imbalance_sampling_rank_averaging
Whether to do rank averaging of bagged models inside imbalanced models, instead of probability averaging (String) (Expert Setting)
Default value 'auto'
Rank averaging can be helpful when ensembling diverse models and when ranking metrics like AUC/Gini are optimized. No MOJO support yet.
imbalance_ratio_notification_threshold
imbalance_ratio_notification_threshold (Float) (Expert Setting)
Default value 2.0
For binary classification: ratio of majority to minority class equal and above which to notify of imbalance in the GUI, saying the data is slightly imbalanced. More than imbalance_ratio_sampling_threshold will say the problem is imbalanced.
nbins_ftrl_list
nbins_ftrl_list (List) (Expert Setting)
Default value [1000000, 10000000, 100000000]
List of possible bins for FTRL (largest is default best value)
te_bin_list
te_bin_list (List) (Expert Setting)
Default value [25, 10, 100, 250]
List of possible bins for target encoding (first is default value)
woe_bin_list
woe_bin_list (List) (Expert Setting)
Default value [25, 10, 100, 250]
List of possible bins for weight of evidence encoding (first is default value) If only want one value: woe_bin_list = [2]
ohe_bin_list
ohe_bin_list (List) (Expert Setting)
Default value [10, 25, 50, 75, 100]
List of possible bins for one-hot encoding (first is the default value). If left as default, the actual list is changed for the given data size and dials.
binner_bin_list
binner_bin_list (List) (Expert Setting)
Default value [5, 10, 20]
List of max possible number of bins for numeric binning (first is default value). If left as default, the actual list is changed for given data size and dials. The binner will automatically reduce the number of bins based on predictive power.
drop_duplicate_rows_timeout
Timeout in seconds for dropping duplicate rows in training data; increases proportionally as rows*cols grows relative to detect_duplicate_rows_max_rows_x_cols. (Number) (Expert Setting)
Default value 60
shift_check_text
shift_check_text (Boolean) (Expert Setting)
Default value False
Whether to enable checking text for shift, currently only via label encoding.
use_rf_for_shift_if_have_lgbm
use_rf_for_shift_if_have_lgbm (Boolean) (Expert Setting)
Default value True
Whether to use LightGBM random forest mode without early stopping for shift detection.
shift_key_features_varimp
shift_key_features_varimp (Float) (Expert Setting)
Default value 0.01
Normalized training variable importance above which to check the feature for shift. Useful to avoid checking likely unimportant features.
shift_check_reduced_features
shift_check_reduced_features (Boolean) (Expert Setting)
Default value True
Whether to only check certain features based upon the value of shift_key_features_varimp
shift_trees
shift_trees (Number) (Expert Setting)
Default value 100
Number of trees to use to train the model that checks for shift in distribution. No larger than max_nestimators.
shift_max_bin
shift_max_bin (Number) (Expert Setting)
Default value 256
The value of max_bin to use for trees to use to train model to check shift in distribution
shift_min_max_depth
shift_min_max_depth (Number) (Expert Setting)
Default value 4
The min. value of max_depth to use for trees to use to train model to check shift in distribution
shift_max_max_depth
shift_max_max_depth (Number) (Expert Setting)
Default value 8
The max. value of max_depth to use for trees to use to train model to check shift in distribution
detect_features_distribution_shift_threshold_auc
detect_features_distribution_shift_threshold_auc (Float) (Expert Setting)
Default value 0.55
If distribution shift detection is enabled, show features for which shift AUC is above this value (AUC of a binary classifier that predicts whether given feature value belongs to train or test data)
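The shift check is essentially per-feature adversarial validation; a sketch using scikit-learn's random forest in place of DAI's LightGBM RF mode:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def feature_shift_auc(train_col, test_col, seed=0):
    """AUC of a classifier predicting whether a feature value comes from
    train (0) or test (1); values well above 0.5 indicate shift."""
    X = np.concatenate([train_col, test_col]).reshape(-1, 1)
    y = np.concatenate([np.zeros(len(train_col)), np.ones(len(test_col))])
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, max_depth=8,
                                 random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_va, clf.predict_proba(X_va)[:, 1])
```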
leakage_check_text
leakage_check_text (Boolean) (Expert Setting)
Default value True
Whether to enable checking text for leakage, currently only via label encoding.
leakage_key_features_varimp
leakage_key_features_varimp (Float) (Expert Setting)
Default value 0.001
Normalized training variable importance (per 1 minus AUC/R2, to control for leaky varimp dominance) above which to check the feature for leakage. Useful to avoid checking likely unimportant features.
leakage_check_reduced_features
leakage_check_reduced_features (Boolean) (Expert Setting)
Default value True
Whether to only check certain features based upon the value of leakage_key_features_varimp. If any feature has AUC near 1, will consume all variable importance, even if another feature is also leaky. So False is safest option, but True generally good if many columns.
use_rf_for_leakage_if_have_lgbm
use_rf_for_leakage_if_have_lgbm (Boolean) (Expert Setting)
Default value True
Whether to use LightGBM random forest mode without early stopping for leakage detection.
leakage_trees
leakage_trees (Number) (Expert Setting)
Default value 100
Number of trees to use to train the model that checks for leakage. No larger than max_nestimators.
leakage_max_bin
leakage_max_bin (Number) (Expert Setting)
Default value 256
The value of max_bin to use for trees to use to train model to check for leakage
leakage_min_max_depth
leakage_min_max_depth (Number) (Expert Setting)
Default value 6
The value of max_depth to use for trees to use to train model to check for leakage
leakage_max_max_depth
leakage_max_max_depth (Number) (Expert Setting)
Default value 8
The value of max_depth to use for trees to use to train model to check for leakage
leakage_train_test_split
leakage_train_test_split (Float) (Expert Setting)
Default value 0.25
Ratio of train to validation holdout when testing for leakage
check_system_basic
Whether to report basic system information on server startup (Boolean) (Expert Setting)
Default value True
abs_tol_for_perfect_score
abs_tol_for_perfect_score (Float) (Expert Setting)
Default value 0.0001
How close to the optimal value (usually 1 or 0) does the validation score need to be to be considered perfect (to stop the experiment)?
data_ingest_timeout
data_ingest_timeout (Float) (Expert Setting)
Default value 86400.0
Timeout in seconds to wait for data ingestion.
debug_daimodel_level
debug_daimodel_level (Number) (Expert Setting)
Default value 0
log_predict_info
Whether to show detailed predict information in logs. (Boolean) (Expert Setting)
Default value True
log_fit_info
Whether to show detailed fit information in logs. (Boolean) (Expert Setting)
Default value True
show_inapplicable_models_preview
show_inapplicable_models_preview (Boolean) (Expert Setting)
Default value False
Show inapplicable models in preview, to be sure one is not missing models that could have been used.
show_inapplicable_transformers_preview
show_inapplicable_transformers_preview (Boolean) (Expert Setting)
Default value False
Show inapplicable transformers in preview, to be sure one is not missing transformers that could have been used.
show_warnings_preview
show_warnings_preview (Boolean) (Expert Setting)
Default value False
Show warnings for models (image auto, Dask multinode/multi-GPU) whose conditions for use are met but that are not chosen, to avoid missing models that could benefit accuracy/performance.
show_warnings_preview_unused_map_features
show_warnings_preview_unused_map_features (Boolean) (Expert Setting)
Default value True
Show warnings for models that have no transformers for certain features.
max_cols_show_unused_features
max_cols_show_unused_features (Number) (Expert Setting)
Default value 1000
Up to how many input features to determine unused features for, during GUI/client preview. Too many slows the preview down.
max_cols_show_feature_transformer_mapping
max_cols_show_feature_transformer_mapping (Number) (Expert Setting)
Default value 1000
Up to how many input features to show transformers used for each input feature.
warning_unused_feature_show_max
warning_unused_feature_show_max (Number) (Expert Setting)
Default value 3
Up to how many input features to show, in preview, that are unused features.
interaction_finder_max_rows_x_cols
interaction_finder_max_rows_x_cols (Float) (Expert Setting)
Default value 200000.0
interaction_finder_corr_threshold
interaction_finder_corr_threshold (Float) (Expert Setting)
Default value 0.95
min_bootstrap_samples
Minimum number of bootstrap samples (Number) (Expert Setting)
Default value 1
Minimum number of bootstrap samples to use for estimating a score and its standard deviation. The actual number of bootstrap samples will vary between the min and max, depending upon row count (more rows, fewer samples) and accuracy settings (higher accuracy, more samples).
max_bootstrap_samples
Maximum number of bootstrap samples (Number) (Expert Setting)
Default value 100
Maximum number of bootstrap samples to use for estimating a score and its standard deviation. The actual number of bootstrap samples will vary between the min and max, depending upon row count (more rows, fewer samples) and accuracy settings (higher accuracy, more samples).
min_bootstrap_sample_size_factor
Minimum fraction of rows to use for bootstrap samples (Float) (Expert Setting)
Default value 1.0
Minimum fraction of the row count to take as the sample size for the bootstrap estimator. The actual sample size used for the bootstrap estimate will vary between the min and max, depending upon row count (more rows, smaller sample size) and accuracy settings (higher accuracy, larger sample size).
max_bootstrap_sample_size_factor
Maximum fraction of rows to use for bootstrap samples (Float) (Expert Setting)
Default value 10.0
Maximum fraction of the row count to take as the sample size for the bootstrap estimator. The actual sample size used for the bootstrap estimate will vary between the min and max, depending upon row count (more rows, smaller sample size) and accuracy settings (higher accuracy, larger sample size).
bootstrap_final_seed
Seed to use for final model bootstrap sampling (Number) (Expert Setting)
Default value -1
Seed to use for final model bootstrap sampling, -1 means use experiment-derived seed. E.g. one can retrain final model with different seed to get different final model error bars for scores.
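A minimal sketch of bootstrap score estimation in the spirit of these settings (not DAI's implementation):

```python
import numpy as np

def bootstrap_score(y_true, y_pred, metric, n_samples=100,
                    sample_size_factor=1.0, seed=1234):
    """Estimate a score and its standard deviation by resampling rows
    with replacement (cf. min/max_bootstrap_samples and the
    min/max_bootstrap_sample_size_factor settings)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)  # cf. bootstrap_final_seed
    k = max(1, int(len(y_true) * sample_size_factor))
    scores = []
    for _ in range(n_samples):
        idx = rng.integers(0, len(y_true), size=k)  # sample with replacement
        scores.append(metric(y_true[idx], y_pred[idx]))
    return float(np.mean(scores)), float(np.std(scores))
```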
benford_mad_threshold_int
benford_mad_threshold_int (Float) (Expert Setting)
Default value 0.03
Benford's law: mean absolute deviance threshold equal and above which integer-valued columns are treated as categoricals too.
benford_mad_threshold_real
benford_mad_threshold_real (Float) (Expert Setting)
Default value 0.1
Benford's law: mean absolute deviance threshold equal and above which real-valued columns are treated as categoricals too.
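Illustratively, the mean absolute deviance of a column's leading-digit distribution from Benford's law can be computed like this (a sketch, not DAI's code):

```python
import numpy as np

def benford_mad(col):
    """Mean absolute deviance of observed leading-digit frequencies from
    Benford's law; larger values suggest categorical-like numeric columns."""
    x = np.abs(np.asarray(col, dtype=float))
    x = x[x > 0]
    lead = (x / 10 ** np.floor(np.log10(x))).astype(int)  # first digit, 1..9
    observed = np.bincount(lead, minlength=10)[1:10] / len(lead)
    expected = np.log10(1 + 1 / np.arange(1, 10))  # Benford's distribution
    return float(np.mean(np.abs(observed - expected)))
```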
stabilize_features
Use tuning-evolution search result for final model transformer. (Boolean) (Expert Setting)
Default value True
Whether the final pipeline uses fixed features for some transformers that would normally perform a search, such as InteractionsTransformer. Use what was learned from tuning and evolution (True) or freshly search for new features (False). Fixed features can give a more stable pipeline, especially for small data or when using the interaction transformer as a pretransformer in a multi-layer pipeline.
fraction_std_bootstrap_ladder_factor
Factor of standard deviation of bootstrap scores by which to accept new model in genetic algorithm. Too small a fraction will lead to accepting new models easily even if no significant improvement in score, while too large a fraction will reject too many good models. Non-zero value is a bit risky when no folds are used in GA, because bootstrap score is only rough estimate of error. (Float) (Expert Setting)
Default value 0.01
bootstrap_ladder_samples_limit
Minimum number of bootstrap samples that are required to limit accepting new model. If less than this, then new model is always accepted. (Number) (Expert Setting)
Default value 10
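Together, these two settings suggest an acceptance rule along these lines (a hypothetical sketch of the assumed logic):

```python
def accept_new_model(new_score, best_score, bootstrap_std,
                     n_bootstrap_samples, factor=0.01, samples_limit=10,
                     maximize=True):
    """Hypothetical acceptance rule: with too few bootstrap samples the
    new model is always accepted; otherwise it must beat the best score
    by at least factor * bootstrap_std."""
    if n_bootstrap_samples < samples_limit:
        return True
    margin = factor * bootstrap_std
    if maximize:
        return new_score >= best_score + margin
    return new_score <= best_score - margin
```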
rdelta_percent_score_penalty_per_feature_by_interpretability
rdelta_percent_score_penalty_per_feature_by_interpretability (String) (Expert Setting)
Default value '{1: 0.0, 2: 0.1, 3: 1.0, 4: 2.0, 5: 5.0, 6: 10.0, 7: 20.0, 8: 30.0, 9: 50.0, 10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}'
drop_low_meta_weights
drop_low_meta_weights (Boolean) (Expert Setting)
Default value True
meta_weight_allowed_by_interpretability
meta_weight_allowed_by_interpretability (String) (Expert Setting)
Default value '{1: 1E-7, 2: 1E-5, 3: 1E-4, 4: 1E-3, 5: 1E-2, 6: 0.03, 7: 0.05, 8: 0.08, 9: 0.10, 10: 0.15, 11: 0.15, 12: 0.15, 13: 0.15}'
fs_data_vary_for_interpretability
fs_data_vary_for_interpretability (Number) (Expert Setting)
Default value 7
fs_data_frac
Fraction of data to use for another data slice for FS (Float) (Expert Setting)
Default value 0.5
round_up_indivs_for_busy_gpus
Whether to round-up individuals to ensure all GPUs used. Not always best if (say) have 16 GPUs, better to have multiple experiments if in multi-user environment on single node. (Boolean) (Expert Setting)
Default value True
require_graphviz
Whether to require Graphviz package at startup (Boolean) (Expert Setting)
Default value True
Graphviz is an optional requirement for native installations (RPM/DEB/TAR SH, outside of Docker) to convert .dot files into .png files for pipeline visualizations as part of experiment artifacts.
fast_approx_max_num_trees_ever
fast_approx_max_num_trees_ever (Number) (Expert Setting)
Default value -1
Max. number of trees to use for all tree model predictions. For testing, when predictions don't matter. -1 means disabled.
max_absolute_feature_expansion
max_absolute_feature_expansion (Number) (Expert Setting)
Default value 1000
model_class_name_for_shift
model_class_name_for_shift (String) (Expert Setting)
Default value 'auto'
model_class_name_for_leakage
model_class_name_for_leakage (String) (Expert Setting)
Default value 'auto'
tensorflow_num_classes_switch_but_keep_lightgbm
tensorflow_num_classes_switch_but_keep_lightgbm (Number) (Expert Setting)
Default value 15
textlin_num_classes_switch
Class count above which do not use TextLin Transformer (Number) (Expert Setting)
Default value 5
Class count above which do not use TextLin Transformer.
text_gene_dim_reduction_choices
text_gene_dim_reduction_choices (List) (Expert Setting)
Default value [50]
text_gene_max_ngram
text_gene_max_ngram (List) (Expert Setting)
Default value [1, 2, 3]
number_of_texts_to_cache_in_bert_transformer
number_of_texts_to_cache_in_bert_transformer (Number) (Expert Setting)
Default value -1
Enables caching of BERT embeddings by temporarily saving the embedding vectors to the experiment directory. Set to -1 to cache all text; set to 0 to disable caching.
gbm_early_stopping_rounds_min
gbm_early_stopping_rounds_min (Number) (Expert Setting)
Default value 1
gbm_early_stopping_rounds_max
gbm_early_stopping_rounds_max (Number) (Expert Setting)
Default value 10000000000
max_num_varimp_to_log
max_num_varimp_to_log (Number) (Expert Setting)
Default value 10
Max. number of top variable importances to show in logs during feature evolution
max_num_varimp_shift_to_log
max_num_varimp_shift_to_log (Number) (Expert Setting)
Default value 10
Max. number of top variable importance shifts to show in logs and GUI after final model built
can_skip_final_upper_layer_failures
can_skip_final_upper_layer_failures (Boolean) (Expert Setting)
Default value True
Whether final-model transformer failures can be skipped for layers beyond the first in a multi-layer pipeline.
dump_modelparams_every_scored_indiv_feature_count
dump_modelparams_every_scored_indiv_feature_count (Number) (Expert Setting)
Default value 3
Number of features to show in the model dump for every scored individual.
dump_modelparams_every_scored_indiv_mutation_count
dump_modelparams_every_scored_indiv_mutation_count (Number) (Expert Setting)
Default value 3
Number of past mutations to show in the model dump for every scored individual.
dump_modelparams_separate_files
dump_modelparams_separate_files (Boolean) (Expert Setting)
Default value False
Whether to append to a single file (false) or write separate files (true; named like individual_scored_id%d.iter%d*params*) for the model parameters of every scored individual.
oauth2_client_tokens_enabled
oauth2_client_tokens_enabled (Boolean) (Expert Setting)
Default value False
Enables the option to initiate a PKCE flow from the UI in order to obtain tokens usable with Driverless AI clients.
pdp_max_threads
Maximum number of threads/forks for autoreport PDP. -1 means auto. (Number) (Expert Setting)
Default value -1
autoviz_max_num_columns
Maximum number of columns for Autoviz (Number) (Expert Setting)
Default value 50
Maximum number of columns Autoviz will work with. If the dataset has more columns than this number, Autoviz will pick columns randomly, prioritizing numerical columns.
autoviz_max_aggregated_rows
Maximum number of rows in aggregated frame (Number) (Expert Setting)
Default value 500
enable_custom_recipes_from_url
enable_custom_recipes_from_url (Boolean) (Expert Setting)
Default value True
Enable downloading of custom recipes from external URL.
enable_custom_recipes_from_zip
enable_custom_recipes_from_zip (Boolean) (Expert Setting)
Default value True
Enable uploading recipe files as a zip archive containing custom recipe(s) in the root folder; any other code or auxiliary files must be in a sub-folder.
enable_recreate_custom_recipes_env
enable_recreate_custom_recipes_env (Boolean) (Expert Setting)
Default value True
When set to true, enables downloading of custom recipes' third-party packages from the web; otherwise the Python environment will be transferred from the main worker.
include_custom_recipes_by_default
include_custom_recipes_by_default (Boolean) (Expert Setting)
Default value False
Include custom recipes in default inclusion lists (warning: enables all custom recipes)
h2o_recipes_url
h2o_recipes_url (String) (Expert Setting)
Default value 'None'
URL of H2O instance for use by transformers, models, or scorers.
h2o_recipes_ip
h2o_recipes_ip (String) (Expert Setting)
Default value 'None'
IP of H2O instance for use by transformers, models, or scorers.
h2o_recipes_nthreads
h2o_recipes_nthreads (Number) (Expert Setting)
Default value 8
Number of threads for H2O instance for use by transformers, models, or scorers. -1 for all.
h2o_recipes_log_level
h2o_recipes_log_level (String) (Expert Setting)
Default value 'None'
Log Level of H2O instance for use by transformers, models, or scorers.
h2o_recipes_max_mem_size
h2o_recipes_max_mem_size (String) (Expert Setting)
Default value 'None'
Maximum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_min_mem_size
h2o_recipes_min_mem_size (String) (Expert Setting)
Default value 'None'
Minimum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_kwargs
h2o_recipes_kwargs (Dict) (Expert Setting)
Default value {}
General user overrides of kwargs dict to pass to h2o.init() for recipe server.
h2o_recipes_start_trials
h2o_recipes_start_trials (Number) (Expert Setting)
Default value 5
Number of trials to give h2o-3 recipe server to start.
h2o_recipes_start_sleep0
h2o_recipes_start_sleep0 (Number) (Expert Setting)
Default value 1
Number of seconds to sleep before starting h2o-3 recipe server.
h2o_recipes_start_sleep
h2o_recipes_start_sleep (Number) (Expert Setting)
Default value 5
Number of seconds to sleep between trials of starting h2o-3 recipe server.
custom_recipes_lock_to_git_repo
custom_recipes_lock_to_git_repo (Boolean) (Expert Setting)
Default value False
Lock the source for recipes to a specific GitHub repo. If True, all custom recipes must come from the repo specified in custom_recipes_git_repo.
custom_recipes_git_repo
custom_recipes_git_repo (String) (Expert Setting)
Default value 'https://github.com/h2oai/driverlessai-recipes'
If custom_recipes_lock_to_git_repo is set to True, only this repo can be used to pull recipes from
custom_recipes_git_branch
custom_recipes_git_branch (String) (Expert Setting)
Default value 'None'
Branch constraint for the recipe source repo. Any branch is allowed if unset or None.
custom_recipes_excluded_filenames_from_repo_download
basenames of files to exclude from repo download (List) (Expert Setting)
Default value []
allow_old_recipes_use_datadir_as_data_directory
Allow use of deprecated get_global_directory() method from custom recipes for backward compatibility of recipes created before 1.9.0. Disable to force separation of custom recipes per user (in which case user_dir() should be used instead). (Boolean) (Expert Setting)
Default value True
enable_custom_transformers
enable_custom_transformers (Boolean) (Expert Setting)
Default value True
enable_custom_pretransformers
enable_custom_pretransformers (Boolean) (Expert Setting)
Default value True
enable_custom_models
enable_custom_models (Boolean) (Expert Setting)
Default value True
enable_custom_scorers
enable_custom_scorers (Boolean) (Expert Setting)
Default value True
enable_custom_datas
enable_custom_datas (Boolean) (Expert Setting)
Default value True
enable_custom_explainers
enable_custom_explainers (Boolean) (Expert Setting)
Default value True
enable_custom_individuals
enable_custom_individuals (Boolean) (Expert Setting)
Default value True
enable_connectors_recipes
enable_connectors_recipes (Boolean) (Expert Setting)
Default value True
contrib_relative_directory
Base directory for recipes within data directory. (String) (Expert Setting)
Default value 'contrib'
contrib_env_relative_directory
contrib_env_relative_directory (String) (Expert Setting)
Default value 'contrib/env'
Location of installed custom recipe packages (relative to data_directory). Packages are installed dynamically, but can also be installed manually (before or after server start), inside the running Docker instance if using Docker, or as the user the server runs as (e.g. the dai user) for deb/tar native installations:
PYTHONPATH=<full tmp dir>/<contrib_env_relative_directory>/lib/python3.6/site-packages/ <path to dai>dai-env.sh python -m pip install --prefix=<full tmp dir>/<contrib_env_relative_directory> <packagename> --upgrade --upgrade-strategy only-if-needed --log-file pip_log_file.log
where <path to dai> is /opt/h2oai/dai/ for native rpm/deb installations. Note that wheel files can also be installed if <packagename> is the name of a wheel file or archive.
pip_install_overall_retries
pip_install_overall_retries (Number) (Expert Setting)
Default value 2
Number of retries for the overall pip install call; sometimes two tries are needed.
pip_install_verbosity
pip_install_verbosity (Number) (Expert Setting)
Default value 2
pip install verbosity level (number of -v's given to pip, up to 3).
pip_install_timeout
pip_install_timeout (Number) (Expert Setting)
Default value 15
pip install timeout in seconds; internet issues can make it preferable to fail faster.
pip_install_retries
pip_install_retries (Number) (Expert Setting)
Default value 5
pip install retry count
pip_install_use_constraint
pip_install_use_constraint (Boolean) (Expert Setting)
Default value True
Whether to use DAI constraint file to help pip handle versions. pip can make mistakes and try to install updated packages for no reason.
pip_install_options
pip_install_options (List) (Expert Setting)
Default value []
pip install options: string of a list of other options, e.g. ['--proxy', 'http://user:password@proxyserver:port'].
enable_basic_acceptance_tests
enable_basic_acceptance_tests (Boolean) (Expert Setting)
Default value True
Whether to enable basic acceptance testing. Tests if can pickle the state, etc.
enable_acceptance_tests
enable_acceptance_tests (Boolean) (Expert Setting)
Default value True
Whether acceptance tests should run for custom genes / models / scorers / etc.
skip_disabled_recipes
skip_disabled_recipes (Boolean) (Expert Setting)
Default value False
Whether to skip disabled recipes (True) or fail and show GUI message (False).
contrib_reload_and_recheck_server_start
contrib_reload_and_recheck_server_start (Boolean) (Expert Setting)
Default value True
Whether to re-check recipes during server startup (if per_user_directories == false)
or during user login (if per_user_directories == true). If any inconsistency develops, the bad recipe will be removed during re-done acceptance testing. This process can make start-up take a lot longer for many recipes, but in LTS releases the risk of recipes becoming out of date is low. If set to false, acceptance re-testing during server start is disabled, but note that previews or experiments may fail if inconsistent recipes are used. Such inconsistencies can occur when the API changes for recipes or more aggressive acceptance tests are performed.
contrib_install_packages_server_start
contrib_install_packages_server_start (Boolean) (Expert Setting)
Default value True
Whether to at least install packages required for recipes during server startup (if per_user_directories == false)
or during user login (if per_user_directories == true). Important to keep True so any later use of recipes (that have global packages installed) will work.
contrib_reload_and_recheck_worker_tasks
contrib_reload_and_recheck_worker_tasks (Boolean) (Expert Setting)
Default value False
Whether to re-check recipes after they are uploaded from the main server to a worker in multinode. Doing this for every task that has recipes is expensive.
num_rows_acceptance_test_custom_transformer
num_rows_acceptance_test_custom_transformer (Number) (Expert Setting)
Default value 200
num_rows_acceptance_test_custom_model
num_rows_acceptance_test_custom_model (Number) (Expert Setting)
Default value 100
enable_mapr_multi_user_mode
enable_mapr_multi_user_mode (Boolean) (Expert Setting)
Default value False
Enables multi-user mode for the MapR integration, which allows a MapR ticket per user.
minio_secret_access_key
Minio Secret Access Key (Any)
Default value ''
Minio Connector credentials
h2o_mli_nthreads
h2o_mli_nthreads (Number) (Expert Setting)
Default value 8
Number of threads for H2O instance for use by MLI.
mli_pd_numcat_num_chart
Unique feature values count driven Partial Dependence Plot binning and chart selection. (Boolean) (Expert Setting)
Default value True
Use dynamic switching between Partial Dependence Plot numeric and categorical binning and UI chart selection for features that were used as both numeric and categorical by the experiment.
mli_pd_numcat_threshold
Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric). (Number) (Expert Setting)
Default value 11
If 'mli_pd_numcat_num_chart' is enabled, then use numeric binning and chart if the feature's unique value count is bigger than the threshold; else use categorical binning and chart.
mli_run_kernel_explainer
Use Kernel Explainer to obtain Shapley values for original features (Boolean) (Expert Setting)
Default value False
Use Kernel Explainer to obtain Shapley values for original features.
mli_kernel_explainer_sample
Sample input dataset for Kernel Explainer (Boolean) (Expert Setting)
Default value True
Sample input dataset for Kernel Explainer.
mli_kernel_explainer_sample_size
Sample size for input dataset passed to Kernel Explainer (Number) (Expert Setting)
Default value 1000
Sample size for input dataset passed to Kernel Explainer.
mli_kernel_explainer_nsamples
Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally (String) (Expert Setting)
Default value 'auto'
'auto' or int. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower-variance estimates of the SHAP values. The 'auto' setting uses nsamples = 2 * X.shape[1] + 2048. This setting is disabled by default, and DAI determines the right number internally.
mli_kernel_explainer_l1_reg
L1 regularization for Kernel Explainer (String) (Expert Setting)
Default value 'aic'
'num_features(int)', 'auto' (default for now, but deprecated), 'aic', 'bic', or float. The L1 regularization to use for feature selection (the estimation procedure is based on a debiased lasso). The 'auto' option currently uses AIC when less than 20% of the possible sample space is enumerated; otherwise it uses no regularization. THE BEHAVIOR OF 'auto' WILL CHANGE in a future version to be based on 'num_features' instead of AIC. The 'aic' and 'bic' options use the AIC and BIC rules for regularization. Using 'num_features(int)' selects a fixed number of top features. Passing a float directly sets the alpha parameter of the sklearn.linear_model.Lasso model used for feature selection.
mli_kernel_explainer_max_runtime
Max runtime for Kernel Explainer in seconds (Number) (Expert Setting)
Default value 900
Max runtime for Kernel Explainer in seconds. Default is 900, which equates to 15 minutes. Setting this parameter to -1 means to honor the Kernel Shapley sample size provided regardless of max runtime.
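These settings correspond to knobs on a Kernel SHAP explainer; a toy sketch using the open-source shap package, which is assumed here to mirror the semantics described above:

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

# Toy model and data standing in for a DAI experiment's original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

background = shap.sample(X, 50)  # cf. mli_kernel_explainer_sample_size
explainer = shap.KernelExplainer(model.predict_proba, background)
# nsamples='auto' re-evaluates the model 2 * n_features + 2048 times per row;
# l1_reg='aic' matches the mli_kernel_explainer_l1_reg default.
shap_values = explainer.shap_values(X[:5], nsamples="auto", l1_reg="aic")
```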
dask_cuda_cluster_kwargs
Set dask CUDA/RAPIDS cluster settings for single node workers. (Dict) (Expert Setting)
Default value {'scheduler_port': 0, 'dashboard_address': ':0', 'protocol': 'tcp'}
Set dask CUDA/RAPIDS cluster settings for single-node workers. Additional environment variables can be set; see: https://dask-cuda.readthedocs.io/en/latest/ucx.html#dask-scheduler E.g. for ucx, use the dict version of: dict(n_workers=None, threads_per_worker=1, processes=True, memory_limit='auto', device_memory_limit=None, CUDA_VISIBLE_DEVICES=None, data=None, local_directory=None, protocol='ucx', enable_tcp_over_ucx=True, enable_infiniband=False, enable_nvlink=False, enable_rdmacm=False, ucx_net_devices='auto', rmm_pool_size='1GB'). WARNING: Do not add arguments like {'n_workers': 1, 'processes': True, 'threads_per_worker': 1}; this will lead to hangs, as the CUDA cluster handles this itself.
dask_cluster_kwargs
Set dask cluster settings for single node workers. (Dict) (Expert Setting)
Default value {'n_workers': 1, 'processes': True, 'threads_per_worker': 1, 'scheduler_port': 0, 'dashboard_address': ':0', 'protocol': 'tcp'}
Set dask cluster settings for single node workers.
dask_scheduler_env
Set dask scheduler env. (Dict) (Expert Setting)
Default value {}
Set dask scheduler env. See https://docs.dask.org/en/latest/setup/cli.html
dask_worker_env
Set dask worker environment variables. NCCL_SOCKET_IFNAME is automatically set, but can be overridden here. (Dict) (Expert Setting)
Default value {'NCCL_P2P_DISABLE': '1', 'NCCL_DEBUG': 'WARN'}
Set dask worker env. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_worker_env
Set dask cuda worker environment variables. (Dict) (Expert Setting)
Default value {}
Set dask cuda worker env. See: https://dask-cuda.readthedocs.io/en/latest/ucx.html#launching-scheduler-workers-and-clients-separately
enable_imputation
Enabling imputation adds a new picker to the experiment setup GUI and triggers imputation functionality in Transformers (Boolean) (Expert Setting)
Default value False
Enable column imputation
datatable_parse_max_memory_bytes
datatable_parse_max_memory_bytes (Number) (Expert Setting)
Default value -1
Memory limit in bytes for datatable to use during parsing of CSV files. -1 for unlimited. 0 for automatic. >0 for constraint.
datatable_separator
datatable_separator (String) (Expert Setting)
Default value ''
Delimiter/Separator to use when parsing tabular text files like CSV. Automatic if empty. Must be provided at system start.
ping_load_data_file
Whether to enable ping of system status during DAI data ingestion. (Boolean) (Expert Setting)
Default value False
Whether to enable ping of system status during DAI data ingestion.
high_correlation_value_to_report
Threshold for reporting high correlation (Float) (Expert Setting)
Default value 0.95
Value to report high correlation between original features
datatable_bom_csv
datatable_bom_csv (Boolean) (Expert Setting)
Default value False
Include byte order mark (BOM) when writing CSV files. Required to support UTF-8 encoding in Excel.
check_invalid_config_toml_keys
check_invalid_config_toml_keys (Boolean) (Expert Setting)
Default value True
Whether to check if config.toml keys are valid and fail if not valid
predict_safe_trials
predict_safe_trials (Number) (Expert Setting)
Default value 2
fit_safe_trials
fit_safe_trials (Number) (Expert Setting)
Default value 2
allow_no_pid_host
Whether to allow no –pid=host setting. Some GPU info from within docker will not be correct. (Boolean) (Expert Setting)
Default value True
terminate_experiment_if_memory_low
terminate_experiment_if_memory_low (Boolean) (Expert Setting)
Default value False
Whether to terminate experiments if the system memory available falls below memory_limit_gb_terminate
memory_limit_gb_terminate
memory_limit_gb_terminate (Number) (Expert Setting)
Default value 5
Memory in GB below which the experiment will terminate if terminate_experiment_if_memory_low=true.
last_exclusive_mode
last_exclusive_mode (String) (Expert Setting)
Default value ''
Internal helper to remember whether exclusive mode was changed.
max_time_series_properties_sample_size
max_time_series_properties_sample_size (Number) (Expert Setting)
Default value 250000
Max. sample size for automatic determination of time series train/valid split properties, only if time column is selected
max_lag_sizes
max_lag_sizes (Number) (Expert Setting)
Default value 30
Maximum number of lag sizes to use for lags-based time-series experiments. These are sampled from if sample_lag_sizes==True, else all are taken (-1 == automatic).
min_lag_autocorrelation
min_lag_autocorrelation (Float) (Expert Setting)
Default value 0.1
Minimum required autocorrelation threshold for a lag to be considered for feature engineering
max_signal_lag_sizes
max_signal_lag_sizes (Number) (Expert Setting)
Default value 100
How many samples of lag sizes to use for a single time group (single time series signal)
single_model_vs_cv_score_reldiff
single_model_vs_cv_score_reldiff (Float) (Expert Setting)
Default value 0.05
single_model_vs_cv_score_reldiff2
single_model_vs_cv_score_reldiff2 (Float) (Expert Setting)
Default value 0.0