Expert configuration¶
time_abort_format
¶
Time string format for time_abort. (String) (Expert Setting)
Default value '%Y-%m-%d %H:%M:%S'
Any format is allowed as accepted by datetime.strptime.
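As a quick illustration (plain Python, not DAI internals), a time_abort value written in the default format parses like this with datetime.strptime:

from datetime import datetime

# Parse an abort time written in the default time_abort_format.
fmt = "%Y-%m-%d %H:%M:%S"
abort_at = datetime.strptime("2024-06-30 23:59:00", fmt)
print(abort_at)  # 2024-06-30 23:59:00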
time_abort_timezone
¶
Time zone for time_abort. (String) (Expert Setting)
Default value 'UTC'
Any time zone in format accepted by datetime.strptime.
inject_mojo_for_predictions
¶
inject_mojo_for_predictions (Boolean) (Expert Setting)
Default value True
Inject the MOJO into the fitted Python state if the mini acceptance test passes, so the C++ MOJO runtime can be used when calling predict(enable_mojo=True, IS_SCORER=True, …). Prerequisite for mojo_for_predictions='on' or 'auto'.
mojo_acceptance_test_rtol
¶
Relative tolerance for mini MOJO acceptance test. (Float) (Expert Setting)
Default value 0.0
Relative tolerance for mini MOJO acceptance test. If the Python/C++ MOJO predictions differ from the Python pipeline by more than this, the MOJO won't be used inside Python for later scoring. Only applicable if mojo_for_predictions=True. Disabled if <= 0.
mojo_acceptance_test_atol
¶
Absolute tolerance for mini MOJO acceptance test. (Float) (Expert Setting)
Default value 0.0
Absolute tolerance for mini MOJO acceptance test (for regression/Shapley, will be scaled by max(abs(preds))). If the Python/C++ MOJO predictions differ from the Python pipeline by more than this, the MOJO won't be used inside Python for later scoring. Only applicable if mojo_for_predictions=True. Disabled if <= 0.
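The exact acceptance-test comparison is not spelled out here; a minimal sketch of an rtol/atol check in the spirit of numpy.isclose, with the regression/Shapley atol scaling taken from the description above (function name and structure are illustrative assumptions):

import numpy as np

def mojo_agrees(py_preds, mojo_preds, rtol, atol, scale_atol=False):
    # Illustrative only: accept the MOJO if its predictions match the Python
    # pipeline within the combined absolute/relative tolerance.
    py_preds = np.asarray(py_preds, dtype=float)
    mojo_preds = np.asarray(mojo_preds, dtype=float)
    if scale_atol:  # regression/Shapley case, per the description above
        atol = atol * np.max(np.abs(py_preds))
    return bool(np.all(np.abs(py_preds - mojo_preds) <= atol + rtol * np.abs(py_preds)))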
max_cols_make_autoreport_automatically
¶
Number of columns beyond which will not automatically build autoreport at end of experiment. (Number) (Expert Setting)
Default value 1000
max_cols_make_pipeline_visualization_automatically
¶
Number of columns beyond which will not automatically build pipeline visualization at end of experiment. (Number) (Expert Setting)
Default value 5000
transformer_description_line_length
¶
Line length for autoreport descriptions of transformers. -1 means use autodoc_keras_summary_line_length (Number) (Expert Setting)
Default value -1
benchmark_mojo_latency_auto_size_limit
¶
Max size of pipeline.mojo file (in MB) for when benchmark_mojo_latency is set to 'auto' (Number) (Expert Setting)
Default value 2048
Max size of pipeline.mojo file (in MB) for automatic mode of MOJO scoring latency measurement
max_dt_threads_do_timeseries_split_suggestion
¶
max_dt_threads_do_timeseries_split_suggestion (Number) (Expert Setting)
Default value 1
Maximum number of threads for datatable during TS properties preview panel computations.
kaggle_keep_submission
¶
Whether to keep Kaggle submission file in experiment directory (Boolean) (Expert Setting)
Default value False
kaggle_competitions
¶
Custom Kaggle competitions to make automatic test set submissions for. (String) (Expert Setting)
Default value ''
If provided, can extend the list to arbitrary and potentially future Kaggle competitions to make submissions for. Only used if kaggle_key and kaggle_username are provided. Provide a quoted comma-separated list of tuples (target column name, number of test rows, competition, metric) like this: kaggle_competitions='("target", 200000, "santander-customer-transaction-prediction", "AUC"), ("TARGET", 75818, "santander-customer-satisfaction", "AUC")'
ping_period
¶
ping_period (Number) (Expert Setting)
Default value 60
Period (in seconds) of ping by Driverless AI server to each experiment (in order to get logger info like disk space and memory usage). 0 means don’t print anything.
ping_autodl
¶
Whether to enable ping of system status during DAI experiments. (Boolean) (Expert Setting)
Default value True
Whether to enable ping of system status during DAI experiments.
stall_disk_limit_gb
¶
stall_disk_limit_gb (Number) (Expert Setting)
Default value 1
Minimum amount of disk space in GB needed; below this, forking of new processes during an experiment is stalled.
min_rows_per_class
¶
min_rows_per_class (Number) (Expert Setting)
Default value 5
Minimum required number of rows (in the training data) for each class label for classification problems.
min_rows_per_split
¶
min_rows_per_split (Number) (Expert Setting)
Default value 5
Minimum required number of rows for each split when generating validation samples.
tf_nan_impute_value
¶
tf_nan_impute_value (Number) (Expert Setting)
Default value -5
For tensorflow, what numerical value to give to missing values, where numeric values are standardized. So 0 is center of distribution, and if Normal distribution then +-5 is 5 standard deviations away from the center. In many cases, an out of bounds value is a good way to represent missings, but in some cases the mean (0) may be better.
statistical_threshold_data_size_small
¶
statistical_threshold_data_size_small (Number) (Expert Setting)
Default value 100000
Internal threshold for number of rows x number of columns to trigger certain statistical techniques (small data recipe like including one hot encoding for all model types, and smaller learning rate) to increase model accuracy
statistical_threshold_data_size_large
¶
statistical_threshold_data_size_large (Number) (Expert Setting)
Default value 500000000
Internal threshold for number of rows x number of columns to trigger certain statistical techniques (fewer genes created, removal of high max_depth for tree models, etc.) that can speed up modeling. Also controls maximum rows used in training final model, by sampling statistical_threshold_data_size_large / columns number of rows
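As a worked example of the final-model row cap implied by the last sentence (arithmetic only, not DAI internals):

# With the default threshold and a 1,000-column dataset, at most
# 500,000,000 / 1,000 = 500,000 rows are used to train the final model.
statistical_threshold_data_size_large = 500_000_000
n_cols = 1_000
max_final_rows = statistical_threshold_data_size_large // n_cols
print(max_final_rows)  # 500000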
aux_threshold_data_size_large
¶
aux_threshold_data_size_large (Number) (Expert Setting)
Default value 10000000
Internal threshold for number of rows x number of columns to trigger sampling for auxiliary data uses, like imbalanced data set detection and bootstrap scoring sample size and iterations
set_method_sampling_row_limit
¶
set_method_sampling_row_limit (Number) (Expert Setting)
Default value 5000000
Internal threshold for set-based method for sampling without replacement. Can be 10x faster than np_random_choice internal optimized method, and up to 30x faster than np.random.choice to sample 250k rows from 1B rows etc.
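A hedged sketch of the general idea behind a set-based sampler without replacement versus np.random.choice (the actual DAI implementation is not shown and the helper name is made up):

import numpy as np

def sample_without_replacement_setbased(n_rows, k, rng=None):
    # Keep drawing random indices and adding them to a set until k unique
    # indices are collected; cheap when k << n_rows because no permutation
    # of all n_rows indices is ever materialized.
    rng = np.random.default_rng() if rng is None else rng
    chosen = set()
    while len(chosen) < k:
        chosen.update(rng.integers(0, n_rows, size=k - len(chosen)).tolist())
    return np.fromiter(chosen, dtype=np.int64, count=k)

# Exact reference approach for comparison (slower for huge n_rows):
# np.random.choice(n_rows, size=k, replace=False)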
performance_threshold_data_size_small
¶
performance_threshold_data_size_small (Number) (Expert Setting)
Default value 100000
Internal threshold for number of rows x number of columns to trigger certain changes in performance (fewer threads if beyond large value) to help avoid OOM or unnecessary slowdowns (fewer threads if lower than small value) to avoid excess forking of tasks
performance_threshold_data_size_large
¶
performance_threshold_data_size_large (Number) (Expert Setting)
Default value 100000000
Internal threshold for number of rows x number of columns to trigger certain changes in performance (fewer threads if beyond large value) to help avoid OOM or unnecessary slowdowns (fewer threads if lower than small value) to avoid excess forking of tasks
gpu_default_threshold_data_size_large
¶
gpu_default_threshold_data_size_large (Number) (Expert Setting)
Default value 1000000
Threshold for number of rows x number of columns to trigger GPU to be default for models like XGBoost GBM.
max_relative_cols_mismatch_allowed
¶
max_relative_cols_mismatch_allowed (Float) (Expert Setting)
Default value 0.5
Maximum fraction of mismatched columns to allow between train and either valid or test. Beyond this value the experiment will fail with invalid data error.
max_rows_final_blender
¶
max_rows_final_blender (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for final ensemble blender for regression and binary (scaled down linearly by number of classes for multiclass for >= 10 classes), otherwise sample randomly.
min_rows_final_blender
¶
min_rows_final_blender (Number) (Expert Setting)
Default value 10000
Smallest number of rows (or number of rows if less than this) to use for final ensemble blender.
max_rows_final_train_score
¶
max_rows_final_train_score (Number) (Expert Setting)
Default value 5000000
Largest number of rows to use for final training score (no holdout), otherwise sample randomly
max_rows_final_roccmconf
¶
max_rows_final_roccmconf (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for final ROC, lift-gains, confusion matrix, residual, and actual vs. predicted. Otherwise sample randomly
max_rows_final_holdout_score
¶
max_rows_final_holdout_score (Number) (Expert Setting)
Default value 5000000
Largest number of rows to use for final holdout scores, otherwise sample randomly
max_rows_final_holdout_bootstrap_score
¶
max_rows_final_holdout_bootstrap_score (Number) (Expert Setting)
Default value 1000000
Largest number of rows to use for final holdout bootstrap scores, otherwise sample randomly
max_rows_leak
¶
Max. rows for leakage detection if wide rules used on wide data (Number) (Expert Setting)
Default value 100000
max_workers_fs
¶
Num. simultaneous predictions for feature selection (0 = auto) (Number) (Expert Setting)
Default value 0
How many workers to use for feature selection by permutation for predict phase. (0 = auto, > 0: min of DAI value and this value, < 0: exactly negative of this value)
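A small sketch of the 0 / positive / negative convention described above (helper name and auto value are illustrative; the same convention applies to max_workers_shift_leak below):

def resolve_workers(setting, dai_auto_value):
    # 0  -> let DAI decide (auto)
    # >0 -> use min(DAI value, setting)
    # <0 -> use exactly abs(setting)
    if setting == 0:
        return dai_auto_value
    if setting > 0:
        return min(dai_auto_value, setting)
    return -setting

print(resolve_workers(0, 8))   # 8
print(resolve_workers(4, 8))   # 4
print(resolve_workers(-2, 8))  # 2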
max_workers_shift_leak
¶
Num. simultaneous fits for shift and leak checks if using LightGBM on CPU (0 = auto) (Number) (Expert Setting)
Default value 0
How many workers to use for shift and leakage checks if using LightGBM on CPU. (0 = auto, > 0: min of DAI value and this value, < 0: exactly negative of this value)
max_orig_nonnumeric_cols_selected_default
¶
max_orig_nonnumeric_cols_selected_default (Number) (Expert Setting)
Default value 300
full_cv_accuracy_switch
¶
full_cv_accuracy_switch (Number) (Expert Setting)
Default value 9
Accuracy setting equal and above which enables full cross-validation (multiple folds) during feature evolution as opposed to only a single holdout split (e.g. 2/3 train and 1/3 validation holdout)
ensemble_accuracy_switch
¶
ensemble_accuracy_switch (Number) (Expert Setting)
Default value 5
Accuracy setting equal and above which enables stacked ensemble as final model. Stacking commences at the end of the feature evolution process. It quite often leads to better model performance, but it does increase the complexity and execution time of the final model.
num_ensemble_folds
¶
num_ensemble_folds (Number) (Expert Setting)
Default value 4
Number of fold splits to use for ensemble_level >= 2. The ensemble modelling may require predictions to be made on out-of-fold samples, hence the data needs to be split on different folds to generate these predictions. Fewer folds (like 2 or 3) normally create more stable models, but may be less accurate. More folds can reach higher accuracy at the expense of more time, but the performance may be less stable when there is not enough training data (i.e. higher chance of overfitting). Actual value will vary for small or big data cases.
fold_reps
¶
fold_reps (Number) (Expert Setting)
Default value 1
Number of repeats for each fold for all validation (modified slightly for small or big data cases)
max_num_classes_hard_limit
¶
max_num_classes_hard_limit (Number) (Expert Setting)
Default value 10000
min_roc_sample_size
¶
min_roc_sample_size (Number) (Expert Setting)
Default value 1
enable_strict_confict_key_check_for_brain
¶
enable_strict_confict_key_check_for_brain (Boolean) (Expert Setting)
Default value True
allow_change_layer_count_brain
¶
For feature brain or restart/refit, whether to allow brain ingest to use different feature engineering layer count. (Boolean) (Expert Setting)
Default value False
brain_maximum_diff_score
¶
brain_maximum_diff_score (Float) (Expert Setting)
Default value 0.1
Relative number of columns that must match between the current reference individual and a brain individual. 0.0: perfect match. 1.0: all columns are different (worst match). E.g. 0.1 implies no more than 10% of columns may mismatch between the reference set of columns and the brain individual.
brain_max_size_GB
¶
brain_max_size_GB (Number) (Expert Setting)
Default value 20
Maximum size in GB the brain will store. We reserve this space to save data in order to ensure we can retrieve an experiment if for any reason it gets interrupted. -1: unlimited. >= 0: number of GB to limit the brain to.
early_stopping
¶
early_stopping (Boolean) (Expert Setting)
Default value True
Whether to enable early stopping. Early stopping refers to stopping the feature evolution/engineering process when there is no performance uplift after a certain number of iterations. After early stopping has been triggered, Driverless AI will initiate the ensemble process if selected.
early_stopping_per_individual
¶
early_stopping_per_individual (Boolean) (Expert Setting)
Default value True
Whether to enable early stopping per individual. Each individual in the genetic algorithm will stop early if there is no improvement, and it will no longer be mutated. Instead, the best individual will be additionally mutated.
text_dominated_limit_tuning
¶
text_dominated_limit_tuning (Boolean) (Expert Setting)
Default value True
Whether to reduce options for text-dominated models to reduce expense, e.g. disable ensemble, disable genetic algorithm, single identity target encoder for classification, etc.
image_dominated_limit_tuning
¶
image_dominated_limit_tuning (Boolean) (Expert Setting)
Default value True
Whether to reduce options for image-dominated models to reduce expense, e.g. disable ensemble, disable genetic algorithm, single identity target encoder for classification, etc.
supported_image_types
¶
supported_image_types (List) (Expert Setting)
Default value ['jpg', 'jpeg', 'png', 'bmp', 'ppm', 'tif', 'tiff', 'JPG', 'JPEG', 'PNG', 'BMP', 'PPM', 'TIF', 'TIFF']
Supported image types. URIs with these endings will be considered as image paths (local or remote).
image_paths_absolute
¶
image_paths_absolute (Boolean) (Expert Setting)
Default value False
Whether to create absolute paths for images when importing datasets containing images. Can facilitate testing or re-use of frames for scoring.
text_dl_token_pad_percentile
¶
text_dl_token_pad_percentile (Number) (Expert Setting)
Default value 99
Percentile value cutoff of input text token lengths for nlp deep learning models
text_dl_token_pad_max
¶
text_dl_token_pad_max (Number) (Expert Setting)
Default value 512
Maximum token length of input text to be used in nlp deep learning models
tune_parameters_accuracy_switch
¶
tune_parameters_accuracy_switch (Number) (Expert Setting)
Default value 3
Accuracy setting equal and above which enables tuning of model parameters. Only applicable if parameter_tuning_num_models=-1 (auto)
tune_target_transform_accuracy_switch
¶
tune_target_transform_accuracy_switch (Number) (Expert Setting)
Default value 5
Accuracy setting equal and above which enables tuning of target transform for regression. This is useful for time series when instead of predicting the actual target value, it might be better to predict a transformed target variable like sqrt(target) or log(target) as a means to control for outliers.
tournament_uniform_style_interpretability_switch
¶
tournament_uniform_style_interpretability_switch (Number) (Expert Setting)
Default value 8
Interpretability above which will use 'uniform' tournament style
tournament_uniform_style_accuracy_switch
¶
tournament_uniform_style_accuracy_switch (Number) (Expert Setting)
Default value 6
Accuracy below which will use uniform style if tournament_style = 'auto' (regardless of other accuracy tournament style switch values)
tournament_model_style_accuracy_switch
¶
tournament_model_style_accuracy_switch (Number) (Expert Setting)
Default value 6
Accuracy equal and above which uses model style if tournament_style = 'auto'
tournament_feature_style_accuracy_switch
¶
tournament_feature_style_accuracy_switch (Number) (Expert Setting)
Default value 13
Accuracy equal and above which uses feature style if tournament_style = 'auto'
tournament_fullstack_style_accuracy_switch
¶
tournament_fullstack_style_accuracy_switch (Number) (Expert Setting)
Default value 13
Accuracy equal and above which uses fullstack style if tournament_style = 'auto'
tournament_use_feature_penalized_score
¶
tournament_use_feature_penalized_score (Boolean) (Expert Setting)
Default value True
Whether to use penalized score for GA tournament or actual score
tournament_keep_poor_scores_for_small_data
¶
tournament_keep_poor_scores_for_small_data (Boolean) (Expert Setting)
Default value True
Whether to keep poor scores for small data (< 10k rows) in case exploration will find a good model. Sets:
tournament_remove_poor_scores_before_evolution_model_factor=1.1
tournament_remove_worse_than_constant_before_evolution=false
tournament_keep_absolute_ok_scores_before_evolution_model_factor=1.1
tournament_remove_poor_scores_before_final_model_factor=1.1
tournament_remove_worse_than_constant_before_final_model=true
tournament_remove_poor_scores_before_evolution_model_factor
¶
tournament_remove_poor_scores_before_evolution_model_factor (Float) (Expert Setting)
Default value 0.7
Factor (compared to best score plus each score) beyond which to drop poorly scoring models before evolution. This is useful in cases when poorly scoring models take a long time to train.
tournament_remove_worse_than_constant_before_evolution
¶
tournament_remove_worse_than_constant_before_evolution (Boolean) (Expert Setting)
Default value True
For before evolution after tuning, whether to remove models that are worse than (optimized to scorer) constant prediction model
tournament_keep_absolute_ok_scores_before_evolution_model_factor
¶
tournament_keep_absolute_ok_scores_before_evolution_model_factor (Float) (Expert Setting)
Default value 0.2
For before evolution (after tuning), where on a scale of 0 (perfect) to 1 (constant model) to keep OK scores by absolute value.
tournament_remove_poor_scores_before_final_model_factor
¶
tournament_remove_poor_scores_before_final_model_factor (Float) (Expert Setting)
Default value 0.3
Factor (compared to best score) beyond which to drop poorly scoring models before building final ensemble. This is useful in cases when poorly scoring models take a long time to train.
tournament_remove_worse_than_constant_before_final_model
¶
tournament_remove_worse_than_constant_before_final_model (Boolean) (Expert Setting)
Default value True
For before final model after evolution, whether to remove models that are worse than (optimized to scorer) constant prediction model
num_individuals
¶
num_individuals (Number) (Expert Setting)
Default value 2
Driverless AI uses a genetic algorithm (GA) to find the best features, best models and best hyperparameters for these models. The GA facilitates getting good results while not requiring to run/try every possible model/feature/parameter. This version of GA has reinforcement learning elements: it uses a form of exploration-exploitation to reach optimum solutions. This means it will capitalise on models/features/parameters that seem to be working well and continue to exploit them even more, while allowing some room for trying new (and semi-random) models/features/parameters to avoid settling on a local minimum. These models/features/parameters tried are what we call individuals of a population. More individuals mean more models/features/parameters to be tried and to compete to find the best ones.
cv_in_cv_overconfidence_protection_factor
¶
cv_in_cv_overconfidence_protection_factor (Float) (Expert Setting)
Default value 3.0
excluded_transformers
¶
Exclude specific transformers (List) (Expert Setting)
Default value []
Auxiliary to included_transformers. E.g. to disable all Target Encoding: excluded_transformers = '["NumCatTETransformer", "CVTargetEncodeF", "NumToCatTETransformer", "ClusterTETransformer"]'. Does not affect transformers used for preprocessing with included_pretransformers.
excluded_genes
¶
Exclude specific genes (List) (Expert Setting)
Default value []
Exclude list of genes, i.e. genes (built on top of transformers) to not use, independent of the interpretability setting. Some transformers are used by multiple genes, so this allows different control over feature engineering.
For multi-class: '["InteractionsGene", "WeightOfEvidenceGene", "NumToCatTargetEncodeSingleGene", "OriginalGene", "TextGene", "FrequentGene", "NumToCatWeightOfEvidenceGene", "NumToCatWeightOfEvidenceMonotonicGene", "CvTargetEncodeSingleGene", "DateGene", "NumToCatTargetEncodeMultiGene", "DateTimeGene", "TextLinRegressorGene", "ClusterIDTargetEncodeSingleGene", "CvCatNumEncodeGene", "TruncSvdNumGene", "ClusterIDTargetEncodeMultiGene", "NumCatTargetEncodeMultiGene", "CvTargetEncodeMultiGene", "TextLinClassifierGene", "NumCatTargetEncodeSingleGene", "ClusterDistGene"]'
For regression/binary: '["CvTargetEncodeSingleGene", "NumToCatTargetEncodeSingleGene", "CvCatNumEncodeGene", "ClusterIDTargetEncodeSingleGene", "TextLinRegressorGene", "CvTargetEncodeMultiGene", "ClusterDistGene", "OriginalGene", "DateGene", "ClusterIDTargetEncodeMultiGene", "NumToCatTargetEncodeMultiGene", "NumCatTargetEncodeMultiGene", "TextLinClassifierGene", "WeightOfEvidenceGene", "FrequentGene", "TruncSvdNumGene", "InteractionsGene", "TextGene", "DateTimeGene", "NumToCatWeightOfEvidenceGene", "NumToCatWeightOfEvidenceMonotonicGene", "NumCatTargetEncodeSingleGene"]'
This list appears in the experiment logs (search for 'Genes used'). E.g. to disable the interaction gene, use: excluded_genes = '["InteractionsGene"]'. Does not affect transformers used for preprocessing with included_pretransformers.
excluded_models
¶
Exclude specific models (List) (Expert Setting)
Default value []
Auxiliary to included_models
excluded_pretransformers
¶
Exclude specific pretransformers (List) (Expert Setting)
Default value []
Auxiliary to included_pretransformers
excluded_datas
¶
Exclude specific data recipes (List) (Expert Setting)
Default value []
Auxiliary to included_datas
excluded_individuals
¶
Exclude specific individual recipes (List) (Expert Setting)
Default value []
Auxiliary to included_individuals
excluded_scorers
¶
Exclude specific scorers (List) (Expert Setting)
Default value []
Auxiliary to included_scorers
use_dask_for_1_gpu
¶
use_dask_for_1_gpu (Boolean) (Expert Setting)
Default value False
Whether to use dask_cudf even for 1 GPU. If False, will use plain cudf.
optuna_pruner_kwargs
¶
Set Optuna pruner constructor args. (Dict) (Expert Setting)
Default value {'n_startup_trials': 5, 'n_warmup_steps': 20, 'interval_steps': 20, 'percentile': 25.0, 'min_resource': 'auto', 'max_resource': 'auto', 'reduction_factor': 4, 'min_early_stopping_rate': 0, 'n_brackets': 4, 'min_early_stopping_rate_low': 0, 'upper': 1.0, 'lower': 0.0}
Set Optuna constructor arguments for particular applicable pruners. https://optuna.readthedocs.io/en/stable/reference/pruners.html
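These keys are a superset covering several pruner types; as a hedged illustration of how a few of them map onto Optuna's own constructors when used standalone (not the DAI wiring):

import optuna

kw = {'n_startup_trials': 5, 'n_warmup_steps': 20, 'interval_steps': 20,
      'percentile': 25.0, 'reduction_factor': 4,
      'min_early_stopping_rate': 0, 'upper': 1.0, 'lower': 0.0}

median = optuna.pruners.MedianPruner(n_startup_trials=kw['n_startup_trials'],
                                     n_warmup_steps=kw['n_warmup_steps'],
                                     interval_steps=kw['interval_steps'])
percentile = optuna.pruners.PercentilePruner(kw['percentile'],
                                             n_startup_trials=kw['n_startup_trials'],
                                             n_warmup_steps=kw['n_warmup_steps'])
halving = optuna.pruners.SuccessiveHalvingPruner(reduction_factor=kw['reduction_factor'],
                                                 min_early_stopping_rate=kw['min_early_stopping_rate'])
threshold = optuna.pruners.ThresholdPruner(lower=kw['lower'], upper=kw['upper'])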
optuna_sampler_kwargs
¶
Set Optuna sampler constructor args. (Dict) (Expert Setting)
Default value {}
Set Optuna constructor arguments for particular applicable samplers. https://optuna.readthedocs.io/en/stable/reference/samplers.html
drop_constant_model_final_ensemble
¶
drop_constant_model_final_ensemble (Boolean) (Expert Setting)
Default value True
xgboost_rf_exact_threshold_num_rows_x_cols
¶
xgboost_rf_exact_threshold_num_rows_x_cols (Number) (Expert Setting)
Default value 10000
lossguide_drop_factor
¶
Factor by which to drop max_leaves from effective max_depth value when doing loss_guide. E.g. if max_depth is normally 12, this makes leaves 2**11 not 2**12 (Float) (Expert Setting)
Default value 4.0
lossguide_max_depth_extend_factor
¶
Factor by which to extend max_depth mutations when doing loss_guide. E.g. if max_leaves ends up as x let max_depth be factor * x. (Float) (Expert Setting)
Default value 8.0
params_tune_grow_policy_simple_trees
¶
params_tune_grow_policy_simple_trees (Boolean) (Expert Setting)
Default value True
Whether to force max_leaves and max_depth to be 0 if grow_policy is depthwise and lossguide, respectively.
max_epochs_tf_big_data
¶
max_epochs_tf_big_data (Number) (Expert Setting)
Default value 5
Number of epochs for TensorFlow when larger data size.
default_max_bin
¶
default_max_bin (Number) (Expert Setting)
Default value 256
Default max_bin for tree methods
default_lightgbm_max_bin
¶
default_lightgbm_max_bin (Number) (Expert Setting)
Default value 249
Default max_bin for LightGBM (64 recommended for GPU LightGBM for speed)
min_max_bin
¶
min_max_bin (Number) (Expert Setting)
Default value 32
Minimum max_bin for any tree
tensorflow_use_all_cores
¶
tensorflow_use_all_cores (Boolean) (Expert Setting)
Default value True
Whether TensorFlow will use all CPU cores, or if it will split among all transformers. Only for transformers, not TensorFlow model.
tensorflow_use_all_cores_even_if_reproducible_true
¶
tensorflow_use_all_cores_even_if_reproducible_true (Boolean) (Expert Setting)
Default value False
Whether TensorFlow will use all CPU cores if reproducible is set, or if it will split among all transformers
tensorflow_disable_memory_optimization
¶
tensorflow_disable_memory_optimization (Boolean) (Expert Setting)
Default value True
Whether to disable TensorFlow memory optimizations. Can help fix tensorflow.python.framework.errors_impl.AlreadyExistsError
tensorflow_cores
¶
tensorflow_cores (Number) (Expert Setting)
Default value 0
How many cores to use for each TensorFlow model, regardless if GPU or CPU based (0 = auto mode)
tensorflow_model_max_cores
¶
tensorflow_model_max_cores (Number) (Expert Setting)
Default value 4
For TensorFlow models, maximum number of cores to use if tensorflow_cores=0 (auto mode), because TensorFlow model is inefficient at using many cores. See also max_fit_cores for all models.
bert_cores
¶
bert_cores (Number) (Expert Setting)
Default value 0
How many cores to use for each Bert Model and Transformer, regardless if GPU or CPU based (0 = auto mode)
bert_use_all_cores
¶
bert_use_all_cores (Boolean) (Expert Setting)
Default value True
Whether Bert will use all CPU cores, or if it will split among all transformers. Only for transformers, not Bert model.
bert_model_max_cores
¶
bert_model_max_cores (Number) (Expert Setting)
Default value 8
For Bert models, maximum number of cores to use if bert_cores=0 (auto mode), because Bert model is inefficient at using many cores. See also max_fit_cores for all models.
one_hot_encoding_show_actual_levels_in_features
¶
Whether to show real levels in One Hot Encoding feature names. Leads to feature aggregation problems when switching between binning and not binning in fold splits. Feature description will still contain levels in each bin whether True or False. (Boolean) (Expert Setting)
Default value False
validate_meta_learner
¶
Enable basic logging and notifications for ensemble meta learner (Boolean) (Expert Setting)
Default value True
validate_meta_learner_extra
¶
Enable extra logging for ensemble meta learner: ensemble must be at least as good as each base model (Boolean) (Expert Setting)
Default value False
num_fold_ids_show
¶
Maximum number of fold IDs to show in logs (Number) (Expert Setting)
Default value 10
fold_scores_instability_warning_threshold
¶
Declare positive fold scores as unstable if stddev / mean is larger than this value (Float) (Expert Setting)
Default value 0.25
imbalance_ratio_multiclass_threshold
¶
Ratio of most frequent to least frequent class for imbalanced multiclass classification problems equal and above which to trigger special handling due to class imbalance (Number) (Expert Setting)
Default value 5
Special handling can include special models, special scorers, special feature engineering.
heavy_imbalance_ratio_multiclass_threshold
¶
Ratio of most frequent to least frequent class for imbalanced multiclass classification problems equal and above which to trigger special handling due to heavy class imbalance (Number) (Expert Setting)
Default value 25
Special handling can include special models, special scorers, special feature engineering.
imbalance_sampling_rank_averaging
¶
Whether to do rank averaging of bagged models inside of imbalanced models, instead of probability averaging (String) (Expert Setting)
Default value 'auto'
Rank averaging can be helpful when ensembling diverse models and when ranking metrics like AUC/Gini are optimized. No MOJO support yet.
imbalance_ratio_notification_threshold
¶
imbalance_ratio_notification_threshold (Float) (Expert Setting)
Default value 2.0
For binary classification: ratio of majority to minority class equal and above which to notify of imbalance in GUI, to say slightly imbalanced. More than imbalance_ratio_sampling_threshold will say the problem is imbalanced.
nbins_ftrl_list
¶
nbins_ftrl_list (List) (Expert Setting)
Default value [1000000, 10000000, 100000000]
List of possible bins for FTRL (largest is default best value)
te_bin_list
¶
te_bin_list (List) (Expert Setting)
Default value [25, 10, 100, 250]
List of possible bins for target encoding (first is default value)
woe_bin_list
¶
woe_bin_list (List) (Expert Setting)
Default value [25, 10, 100, 250]
List of possible bins for weight of evidence encoding (first is default value). If only one value is wanted: woe_bin_list = [2]
ohe_bin_list
¶
ohe_bin_list (List) (Expert Setting)
Default value [10, 25, 50, 75, 100]
List of possible bins for one hot encoding (first is default value). If left as default, the actual list is changed for given data size and dials.
binner_bin_list
¶
binner_bin_list (List) (Expert Setting)
Default value [5, 10, 20]
List of max possible number of bins for numeric binning (first is default value). If left as default, the actual list is changed for given data size and dials. The binner will automatically reduce the number of bins based on predictive power.
drop_duplicate_rows_timeout
¶
Timeout in seconds for dropping duplicate rows in training data; increases proportionally as rows*cols grows compared to detect_duplicate_rows_max_rows_x_cols. (Number) (Expert Setting)
Default value 60
shift_check_text
¶
shift_check_text (Boolean) (Expert Setting)
Default value False
Whether to enable checking text for shift, currently only via label encoding.
use_rf_for_shift_if_have_lgbm
¶
use_rf_for_shift_if_have_lgbm (Boolean) (Expert Setting)
Default value True
Whether to use LightGBM random forest mode without early stopping for shift detection.
shift_key_features_varimp
¶
shift_key_features_varimp (Float) (Expert Setting)
Default value 0.01
Normalized training variable importance above which to check the feature for shift. Useful to avoid checking likely unimportant features.
shift_check_reduced_features
¶
shift_check_reduced_features (Boolean) (Expert Setting)
Default value True
Whether to only check certain features based upon the value of shift_key_features_varimp
shift_trees
¶
shift_trees (Number) (Expert Setting)
Default value 100
Number of trees to use to train model to check shift in distribution. No larger than max_nestimators.
shift_max_bin
¶
shift_max_bin (Number) (Expert Setting)
Default value 256
The value of max_bin to use for trees to use to train model to check shift in distribution
shift_min_max_depth
¶
shift_min_max_depth (Number) (Expert Setting)
Default value 4
The min. value of max_depth to use for trees to use to train model to check shift in distribution
shift_max_max_depth
¶
shift_max_max_depth (Number) (Expert Setting)
Default value 8
The max. value of max_depth to use for trees to use to train model to check shift in distribution
detect_features_distribution_shift_threshold_auc
¶
detect_features_distribution_shift_threshold_auc (Float) (Expert Setting)
Default value 0.55
If distribution shift detection is enabled, show features for which shift AUC is above this value (AUC of a binary classifier that predicts whether given feature value belongs to train or test data)
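The shift AUC described here comes from a train-vs-test classifier; a minimal sketch of that idea for a single feature using scikit-learn (illustrative only, not the DAI shift detector):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def shift_auc(train_col, test_col):
    # Label rows by origin (0 = train, 1 = test) and measure how well a
    # simple classifier separates them using just this one feature.
    X = np.concatenate([train_col, test_col]).reshape(-1, 1)
    y = np.concatenate([np.zeros(len(train_col)), np.ones(len(test_col))])
    clf = LogisticRegression().fit(X, y)
    return roc_auc_score(y, clf.predict_proba(X)[:, 1])

# AUC near 0.5 -> no detectable shift; above the 0.55 threshold -> reported.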
leakage_check_text
¶
leakage_check_text (Boolean) (Expert Setting)
Default value True
Whether to enable checking text for leakage, currently only via label encoding.
leakage_key_features_varimp
¶
leakage_key_features_varimp (Float) (Expert Setting)
Default value 0.001
Normalized training variable importance (per 1 minus AUC/R2 to control for leaky varimp dominance) above which to check the feature for leakage. Useful to avoid checking likely unimportant features.
leakage_check_reduced_features
¶
leakage_check_reduced_features (Boolean) (Expert Setting)
Default value True
Whether to only check certain features based upon the value of leakage_key_features_varimp. If any feature has AUC near 1, will consume all variable importance, even if another feature is also leaky. So False is safest option, but True generally good if many columns.
use_rf_for_leakage_if_have_lgbm
¶
use_rf_for_leakage_if_have_lgbm (Boolean) (Expert Setting)
Default value True
Whether to use LightGBM random forest mode without early stopping for leakage detection.
leakage_trees
¶
leakage_trees (Number) (Expert Setting)
Default value 100
Number of trees to use to train model to check for leakage. No larger than max_nestimators.
leakage_max_bin
¶
leakage_max_bin (Number) (Expert Setting)
Default value 256
The value of max_bin to use for trees to use to train model to check for leakage
leakage_min_max_depth
¶
leakage_min_max_depth (Number) (Expert Setting)
Default value 6
The value of max_depth to use for trees to use to train model to check for leakage
leakage_max_max_depth
¶
leakage_max_max_depth (Number) (Expert Setting)
Default value 8
The value of max_depth to use for trees to use to train model to check for leakage
leakage_train_test_split
¶
leakage_train_test_split (Float) (Expert Setting)
Default value 0.25
Ratio of train to validation holdout when testing for leakage
check_system_basic
¶
Whether to report basic system information on server startup (Boolean) (Expert Setting)
Default value True
abs_tol_for_perfect_score
¶
abs_tol_for_perfect_score (Float) (Expert Setting)
Default value 0.0001
How close to the optimal value (usually 1 or 0) does the validation score need to be to be considered perfect (to stop the experiment)?
data_ingest_timeout
¶
data_ingest_timeout (Float) (Expert Setting)
Default value 86400.0
Timeout in seconds to wait for data ingestion.
debug_daimodel_level
¶
debug_daimodel_level (Number) (Expert Setting)
Default value 0
log_predict_info
¶
Whether to show detailed predict information in logs. (Boolean) (Expert Setting)
Default value True
log_fit_info
¶
Whether to show detailed fit information in logs. (Boolean) (Expert Setting)
Default value True
show_inapplicable_models_preview
¶
show_inapplicable_models_preview (Boolean) (Expert Setting)
Default value False
Show inapplicable models in preview, to be sure not missing models one could have used
show_inapplicable_transformers_preview
¶
show_inapplicable_transformers_preview (Boolean) (Expert Setting)
Default value False
Show inapplicable transformers in preview, to be sure not missing transformers one could have used
show_warnings_preview
¶
show_warnings_preview (Boolean) (Expert Setting)
Default value False
Show warnings for models (image auto, Dask multinode/multi-GPU) if conditions are met to use but not chosen to avoid missing models that could benefit accuracy/performance
show_warnings_preview_unused_map_features
¶
show_warnings_preview_unused_map_features (Boolean) (Expert Setting)
Default value True
Show warnings for models that have no transformers for certain features.
max_cols_show_unused_features
¶
max_cols_show_unused_features (Number) (Expert Setting)
Default value 1000
Up to how many input features to determine, during GUI/client preview, unused features. Too many slows preview down.
max_cols_show_feature_transformer_mapping
¶
max_cols_show_feature_transformer_mapping (Number) (Expert Setting)
Default value 1000
Up to how many input features to show transformers used for each input feature.
warning_unused_feature_show_max
¶
warning_unused_feature_show_max (Number) (Expert Setting)
Default value 3
Up to how many input features to show, in preview, that are unused features.
interaction_finder_max_rows_x_cols
¶
interaction_finder_max_rows_x_cols (Float) (Expert Setting)
Default value 200000.0
interaction_finder_corr_threshold
¶
interaction_finder_corr_threshold (Float) (Expert Setting)
Default value 0.95
min_bootstrap_samples
¶
Minimum number of bootstrap samples (Number) (Expert Setting)
Default value 1
Minimum number of bootstrap samples to use for estimating score and its standard deviation. Actual number of bootstrap samples will vary between the min and max, depending upon row count (more rows, fewer samples) and accuracy settings (higher accuracy, more samples)
max_bootstrap_samples
¶
Maximum number of bootstrap samples (Number) (Expert Setting)
Default value 100
Maximum number of bootstrap samples to use for estimating score and its standard deviation. Actual number of bootstrap samples will vary between the min and max, depending upon row count (more rows, fewer samples) and accuracy settings (higher accuracy, more samples)
min_bootstrap_sample_size_factor
¶
Minimum fraction of rows to use for bootstrap samples (Float) (Expert Setting)
Default value 1.0
Minimum fraction of row size to take as sample size for bootstrap estimator. Actual sample size used for bootstrap estimate will vary between the min and max, depending upon row count (more rows, smaller sample size) and accuracy settings (higher accuracy, larger sample size)
max_bootstrap_sample_size_factor
¶
Maximum fraction of rows to use for bootstrap samples (Float) (Expert Setting)
Default value 10.0
Maximum fraction of row size to take as sample size for bootstrap estimator. Actual sample size used for bootstrap estimate will vary between the min and max, depending upon row count (more rows, smaller sample size) and accuracy settings (higher accuracy, larger sample size)
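In general terms, bootstrap score estimation resamples rows with replacement, scores each resample, and reports the mean and standard deviation; a hedged sketch of that idea (not the DAI implementation):

import numpy as np

def bootstrap_score(y_true, y_pred, scorer, n_samples=100, sample_frac=1.0, seed=0):
    # y_true/y_pred are numpy arrays; scorer is any callable(y_true, y_pred).
    rng = np.random.default_rng(seed)
    n = len(y_true)
    size = max(1, int(sample_frac * n))
    scores = []
    for _ in range(n_samples):
        idx = rng.integers(0, n, size=size)  # sample rows with replacement
        scores.append(scorer(y_true[idx], y_pred[idx]))
    return float(np.mean(scores)), float(np.std(scores))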
bootstrap_final_seed
¶
Seed to use for final model bootstrap sampling (Number) (Expert Setting)
Default value -1
Seed to use for final model bootstrap sampling, -1 means use experiment-derived seed. E.g. one can retrain final model with different seed to get different final model error bars for scores.
benford_mad_threshold_int
¶
benford_mad_threshold_int (Float) (Expert Setting)
Default value 0.03
Benford’s law: mean absolute deviance threshold equal and above which integer valued columns are treated as categoricals too
benford_mad_threshold_real
¶
benford_mad_threshold_real (Float) (Expert Setting)
Default value 0.1
Benford’s law: mean absolute deviance threshold equal and above which real valued columns are treated as categoricals too
stabilize_features
¶
Use tuning-evolution search result for final model transformer. (Boolean) (Expert Setting)
Default value True
Whether the final pipeline uses fixed features for some transformers that would normally perform a search, such as InteractionsTransformer. Use what was learned from tuning and evolution (True) or freshly search for new features (False). This can give a more stable pipeline, especially for small data or when using the interaction transformer as a pretransformer in a multi-layer pipeline.
fraction_std_bootstrap_ladder_factor
¶
Factor of standard deviation of bootstrap scores by which to accept new model in genetic algorithm. Too small a fraction will lead to accepting new models easily even if no significant improvement in score, while too large a fraction will reject too many good models. Non-zero value is a bit risky when no folds are used in GA, because bootstrap score is only rough estimate of error. (Float) (Expert Setting)
Default value 0.01
bootstrap_ladder_samples_limit
¶
Minimum number of bootstrap samples that are required to limit accepting new model. If less than this, then new model is always accepted. (Number) (Expert Setting)
Default value 10
rdelta_percent_score_penalty_per_feature_by_interpretability
¶
rdelta_percent_score_penalty_per_feature_by_interpretability (String) (Expert Setting)
Default value '{1: 0.0, 2: 0.1, 3: 1.0, 4: 2.0, 5: 5.0, 6: 10.0, 7: 20.0, 8: 30.0, 9: 50.0, 10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}'
drop_low_meta_weights
¶
drop_low_meta_weights (Boolean) (Expert Setting)
Default value True
meta_weight_allowed_by_interpretability
¶
meta_weight_allowed_by_interpretability (String) (Expert Setting)
Default value '{1: 1E-7, 2: 1E-5, 3: 1E-4, 4: 1E-3, 5: 1E-2, 6: 0.03, 7: 0.05, 8: 0.08, 9: 0.10, 10: 0.15, 11: 0.15, 12: 0.15, 13: 0.15}'
fs_data_vary_for_interpretability
¶
fs_data_vary_for_interpretability (Number) (Expert Setting)
Default value 7
fs_data_frac
¶
Fraction of data to use for another data slice for FS (Float) (Expert Setting)
Default value 0.5
round_up_indivs_for_busy_gpus
¶
Whether to round-up individuals to ensure all GPUs used. Not always best if (say) have 16 GPUs, better to have multiple experiments if in multi-user environment on single node. (Boolean) (Expert Setting)
Default value True
require_graphviz
¶
Whether to require Graphviz package at startup (Boolean) (Expert Setting)
Default value True
Graphviz is an optional requirement for native installations (RPM/DEB/Tar-SH, outside of Docker) to convert .dot files into .png files for pipeline visualizations as part of experiment artifacts
fast_approx_max_num_trees_ever
¶
fast_approx_max_num_trees_ever (Number) (Expert Setting)
Default value -1
Max. number of trees to use for all tree model predictions. For testing, when predictions don’t matter. -1 means disabled.
max_absolute_feature_expansion
¶
max_absolute_feature_expansion (Number) (Expert Setting)
Default value 1000
model_class_name_for_shift
¶
model_class_name_for_shift (String) (Expert Setting)
Default value 'auto'
model_class_name_for_leakage
¶
model_class_name_for_leakage (String) (Expert Setting)
Default value 'auto'
tensorflow_num_classes_switch_but_keep_lightgbm
¶
tensorflow_num_classes_switch_but_keep_lightgbm (Number) (Expert Setting)
Default value 15
textlin_num_classes_switch
¶
Class count above which do not use TextLin Transformer (Number) (Expert Setting)
Default value 5
Class count above which do not use TextLin Transformer.
text_gene_dim_reduction_choices
¶
text_gene_dim_reduction_choices (List) (Expert Setting)
Default value [50]
text_gene_max_ngram
¶
text_gene_max_ngram (List) (Expert Setting)
Default value [1, 2, 3]
number_of_texts_to_cache_in_bert_transformer
¶
number_of_texts_to_cache_in_bert_transformer (Number) (Expert Setting)
Default value -1
Enables caching of BERT embeddings by temporarily saving the embedding vectors to the experiment directory. Set to -1 to cache all text, set to 0 to disable caching.
gbm_early_stopping_rounds_min
¶
gbm_early_stopping_rounds_min (Number) (Expert Setting)
Default value 1
gbm_early_stopping_rounds_max
¶
gbm_early_stopping_rounds_max (Number) (Expert Setting)
Default value 10000000000
max_num_varimp_to_log
¶
max_num_varimp_to_log (Number) (Expert Setting)
Default value 10
Max. number of top variable importances to show in logs during feature evolution
max_num_varimp_shift_to_log
¶
max_num_varimp_shift_to_log (Number) (Expert Setting)
Default value 10
Max. number of top variable importance shifts to show in logs and GUI after final model built
can_skip_final_upper_layer_failures
¶
can_skip_final_upper_layer_failures (Boolean) (Expert Setting)
Default value True
Whether can skip final model transformer failures for layer > first layer for multi-layer pipeline.
dump_modelparams_every_scored_indiv_feature_count
¶
dump_modelparams_every_scored_indiv_feature_count (Number) (Expert Setting)
Default value 3
Number of features to show in model dump every scored individual
dump_modelparams_every_scored_indiv_mutation_count
¶
dump_modelparams_every_scored_indiv_mutation_count (Number) (Expert Setting)
Default value 3
Number of past mutations to show in model dump every scored individual
dump_modelparams_separate_files
¶
dump_modelparams_separate_files (Boolean) (Expert Setting)
Default value False
Whether to append to one file (false) or write separate files (true), named like individual_scored_id%d.iter%d*params*, for model parameters of every scored individual.
oauth2_client_tokens_enabled
¶
oauth2_client_tokens_enabled (Boolean) (Expert Setting)
Default value False
Enables the option to initiate a PKCE flow from the UI in order to obtain tokens usable with Driverless clients
pdp_max_threads
¶
Maximum number of threads/forks for autoreport PDP. -1 means auto. (Number) (Expert Setting)
Default value -1
autoviz_max_num_columns
¶
Maximum number of column for Autoviz (Number) (Expert Setting)
Default value 50
Maximum number of columns autoviz will work with. If the dataset has more columns than this number, autoviz will pick columns randomly, prioritizing numerical columns.
autoviz_max_aggregated_rows
¶
Maximum number of rows in aggregated frame (Number) (Expert Setting)
Default value 500
enable_custom_recipes_from_url
¶
enable_custom_recipes_from_url (Boolean) (Expert Setting)
Default value True
Enable downloading of custom recipes from external URL.
enable_custom_recipes_from_zip
¶
enable_custom_recipes_from_zip (Boolean) (Expert Setting)
Default value True
Enable uploaded recipe files to be zip archives, containing custom recipe(s) in the root folder, while any other code or auxiliary files must be in some sub-folder.
enable_recreate_custom_recipes_env
¶
enable_recreate_custom_recipes_env (Boolean) (Expert Setting)
Default value True
When set to true, enables downloading of custom recipes' third-party packages from the web; otherwise the Python environment will be transferred from the main worker.
include_custom_recipes_by_default
¶
include_custom_recipes_by_default (Boolean) (Expert Setting)
Default value False
Include custom recipes in default inclusion lists (warning: enables all custom recipes)
h2o_recipes_url
¶
h2o_recipes_url (String) (Expert Setting)
Default value 'None'
URL of H2O instance for use by transformers, models, or scorers.
h2o_recipes_ip
¶
h2o_recipes_ip (String) (Expert Setting)
Default value 'None'
IP of H2O instance for use by transformers, models, or scorers.
h2o_recipes_nthreads
¶
h2o_recipes_nthreads (Number) (Expert Setting)
Default value 8
Number of threads for H2O instance for use by transformers, models, or scorers. -1 for all.
h2o_recipes_log_level
¶
h2o_recipes_log_level (String) (Expert Setting)
Default value 'None'
Log Level of H2O instance for use by transformers, models, or scorers.
h2o_recipes_max_mem_size
¶
h2o_recipes_max_mem_size (String) (Expert Setting)
Default value 'None'
Maximum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_min_mem_size
¶
h2o_recipes_min_mem_size (String) (Expert Setting)
Default value 'None'
Minimum memory size of H2O instance for use by transformers, models, or scorers.
h2o_recipes_kwargs
¶
h2o_recipes_kwargs (Dict) (Expert Setting)
Default value {}
General user overrides of kwargs dict to pass to h2o.init() for recipe server.
h2o_recipes_start_trials
¶
h2o_recipes_start_trials (Number) (Expert Setting)
Default value 5
Number of trials to give h2o-3 recipe server to start.
h2o_recipes_start_sleep0
¶
h2o_recipes_start_sleep0 (Number) (Expert Setting)
Default value 1
Number of seconds to sleep before starting h2o-3 recipe server.
h2o_recipes_start_sleep
¶
h2o_recipes_start_sleep (Number) (Expert Setting)
Default value 5
Number of seconds to sleep between trials of starting h2o-3 recipe server.
custom_recipes_lock_to_git_repo
¶
custom_recipes_lock_to_git_repo (Boolean) (Expert Setting)
Default value False
Lock source for recipes to a specific GitHub repo. If True, then all custom recipes must come from the repo specified in the custom_recipes_git_repo setting.
custom_recipes_git_repo
¶
custom_recipes_git_repo (String) (Expert Setting)
Default value 'https://github.com/h2oai/driverlessai-recipes'
If custom_recipes_lock_to_git_repo is set to True, only this repo can be used to pull recipes from
custom_recipes_git_branch
¶
custom_recipes_git_branch (String) (Expert Setting)
Default value 'None'
Branch constraint for recipe source repo. Any branch allowed if unset or None
custom_recipes_excluded_filenames_from_repo_download
¶
basenames of files to exclude from repo download (List) (Expert Setting)
Default value []
allow_old_recipes_use_datadir_as_data_directory
¶
Allow use of deprecated get_global_directory() method from custom recipes for backward compatibility of recipes created before 1.9.0. Disable to force separation of custom recipes per user (in which case user_dir() should be used instead). (Boolean) (Expert Setting)
Default value True
enable_custom_transformers
¶
enable_custom_transformers (Boolean) (Expert Setting)
Default value True
enable_custom_pretransformers
¶
enable_custom_pretransformers (Boolean) (Expert Setting)
Default value True
enable_custom_models
¶
enable_custom_models (Boolean) (Expert Setting)
Default value True
enable_custom_scorers
¶
enable_custom_scorers (Boolean) (Expert Setting)
Default value True
enable_custom_datas
¶
enable_custom_datas (Boolean) (Expert Setting)
Default value True
enable_custom_explainers
¶
enable_custom_explainers (Boolean) (Expert Setting)
Default value True
enable_custom_individuals
¶
enable_custom_individuals (Boolean) (Expert Setting)
Default value True
enable_connectors_recipes
¶
enable_connectors_recipes (Boolean) (Expert Setting)
Default value True
contrib_relative_directory
¶
Base directory for recipes within data directory. (String) (Expert Setting)
Default value 'contrib'
contrib_env_relative_directory
¶
contrib_env_relative_directory (String) (Expert Setting)
Default value 'contrib/env'
Location of installed custom recipe packages (relative to data_directory). We will try to install packages dynamically, but one can also install them manually (before or after server start), inside the running docker instance if running Docker, or as the user the server runs as (e.g. the dai user) for deb/tar native installations:
PYTHONPATH=<full tmp dir>/<contrib_env_relative_directory>/lib/python3.6/site-packages/ <path to dai>dai-env.sh python -m pip install --prefix=<full tmp dir>/<contrib_env_relative_directory> <packagename> --upgrade --upgrade-strategy only-if-needed --log-file pip_log_file.log
where <path to dai> is /opt/h2oai/dai/ for native rpm/deb installations. Note: wheel files can also be installed if <packagename> is the name of a wheel file or archive.
pip_install_overall_retries
¶
pip_install_overall_retries (Number) (Expert Setting)
Default value 2
pip install retries for the call to pip. Sometimes it is necessary to try twice.
pip_install_verbosity
¶
pip_install_verbosity (Number) (Expert Setting)
Default value 2
pip install verbosity level (number of -v's given to pip, up to 3).
pip_install_timeout
¶
pip_install_timeout (Number) (Expert Setting)
Default value 15
pip install timeout in seconds. Sometimes internet issues mean it is better to fail faster.
pip_install_retries
¶
pip_install_retries (Number) (Expert Setting)
Default value 5
pip install retry count
pip_install_use_constraint
¶
pip_install_use_constraint (Boolean) (Expert Setting)
Default value True
Whether to use DAI constraint file to help pip handle versions. pip can make mistakes and try to install updated packages for no reason.
pip_install_options
¶
pip_install_options (List) (Expert Setting)
Default value []
pip install options: string of a list of other options, e.g. ['--proxy', 'http://user:password@proxyserver:port']
enable_basic_acceptance_tests
¶
enable_basic_acceptance_tests (Boolean) (Expert Setting)
Default value True
Whether to enable basic acceptance testing. Tests if can pickle the state, etc.
enable_acceptance_tests
¶
enable_acceptance_tests (Boolean) (Expert Setting)
Default value True
Whether acceptance tests should run for custom genes / models / scorers / etc.
skip_disabled_recipes
¶
skip_disabled_recipes (Boolean) (Expert Setting)
Default value False
Whether to skip disabled recipes (True) or fail and show GUI message (False).
contrib_reload_and_recheck_server_start
¶
contrib_reload_and_recheck_server_start (Boolean) (Expert Setting)
Default value True
Whether to re-check recipes during server startup (if per_user_directories == false) or during user login (if per_user_directories == true). If any inconsistency develops, the bad recipe will be removed during re-done acceptance testing. This process can make start-up take a lot longer for many recipes, but in LTS releases the risk of recipes becoming out of date is low. If set to false, acceptance re-testing during server start is disabled, but note that previews or experiments may fail if those inconsistent recipes are used. Such inconsistencies can occur when the API for recipes changes or more aggressive acceptance tests are performed.
contrib_install_packages_server_start
¶
contrib_install_packages_server_start (Boolean) (Expert Setting)
Default value True
Whether to at least install packages required for recipes during server startup (if per_user_directories == false)
or during user login (if per_user_directories == true). Important to keep True so any later use of recipes (that have global packages installed) will work.
contrib_reload_and_recheck_worker_tasks
¶
contrib_reload_and_recheck_worker_tasks (Boolean) (Expert Setting)
Default value False
Whether to re-check recipes after they are uploaded from the main server to a worker in multinode. Doing this is expensive for every task that has recipes.
num_rows_acceptance_test_custom_transformer
¶
num_rows_acceptance_test_custom_transformer (Number) (Expert Setting)
Default value 200
num_rows_acceptance_test_custom_model
¶
num_rows_acceptance_test_custom_model (Number) (Expert Setting)
Default value 100
enable_mapr_multi_user_mode
¶
enable_mapr_multi_user_mode (Boolean) (Expert Setting)
Default value False
Enables the multi-user mode for MapR integration, which allows having a MapR ticket per user.
minio_secret_access_key
¶
Minio Secret Access Key (Any)
Default value ''
Minio Connector credentials
h2o_mli_nthreads
¶
h2o_mli_nthreads (Number) (Expert Setting)
Default value 8
Number of threads for H2O instance for use by MLI.
mli_pd_numcat_num_chart
¶
Unique feature values count driven Partial Dependence Plot binning and chart selection. (Boolean) (Expert Setting)
Default value True
Use dynamic switching between Partial Dependence Plot numeric and categorical binning and UI chart selection in case of features which were used both as numeric and categorical by experiment.
mli_pd_numcat_threshold
¶
Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric). (Number) (Expert Setting)
Default value 11
If 'mli_pd_numcat_num_chart' is enabled, then use numeric binning and chart if feature unique values count is bigger than threshold, else use categorical binning and chart.
mli_run_kernel_explainer
¶
Use Kernel Explainer to obtain Shapley values for original features (Boolean) (Expert Setting)
Default value False
Use Kernel Explainer to obtain Shapley values for original features.
mli_kernel_explainer_sample
¶
Sample input dataset for Kernel Explainer (Boolean) (Expert Setting)
Default value True
Sample input dataset for Kernel Explainer.
mli_kernel_explainer_sample_size
¶
Sample size for input dataset passed to Kernel Explainer (Number) (Expert Setting)
Default value 1000
Sample size for input dataset passed to Kernel Explainer.
mli_kernel_explainer_nsamples
¶
Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally (String) (Expert Setting)
Default value 'auto'
'auto' or int. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower variance estimates of the SHAP values. The 'auto' setting uses nsamples = 2 * X.shape[1] + 2048. This setting is disabled by default and DAI determines the right number internally.
mli_kernel_explainer_l1_reg
¶
L1 regularization for Kernel Explainer (String) (Expert Setting)
Default value 'aic'
'num_features(int)', 'auto' (default for now, but deprecated), 'aic', 'bic', or float. The l1 regularization to use for feature selection (the estimation procedure is based on a debiased lasso). The 'auto' option currently uses 'aic' when less than 20% of the possible sample space is enumerated, otherwise it uses no regularization. THE BEHAVIOR OF 'auto' WILL CHANGE in a future version to be based on 'num_features' instead of AIC. The 'aic' and 'bic' options use the AIC and BIC rules for regularization. Using 'num_features(int)' selects a fixed number of top features. Passing a float directly sets the alpha parameter of the sklearn.linear_model.Lasso model used for feature selection.
mli_kernel_explainer_max_runtime
¶
Max runtime for Kernel Explainer in seconds (Number) (Expert Setting)
Default value 900
Max runtime for Kernel Explainer in seconds. Default is 900, which equates to 15 minutes. Setting this parameter to -1 means to honor the Kernel Shapley sample size provided regardless of max runtime.
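The nsamples and l1_reg options above correspond to parameters of SHAP's KernelExplainer; a hedged standalone sketch of how they are used with the shap library directly (illustrative, not the DAI/MLI wiring; assumes shap and scikit-learn are installed):

import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Kernel SHAP re-evaluates the model many times per explained row;
# nsamples and l1_reg mirror the MLI settings described above.
explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:5], nsamples="auto", l1_reg="aic")
print(shap_values.shape)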
dask_cuda_cluster_kwargs
¶
Set dask CUDA/RAPIDS cluster settings for single node workers. (Dict) (Expert Setting)
Default value {'scheduler_port': 0, 'dashboard_address': ':0', 'protocol': 'tcp'}
Set dask CUDA/RAPIDS cluster settings for single node workers. Additional environment variables can be set, see: https://dask-cuda.readthedocs.io/en/latest/ucx.html#dask-scheduler. E.g. for ucx, use the dict version of: dict(n_workers=None, threads_per_worker=1, processes=True, memory_limit='auto', device_memory_limit=None, CUDA_VISIBLE_DEVICES=None, data=None, local_directory=None, protocol='ucx', enable_tcp_over_ucx=True, enable_infiniband=False, enable_nvlink=False, enable_rdmacm=False, ucx_net_devices='auto', rmm_pool_size='1GB'). WARNING: Do not add arguments like {'n_workers': 1, 'processes': True, 'threads_per_worker': 1}; this will lead to hangs, the cuda cluster handles this itself.
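For reference, the keys above resemble constructor arguments of dask_cuda.LocalCUDACluster; a hedged standalone sketch of starting such a cluster outside DAI (assumes a machine with at least one NVIDIA GPU and dask_cuda installed):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Illustrative only: DAI passes the settings dict above through to the cluster.
cluster = LocalCUDACluster(scheduler_port=0, dashboard_address=":0", protocol="tcp")
client = Client(cluster)
print(client)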
dask_cluster_kwargs
¶
Set dask cluster settings for single node workers. (Dict) (Expert Setting)
Default value {'n_workers': 1, 'processes': True, 'threads_per_worker': 1, 'scheduler_port': 0, 'dashboard_address': ':0', 'protocol': 'tcp'}
Set dask cluster settings for single node workers.
dask_scheduler_env
¶
Set dask scheduler env. (Dict) (Expert Setting)
Default value {}
Set dask scheduler env. See https://docs.dask.org/en/latest/setup/cli.html
dask_worker_env
¶
Set dask worker environment variables. NCCL_SOCKET_IFNAME is automatically set, but can be overridden here. (Dict) (Expert Setting)
Default value {'NCCL_P2P_DISABLE': '1', 'NCCL_DEBUG': 'WARN'}
Set dask worker env. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_worker_env
¶
Set dask cuda worker environment variables. (Dict) (Expert Setting)
Default value {}
Set dask cuda worker env. See: https://dask-cuda.readthedocs.io/en/latest/ucx.html#launching-scheduler-workers-and-clients-separately
enable_imputation
¶
Enabling imputation adds a new picker to the experiment setup GUI and triggers imputation functionality in Transformers (Boolean) (Expert Setting)
Default value False
Enable column imputation
datatable_parse_max_memory_bytes
¶
datatable_parse_max_memory_bytes (Number) (Expert Setting)
Default value -1
Memory limit in bytes for datatable to use during parsing of CSV files. -1 for unlimited. 0 for automatic. >0 for constraint.
datatable_separator
¶
datatable_separator (String) (Expert Setting)
Default value ''
Delimiter/Separator to use when parsing tabular text files like CSV. Automatic if empty. Must be provided at system start.
ping_load_data_file
¶
Whether to enable ping of system status during DAI data ingestion. (Boolean) (Expert Setting)
Default value False
Whether to enable ping of system status during DAI data ingestion.
high_correlation_value_to_report
¶
Threshold for reporting high correlation (Float) (Expert Setting)
Default value 0.95
Value to report high correlation between original features
datatable_bom_csv
¶
datatable_bom_csv (Boolean) (Expert Setting)
Default value False
Include byte order mark (BOM) when writing CSV files. Required to support UTF-8 encoding in Excel.
check_invalid_config_toml_keys
¶
check_invalid_config_toml_keys (Boolean) (Expert Setting)
Default value True
Whether to check if config.toml keys are valid and fail if not valid
predict_safe_trials
¶
predict_safe_trials (Number) (Expert Setting)
Default value 2
fit_safe_trials
¶
fit_safe_trials (Number) (Expert Setting)
Default value 2
allow_no_pid_host
¶
Whether to allow no --pid=host setting. Some GPU info from within docker will not be correct. (Boolean) (Expert Setting)
Default value True
terminate_experiment_if_memory_low
¶
terminate_experiment_if_memory_low (Boolean) (Expert Setting)
Default value False
Whether to terminate experiments if the system memory available falls below memory_limit_gb_terminate
memory_limit_gb_terminate
¶
memory_limit_gb_terminate (Number) (Expert Setting)
Default value 5
Memory in GB below which the experiment will be terminated if terminate_experiment_if_memory_low=true.
last_exclusive_mode
¶
last_exclusive_mode (String) (Expert Setting)
Default value ''
Internal helper to remember whether the exclusive mode was changed.
max_time_series_properties_sample_size
¶
max_time_series_properties_sample_size (Number) (Expert Setting)
Default value 250000
Max. sample size for automatic determination of time series train/valid split properties, only if time column is selected
max_lag_sizes
¶
max_lag_sizes (Number) (Expert Setting)
Default value 30
Maximum number of lag sizes to use for lags-based time-series experiments. These are sampled from if sample_lag_sizes==True, else all are taken. (-1 == automatic)
min_lag_autocorrelation
¶
min_lag_autocorrelation (Float) (Expert Setting)
Default value 0.1
Minimum required autocorrelation threshold for a lag to be considered for feature engineering
max_signal_lag_sizes
¶
max_signal_lag_sizes (Number) (Expert Setting)
Default value 100
How many samples of lag sizes to use for a single time group (single time series signal)
single_model_vs_cv_score_reldiff
¶
single_model_vs_cv_score_reldiff (Float) (Expert Setting)
Default value 0.05
single_model_vs_cv_score_reldiff2
¶
single_model_vs_cv_score_reldiff2 (Float) (Expert Setting)
Default value 0.0