dai-tutorial.Rmd
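This vignette describes how to use the dai package to control the Driverless AI platform from R. It covers the main predictive data-science workflow: loading data, splitting it, training a model, scoring new data, downloading scoring pipelines, and managing the datasets and models stored on the server.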
Before we can start working with the Driverless AI platform, we have to import the package and initialize the connection:
After the connection has been established, you can create a new dataset:
creditcard <- dai.create_dataset(path_to_creditcard, progress = FALSE)
Whenever a function of the package displays a progress bar, you can switch it off by setting progress = FALSE. Progress bars can also be disabled altogether by setting the option dai.progress:
options('dai.progress' = FALSE)
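If you only want to silence progress bars for part of a script, base R's options() returns the previous values, which makes it easy to restore them afterwards. The following is a minimal sketch that uses only base R and the dai.progress option name mentioned above; the placeholder comment marks where package calls would go.

```r
# options() invisibly returns the old values of the options it changes.
old <- options(dai.progress = FALSE)

# ... calls to package functions would go here, with progress bars disabled ...
progress_enabled <- getOption("dai.progress")

# Restore whatever was set before (including "not set at all").
options(old)
```

Saving and restoring the old value keeps the change local, which is helpful inside functions or shared scripts.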
The function dai.create_dataset loads data located on the machine that hosts Driverless AI. If you wish to upload data located on your workstation, use dai.upload_dataset instead. If you already have the data loaded into an R data.frame, you can simply convert it into a DAIFrame this way:
iris_dai <- as.DAIFrame(iris)
print(iris_dai)
#> DAIFrame 'e78c6b94-3356-11ec-9574-3417ebc6f56b': 150 obs. of 5 variables
#> File path: h2oai/e78c6b94-3356-11ec-9574-3417ebc6f56b/iris2b0259c56951.csv.1634920889.520115.bin
Once the dataset has been created, you can display its basic information and summary statistics by calling the generics print and summary:
print(creditcard)
#> DAIFrame 'e5a3901e-3356-11ec-9574-3417ebc6f56b': 23999 obs. of 25 variables
#> File path: /Projects/h2oai/tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
summary(creditcard)
#> ID LIMIT_BAL AGE
#> Min. : 1 Min. : 10000 Min. : 21
#> Mean : 12000 Mean :165498.7157798 Mean : 35.3808492
#> St.dev.: 6928.0588912 St.dev.:129130.7430653 St.dev.: 9.2710457
#> Max. : 23999 Max. : 1000000 Max. : 79
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 23999 Unique : 79 Unique : 55
#> PAY_1 PAY_2 PAY_3
#> Min. : -2 Min. : -2 Min. : -2
#> Mean : -0.0031251 Mean : -0.1234635 Mean : -0.1547564
#> St.dev.: 1.1234487 St.dev.: 1.2005912 St.dev.: 1.204058
#> Max. : 8 Max. : 8 Max. : 8
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 11 Unique : 11 Unique : 11
#> PAY_4 PAY_5 PAY_6
#> Min. : -2 Min. : -2 Min. : -2
#> Mean : -0.2116755 Mean : -0.2528855 Mean : -0.2780116
#> St.dev.: 1.1665728 St.dev.: 1.1370067 St.dev.: 1.1581916
#> Max. : 8 Max. : 8 Max. : 8
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 11 Unique : 10 Unique : 10
#> BILL_AMT1 BILL_AMT2 BILL_AMT3
#> Min. : -165580 Min. : -69777 Min. : -157264
#> Mean :50598.9286637 Mean :48648.0474186 Mean :46368.9035376
#> St.dev.:72650.1978093 St.dev.:70365.3956427 St.dev.:68194.7195203
#> Max. : 964511 Max. : 983931 Max. : 1664089
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 18717 Unique : 18367 Unique : 18131
#> BILL_AMT4 BILL_AMT5 BILL_AMT6
#> Min. : -170000 Min. : -81334 Min. : -339603
#> Mean : 42369.872828 Mean :40002.3330972 Mean :38565.2666361
#> St.dev.:63071.4551671 St.dev.:60345.7282797 St.dev.:59156.5011435
#> Max. : 891586 Max. : 927171 Max. : 961664
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 17719 Unique : 17284 Unique : 16906
#> PAY_AMT1 PAY_AMT2 PAY_AMT3
#> Min. : 0 Min. : 0 Min. : 0
#> Mean : 5543.0980458 Mean : 5815.528522 Mean : 4969.431393
#> St.dev.:15068.8627296 St.dev.:20797.4438849 St.dev.:16095.9292948
#> Max. : 505000 Max. : 1684259 Max. : 896040
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 6918 Unique : 6839 Unique : 6424
#> PAY_AMT4 PAY_AMT5 PAY_AMT6
#> Min. : 0 Min. : 0 Min. : 0
#> Mean : 4743.6568607 Mean : 4783.6436935 Mean : 5189.5736072
#> St.dev.: 14883.554872 St.dev.:15270.7039035 St.dev.:17630.7185745
#> Max. : 497000 Max. : 417990 Max. : 528666
#> Count : 23999 Count : 23999 Count : 23999
#> Unique : 6028 Unique : 5984 Unique : 5988
#> DEFAULT_PAYMENT_NEXT_MONTH SEX EDUCATION MARRIAGE
#> Min. : FALSE Count : 23999 Count : 23999 Count : 23999
#> Mean : 0.2237177 Unique : 2 Unique : 4 Unique : 4
#> St.dev.: 0.4167437 Top :female Top :university Top :single
#> Max. : TRUE Freq. : 8921 Freq. : 11360 Freq. : 12876
#> Count : 23999
#> Unique : 2
A couple of other generics work as usual on a DAIFrame: dim, head, or format.
dim(creditcard)
#> [1] 23999 25
head(creditcard)
#> ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_1 PAY_2 PAY_3 PAY_4 PAY_5
#> 1 1 20000 female university married 24 2 2 -1 -1 -2
#> 2 2 120000 female university single 26 -1 2 0 0 0
#> 3 3 90000 female university single 34 0 0 0 0 0
#> 4 4 50000 female university married 37 0 0 0 0 0
#> 5 5 50000 male university married 57 -1 0 -1 0 0
#> 6 6 50000 male graduate single 37 0 0 0 0 0
#> PAY_6 BILL_AMT1 BILL_AMT2 BILL_AMT3 BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1
#> 1 -2 3913 3102 689 0 0 0 0
#> 2 2 2682 1725 2682 3272 3455 3261 0
#> 3 0 29239 14027 13559 14331 14948 15549 1518
#> 4 0 46990 48233 49291 28314 28959 29547 2000
#> 5 0 8617 5670 35835 20940 19146 19131 2000
#> 6 0 64400 57069 57608 19394 19619 20024 2500
#> PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 DEFAULT_PAYMENT_NEXT_MONTH
#> 1 689 0 0 0 0 True
#> 2 1000 1000 1000 0 2000 True
#> 3 1500 1000 1000 1000 5000 False
#> 4 2019 1200 1100 1069 1000 False
#> 5 36681 10000 9000 689 679 False
#> 6 1815 657 1000 1000 800 False
A dataset can be split, for example into training and test sets, directly from R:
splits <- dai.split_dataset(creditcard,
                            output_name1 = 'train',
                            output_name2 = 'test',
                            ratio = .8,
                            seed = 25,
                            progress = FALSE)
In this case, splits is a list with two elements named ‘train’ and ‘test’: 80% of the data went into train and 20% into test.
splits$train
#> DAIFrame 'e91e984c-3356-11ec-9574-3417ebc6f56b': 19199 obs. of 25 variables
#> File path: h2oai/e91e984c-3356-11ec-9574-3417ebc6f56b/train.1634920892.0027966.bin
splits$test
#> DAIFrame 'e91ef882-3356-11ec-9574-3417ebc6f56b': 4800 obs. of 25 variables
#> File path: h2oai/e91ef882-3356-11ec-9574-3417ebc6f56b/test.1634920892.0246837.bin
By default, the split is a simple random sample, but stratified and time-based splits are available as well. See the function’s documentation for more details.
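To make the difference between a simple random split and a stratified split concrete, here is a small base-R sketch on the built-in iris data. It only illustrates the two sampling ideas locally; it is not how dai.split_dataset is implemented, and the variable names are chosen for this example.

```r
# Illustrative base-R splits; this does not call Driverless AI.
set.seed(25)
n <- nrow(iris)

# Simple random split: 80% of the rows go to train, the rest to test.
train_idx <- sample.int(n, size = round(0.8 * n))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Stratified split: sample 80% within each level of the target column,
# so that class proportions are preserved in both parts.
rows_by_class <- split(seq_len(n), iris$Species)
strat_idx <- unlist(lapply(rows_by_class, function(idx) {
  sample(idx, size = round(0.8 * length(idx)))
}))
strat_train <- iris[strat_idx, ]
strat_test <- iris[-strat_idx, ]

print(table(strat_train$Species))
```

With a simple random split the class counts in train vary from seed to seed, whereas the stratified variant yields the same number of rows per class every time.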
One of the main strengths of Driverless AI is its fully automated feature engineering, combined with hyperparameter tuning, model selection, and ensembling. The function dai.train runs an experiment and returns a DAIModel instance representing the resulting model.
model <- dai.train(training_frame = splits$train,
                   testing_frame = splits$test,
                   target_col = 'DEFAULT_PAYMENT_NEXT_MONTH',
                   is_classification = TRUE,
                   is_timeseries = FALSE,
                   accuracy = 1, time = 1, interpretability = 10,
                   seed = 25)
Driverless AI can suggest values for accuracy, time, and interpretability (see dai.suggest_model_params). If you do not specify values for accuracy, time, or interpretability, Driverless AI uses the recommended values.
As with DAIFrame, generic methods such as print, format, summary, or predict work with DAIModel:
print(model)
#> Experiment: lodehesa (eb7c76ea-3356-11ec-9574-3417ebc6f56b)
#> Version: 1.10.1+local_dev-placeholder-2113-g80efeacb3336-dirty, 2021-10-22 12:43
#> Settings: 1/1/10, seed=25, GPUs disabled
#> Train data: train (19199, 25)
#> Validation data: N/A
#> Test data: [Test] (4800, 24)
#> Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 31 GB, 8 CPU cores, 0/0 GPU
#> Max memory usage: 0.501 GB, 0 GB GPU
#> Recipe: AutoDL (5 iterations, 2 individuals)
#> Validation scheme: stratified, 1 internal holdout
#> Feature engineering: 23 features scored (21 selected)
#> Timing: MOJO latency 0.0250 millis (48.5kB), Python latency 64.4146 millis (40.0kB)
#> Data preparation: 13.92 secs
#> Shift/Leakage detection: 2.66 secs
#> Model and feature tuning: 25.75 secs (6 models trained)
#> Feature evolution: 1.77 secs (0 of 4 model trained)
#> Final pipeline training: 41.74 secs (7 models trained)
#> Python / MOJO scorer building: 31.33 secs / 14.32 secs
#> Validation score: AUC = 0.5 (constant preds of -1.244)
#> Validation score: AUC = 0.7739478 +/- 0.007726961 (baseline)
#> Validation score: AUC = 0.7681882 +/- 0.004076626 (final pipeline)
#> Test score: AUC = 0.7802183 +/- 0.008590495 (final pipeline)
summary(model)$score
#> [1] 0.7681882
summary(model)$score_f_name
#> [1] "AUC"
New data can be scored in two different ways: by calling predict directly on the model in the R session, or by downloading a scoring pipeline (covered further below) and scoring outside of R.
The generic predict either returns an R data.frame with the results (the default) or, with return_df = FALSE, the name of the file containing the predictions on the Driverless AI server. The latter option may be useful when you predict on a large dataset.
predictions <- predict(model, newdata = splits$test)
head(predictions)
#> DEFAULT_PAYMENT_NEXT_MONTH.0 DEFAULT_PAYMENT_NEXT_MONTH.1
#> 1 0.8509170 0.1490830
#> 2 0.8417784 0.1582216
#> 3 0.8704222 0.1295778
#> 4 0.4982968 0.5017032
#> 5 0.8486195 0.1513806
#> 6 0.8490370 0.1509630
preds_path <- predict(model, newdata = splits$test, return_df = FALSE)
print(preds_path)
#> [1] "h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/eb7c76ea-3356-11ec-9574-3417ebc6f56b_preds_00778ad1.csv"
You can later download the file to your workstation:
dai.download_file(file_path = preds_path, dest_path = file.path(tempdir(), 'predictions.csv'), progress = FALSE)
#> [1] "/tmp/Rtmp0a0Czh/predictions.csv"
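Once the predictions are available as an R data.frame, turning the class probabilities into labels is plain base R. The sketch below re-types the first six rows of the predict output shown earlier. It assumes that the column ending in .1 holds the probability of the positive class, and the 0.5 threshold and the predicted_default column name are hypothetical choices made for this example.

```r
# First six rows of the predict() output shown above, re-typed by hand.
predictions <- data.frame(
  DEFAULT_PAYMENT_NEXT_MONTH.0 = c(0.8509170, 0.8417784, 0.8704222,
                                   0.4982968, 0.8486195, 0.8490370),
  DEFAULT_PAYMENT_NEXT_MONTH.1 = c(0.1490830, 0.1582216, 0.1295778,
                                   0.5017032, 0.1513806, 0.1509630)
)

# Hypothetical decision threshold; pick one that fits your use case.
threshold <- 0.5

# Assumption: the ".1" column is the probability of the positive class.
predictions$predicted_default <-
  predictions$DEFAULT_PAYMENT_NEXT_MONTH.1 >= threshold

print(predictions$predicted_default)
```

For an imbalanced target such as this one (about 22% positives), a threshold other than 0.5 is often more appropriate; the value should be chosen on validation data.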
To put your model into production in Python or Java, you can download the full Python or MOJO pipeline, respectively. For more information about how to use the pipelines, please see the documentation.
dai.download_mojo(model, path = tempdir(), force = TRUE)
#> [1] "/tmp/Rtmp0a0Czh/mojo.zip"
dai.download_python_pipeline(model, path = tempdir(), force = TRUE)
#> [1] "/tmp/Rtmp0a0Czh/scorer.zip"
After some time, you may have multiple datasets and models on your Driverless AI server. The dai package offers a few utility functions to find, reuse, and remove existing datasets and models.
If you already have the dataset loaded into Driverless AI, you can get the DAIFrame object with either dai.get_frame (if you know the frame’s key) or dai.find_dataset (if you know the original path or at least a part of it):
dai.get_frame(creditcard$key)
#> DAIFrame 'e5a3901e-3356-11ec-9574-3417ebc6f56b': 23999 obs. of 25 variables
#> File path: /Projects/h2oai/tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
dai.find_dataset('creditcard')
#> DAIFrame 'e5a3901e-3356-11ec-9574-3417ebc6f56b': 23999 obs. of 25 variables
#> File path: /Projects/h2oai/tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
The latter directly returns the frame if there’s only one match. Otherwise it lets you select which frame to return from all the matching candidates.
Furthermore, you can get a list of datasets or models:
datasets <- dai.list_datasets()
head(datasets)
#> key name
#> 1 e91ef882-3356-11ec-9574-3417ebc6f56b test
#> 2 e91e984c-3356-11ec-9574-3417ebc6f56b train
#> 3 e78c6b94-3356-11ec-9574-3417ebc6f56b iris2b0259c56951.csv
#> 4 e5a3901e-3356-11ec-9574-3417ebc6f56b creditcard_train_cat.csv
#> file_path
#> 1 h2oai/e91ef882-3356-11ec-9574-3417ebc6f56b/test.1634920892.0246837.bin
#> 2 h2oai/e91e984c-3356-11ec-9574-3417ebc6f56b/train.1634920892.0027966.bin
#> 3 h2oai/e78c6b94-3356-11ec-9574-3417ebc6f56b/iris2b0259c56951.csv.1634920889.520115.bin
#> 4 /Projects/h2oai/tests/smalldata/kaggle/CreditCard/creditcard_train_cat.csv
#> file_size data_source row_count column_count import_status import_error
#> 1 567512 split 4800 25 0
#> 2 2265880 split 19199 25 0
#> 3 7152 upload 150 5 0
#> 4 2832344 file 23999 25 0
#> aggregation_status aggregation_error aggregated_frame mapping_frame uploaded
#> 1 -1 TRUE
#> 2 -1 TRUE
#> 3 -1 TRUE
#> 4 -1 FALSE
#> remote created logfile_path
#> 1 FALSE 1634920891 dataset_e91ef882-3356-11ec-9574-3417ebc6f56b.log
#> 2 FALSE 1634920891 dataset_e91e984c-3356-11ec-9574-3417ebc6f56b.log
#> 3 FALSE 1634920889 dataset_e78c6b94-3356-11ec-9574-3417ebc6f56b.log
#> 4 FALSE 1634920885 dataset_e5a3901e-3356-11ec-9574-3417ebc6f56b.log
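Because the listing is an ordinary data.frame, it can be filtered and sorted with base R. The sketch below re-types a few columns of the output above by hand so that it runs without a server. It assumes that the created column holds seconds since 1970-01-01 UTC, which the text does not state; created_at and split_sets are names introduced for this example.

```r
# A hand-typed subset of the dai.list_datasets() output shown above.
datasets <- data.frame(
  name = c("test", "train", "iris2b0259c56951.csv", "creditcard_train_cat.csv"),
  data_source = c("split", "split", "upload", "file"),
  row_count = c(4800, 19199, 150, 23999),
  created = c(1634920891, 1634920891, 1634920889, 1634920885),
  stringsAsFactors = FALSE
)

# Assumption: `created` is a Unix timestamp in seconds (UTC).
datasets$created_at <- as.POSIXct(datasets$created,
                                  origin = "1970-01-01", tz = "UTC")

# Keep only datasets produced by a split, largest first.
split_sets <- datasets[datasets$data_source == "split", ]
split_sets <- split_sets[order(-split_sets$row_count), ]

print(split_sets[, c("name", "row_count", "created_at")])
```

The same pattern applies to the data.frame returned by dai.list_models, for example to select experiments by target column or seed.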
models <- dai.list_models()
head(models)
#> key description
#> 1 eb7c76ea-3356-11ec-9574-3417ebc6f56b lodehesa
#> parameters.dataset.key parameters.dataset.display_name
#> 1 e91e984c-3356-11ec-9574-3417ebc6f56b train
#> parameters.resumed_model.key parameters.resumed_model.display_name
#> 1
#> parameters.target_col parameters.weight_col parameters.fold_col
#> 1 DEFAULT_PAYMENT_NEXT_MONTH
#> parameters.orig_time_col parameters.time_col parameters.is_classification
#> 1 [OFF] TRUE
#> parameters.cols_to_drop parameters.validset.key
#> 1 NA
#> parameters.validset.display_name parameters.testset.key
#> 1 e91ef882-3356-11ec-9574-3417ebc6f56b
#> parameters.testset.display_name parameters.enable_gpus parameters.seed
#> 1 test TRUE 25
#> parameters.accuracy parameters.time parameters.interpretability
#> 1 1 1 10
#> parameters.score_f_name parameters.time_groups_columns
#> 1 AUC NULL
#> parameters.unavailable_columns_at_prediction_time
#> 1 NULL
#> parameters.time_period_in_seconds parameters.num_prediction_periods
#> 1 0 0
#> parameters.num_gap_periods parameters.is_timeseries
#> 1 0 FALSE
#> parameters.cols_imputation
#> 1 NULL
#> parameters.config_overrides
#> 1 max_runtime_minutes = 1440\nmax_runtime_minutes_until_abort = 10080\ntime_abort = ""\ntime_abort_format = "%Y-%m-%d %H:%M:%S"\ntime_abort_timezone = "UTC"\ndelete_model_dirs_and_files = true\nrecipe = "auto"\nenable_genetic_algorithm = "auto"\nfeature_engineering_effort = -1\nfeature_engineering_effort_default = 5\ncheck_distribution_shift = "auto"\ncheck_distribution_shift_drop = "auto"\ndrop_features_distribution_shift_threshold_auc = 0.999\ncheck_leakage = "auto"\ndrop_features_leakage_threshold_auc = 0.999\nleakage_max_data_size = 10000000\nmax_features_importance = 100000\nmax_features_importance_debug = 1000\nmake_python_scoring_pipeline = "auto"\nmake_mojo_scoring_pipeline = "auto"\nmojo_for_predictions_tf = false\nmojo_for_predictions_torch = false\nmojo_for_predictions_text = false\nmojo_for_predictions_benchmark = false\nmojo_for_predictions_benchmark_slower_than_python_threshold = 10\nmojo_mini_acceptance_test_num_rows = 15\ninject_mojo_for_predictions = true\nmojo_for_predictions = "auto"\nmojo_for_predictions_max_rows = 10000\nmojo_for_predictions_batch_size = 100\nmojo_acceptance_test_rtol = 0.0\nmojo_acceptance_test_atol = 0.0\nreduce_mojo_size = false\nmake_pipeline_visualization = "auto"\nmake_python_pipeline_visualization = "auto"\nmake_autoreport = true\nmax_cols_make_autoreport_automatically = 1000\nmax_cols_make_pipeline_visualization_automatically = 5000\npass_env_to_deprecated_python_scoring = false\ntransformer_description_line_length = -1\nbenchmark_mojo_latency = "auto"\nbenchmark_mojo_latency_auto_size_limit = 500\nmojo_building_timeout = 1800.0\nmojo_building_parallelism = -1\nexclusive_mode = "safe"\nmax_workers = 1\nmax_cores = 4\nmax_cores_dai = -1\nstall_subprocess_submission_dai_fork_threshold_count = 0\nstall_subprocess_submission_mem_threshold_pct = 2\nmax_cores_by_physical = true\nmax_cores_limit = 100\nmax_fit_cores = 10\nmax_fit_cores_override = 0\nuse_dask_cluster = true\nmax_predict_cores = 0\nmax_predict_cores_in_dai = 
4\nbatch_cpu_tuning_max_workers = 0\ncpu_max_workers = 0\nassumed_simultaneous_dt_forks_munging = 3\nassumed_simultaneous_dt_forks_stats_openblas = 1\nmax_max_dt_threads_munging = 4\nmax_max_dt_threads_stats_openblas = 8\nmax_max_dt_threads_readwrite = 4\nmin_dt_threads_munging = 1\nmin_dt_threads_final_munging = 1\nmax_dt_threads_munging = -1\nmax_dt_threads_readwrite = -1\nmax_dt_threads_stats_openblas = -1\nmax_dt_threads_do_timeseries_split_suggestion = 1\nnum_gpus_per_experiment = -1\nmin_num_cores_per_gpu = 2\nnum_gpus_per_model = 1\nnum_gpus_for_prediction = 0\ngpu_id_start = -1\nallow_reduce_features_when_failure = "auto"\nreduce_repeats_when_failure = 1\nfraction_anchor_reduce_features_when_failure = 0.1\nxgboost_reduce_on_errors_list = [ "Memory allocation error on worker", "out of memory", "XGBDefaultDeviceAllocatorImpl", "invalid configuration argument", "Requested memory",]\nlightgbm_reduce_on_errors_list = [ "Out of Host Memory",]\nkaggle_timeout = 120\nkaggle_keep_submission = false\nkaggle_competitions = ""\nmunging_report_period = 10\nworking_munging_report_period = 120\ntraining_report_period = 10\nworking_training_report_period = 60\nworking_munging_max_transformers = 100\nping_period = 60\nping_autodl = true\ndisk_limit_gb = 5\nstall_disk_limit_gb = 1\nmemory_limit_gb = 5\nmin_num_rows = 100\nmin_rows_per_class = 5\nmin_rows_per_split = 5\nreproducibility_level = 1\nseed = 1234\nmissing_values = [ "", "?", "None", "nan", "NA", "N/A", "unknown", "inf", "-inf", "1.7976931348623157e+308", "-1.7976931348623157e+308",]\nglm_nan_impute_training_data = false\nglm_nan_impute_validation_data = false\nglm_nan_impute_prediction_data = true\ntf_nan_impute_value = -5\nstatistical_threshold_data_size_small = 100000\nstatistical_threshold_data_size_large = 500000000\naux_threshold_data_size_large = 10000000\nset_method_sampling_row_limit = 5000000\nperformance_threshold_data_size_small = 100000\nperformance_threshold_data_size_large = 
100000000\nmax_relative_cols_mismatch_allowed = 0.5\nenable_wide_rules = "auto"\nmax_cols = 10000000\nmax_rows_col_stats = 1000000\nmax_rows_cv_in_cv_gini = 100000\nmax_rows_constant_model = 1000000\nmax_rows_final_ensemble_base_model_fold_scores = 1000000\nmax_rows_final_blender = 1000000\nmax_rows_final_train_score = 5000000\nmax_rows_final_roccmconf = 1000000\nmax_rows_final_holdout_score = 5000000\nmax_rows_final_holdout_bootstrap_score = 1000000\norig_features_fs_report = false\nmax_rows_fs = 500000\nmax_rows_leak = 100000\nmax_workers_fs = 0\nmax_workers_shift_leak = 0\nmax_orig_cols_selected = 10000000\nmax_orig_numeric_cols_selected = 10000000\nmax_orig_nonnumeric_cols_selected = 300\nmax_orig_cols_selected_simple_factor = 2\nfs_orig_cols_selected = 10000000\nfs_orig_numeric_cols_selected = 10000000\nfs_orig_nonnumeric_cols_selected = 200\nfs_orig_cols_selected_simple_factor = 2\npredict_shuffle_inside_model = true\nuse_native_cats_for_lgbm_fs = true\norig_stddev_max_cols = 1000\nmax_relative_cardinality = 0.95\nmax_absolute_cardinality = 1000000\nnum_as_cat = true\nmax_int_as_cat_uniques = 50\nmax_int_as_cat_uniques_if_not_benford = 1000\nmax_fraction_invalid_numeric = -1.0\nnum_folds = 3\nfold_balancing_repeats_times_rows = 100000000.0\nmax_fold_balancing_repeats = 10\nfixed_split_seed = 0\nshow_fold_stats = true\nallow_different_classes_across_fold_splits = true\nfull_cv_accuracy_switch = 9\nensemble_accuracy_switch = 5\nnum_ensemble_folds = 4\nsave_validation_splits = false\nfold_reps = 1\nmax_num_classes_hard_limit = 10000\nmax_num_classes = 200\nmax_num_classes_compute_roc = 200\nmax_num_classes_client_and_gui = 10\nroc_reduce_type = "rows"\nmin_roc_sample_size = 1\nmax_rows_cm_ga = 500000\nnum_actuals_vs_predicted = 100\nuse_feature_brain_new_experiments = false\nfeature_brain_level = 0\nfeature_brain_reset_score = "auto"\nenable_strict_confict_key_check_for_brain = true\nallow_change_layer_count_brain = false\nbrain_maximum_diff_score = 
0.1\nmax_num_brain_indivs = 3\nmax_num_brain_indivs_from_pop = 10000\nfeature_brain_save_every_iteration = 0\nwhich_iteration_brain = -1\nrefit_same_best_individual = false\nrestart_refit_redo_origfs_shift_leak = []\nbrain_rel_dir = "H2O.ai_brain"\nbrain_max_size_GB = 20\nbrain_add_features_for_new_columns = true\nforce_model_restart_to_defaults = true\nearly_stopping = true\nearly_stopping_per_individual = true\nearly_stopping_per_individual_spread_mutations = false\nmin_dai_iterations = 0\nnfeatures_max = -1\nngenes_max = -1\nlimit_features_by_interpretability = true\nenable_tensorflow_textcnn = "auto"\nenable_tensorflow_textbigru = "auto"\nenable_tensorflow_charcnn = "auto"\nenable_pytorch_nlp = "auto"\npytorch_nlp_pretrained_models = [ "bert-base-uncased", "distilbert-base-uncased", "bert-base-multilingual-cased",]\ntensorflow_max_epochs_nlp = 2\nenable_tensorflow_nlp_accuracy_switch = 5\ntensorflow_nlp_pretrained_embeddings_file_path = ""\ntensorflow_nlp_pretrained_embeddings_trainable = false\ntensorflow_nlp_have_gpus_in_production = false\ntensorflow_check_import = true\ntorch_check_import = true\ncupy_check_import = true\nbert_migration_timeout_secs = 600\nenable_bert_transformer_acceptance_test = false\nenable_bert_model_acceptance_test = false\npytorch_tokenizer_parallel = true\npytorch_nlp_fine_tuning_num_epochs = -1\npytorch_nlp_fine_tuning_batch_size = -1\npytorch_nlp_fine_tuning_padding_length = -1\npytorch_precision = 16\npytorch_bert_model_initial_lr = 2e-5\npytorch_bert_model_correct_bias = false\nenable_pytorch_tensorboard_logger = false\npytorch_nlp_pretrained_models_dir = ""\ntext_fraction_for_text_dominated_problem = 0.3\ntext_transformer_fraction_for_text_dominated_problem = 0.3\nstring_col_as_text_threshold = 0.3\nstring_col_as_text_min_relative_cardinality = 0.1\nstring_col_as_text_min_absolute_cardinality = 10000\nsupported_image_types = [ "jpg", "jpeg", "png", "bmp", "ppm", "tif", "tiff", "JPG", "JPEG", "PNG", "BMP", "PPM", "TIF", 
"TIFF",]\nimage_paths_absolute = false\nenable_tensorflow_image = "auto"\ntensorflow_image_pretrained_models = [ "xception",]\ntensorflow_image_vectorization_output_dimension = [ 100,]\ntensorflow_image_fine_tune = false\ntensorflow_image_fine_tuning_num_epochs = 2\ntensorflow_image_augmentations = [ "HorizontalFlip",]\ntensorflow_image_batch_size = -1\ntensorflow_image_pretrained_models_dir = "./pretrained/image/"\nimage_download_timeout = 60\nstring_col_as_image_max_missing_fraction = 0.1\nstring_col_as_image_min_valid_types_fraction = 0.8\ntensorflow_image_use_gpu = true\nmax_image_auto_ddp_cores = 10\ntext_dl_token_pad_percentile = 99\ntext_dl_token_pad_max = 512\nmonotonicity_constraints_interpretability_switch = 7\nmonotonicity_max_cols = 300\nmonotonicity_constraints_log_level = "medium"\nmonotonicity_constraints_correlation_threshold = 0.1\nmonotonicity_constraints_drop_low_correlation_features = false\ndefault_max_feature_interaction_depth = 8\nmax_feature_interaction_depth = -1\nfixed_feature_interaction_depth = 0\nfixed_feature_interaction_depth_many_inputs_generates_not_useful = 0\ntune_parameters_accuracy_switch = 3\ntune_target_transform_accuracy_switch = 5\ntarget_transformer = "auto"\ntarget_transformer_tuning_choices = [ "identity", "identity_noclip", "center", "standardize", "unit_box", "log", "square", "sqrt", "double_sqrt", "anscombe", "logit", "sigmoid",]\ntournament_style = "auto"\ntournament_uniform_style_interpretability_switch = 8\ntournament_uniform_style_accuracy_switch = 6\ntournament_model_style_accuracy_switch = 6\ntournament_feature_style_accuracy_switch = 13\ntournament_fullstack_style_accuracy_switch = 13\ntournament_use_feature_penalized_score = true\nnum_individuals = 2\nfixed_num_individuals = 0\nfixed_fold_reps = 0\nsanitize_natural_sort_limit = 1000\nenable_target_encoding = "auto"\ncvte_cv_in_cv = true\ncv_in_cv_overconfidence_protection = "auto"\nenable_lexilabel_encoding = "off"\nenable_isolation_forest = 
"off"\nenable_one_hot_encoding = "auto"\nisolation_forest_nestimators = 200\nincluded_transformers = [ "AutovizRecommendationsTransformer", "BERTTransformer", "CVCatNumEncodeTransformer", "CVTargetEncodeTransformer", "CatOriginalTransformer", "CatTransformer", "ClusterDistTransformer", "ClusterIdAllNumTransformer", "ClusterTETransformer", "DateOriginalTransformer", "DateTimeDiffTransformer", "DateTimeOriginalTransformer", "DatesTransformer", "EwmaLagsTransformer", "FrequentTransformer", "ImageOriginalTransformer", "ImageVectorizerTransformer", "InteractionsTransformer", "IsHolidayTransformer", "IsolationForestAnomalyAllNumericTransformer", "IsolationForestAnomalyNumCatAllColsTransformer", "IsolationForestAnomalyNumCatTransformer", "IsolationForestAnomalyNumericTransformer", "LagsAggregatesTransformer", "LagsInteractionTransformer", "LagsTransformer", "LexiLabelEncoderTransformer", "MeanTargetTransformer", "NumCatTETransformer", "NumToCatTETransformer", "NumToCatWoEMonotonicTransformer", "NumToCatWoETransformer", "OneHotEncodingTransformer", "OriginalTransformer", "RawTransformer", "StandardScalerTransformer", "StringConcatTransformer", "TextBiGRUTransformer", "TextCNNTransformer", "TextCharCNNTransformer", "TextLinModelTransformer", "TextOriginalTransformer", "TextTransformer", "TimeSeriesTargetEncTransformer", "TruncSVDAllNumTransformer", "TruncSVDNumTransformer", "WeightOfEvidenceTransformer",]\nexcluded_transformers = []\nexcluded_genes = []\nincluded_models = [ "Constant", "DecisionTree", "FTRL", "GLM", "ImbalancedLightGBM", "ImbalancedXGBoostGBM", "LightGBM", "LightGBMDask", "RuleFit", "TensorFlow", "TextALBERT", "TextBERT", "TextCamemBERT", "TextDistilBERT", "TextMultilingualBERT", "TextRoBERTa", "TextXLM", "TextXLMRoberta", "TextXLNET", "TorchGrowNet", "XGBoostDart", "XGBoostDartDask", "XGBoostGBM", "XGBoostGBMDask", "XGBoostRF", "XGBoostRFDask",]\nexcluded_models = []\nincluded_scorers = [ "ACCURACY", "AUC", "AUCPR", "F05", "F1", "F2", "FDR", "FNR", "FOR", 
"FPR", "GINI", "LOGLOSS", "MACROAUC", "MACROF1", "MACROMCC", "MCC", "NPV", "PRECISION", "RECALL", "TNR",]\nincluded_pretransformers = []\nexcluded_pretransformers = []\ninclude_all_as_pretransformers_if_none_selected = false\nforce_include_all_as_pretransformers_if_none_selected = false\nnum_pipeline_layers = 1\nincluded_datas = []\nexcluded_datas = []\nthreshold_scorer = "AUTO"\nexcluded_scorers = []\nenable_constant_model = "auto"\nenable_decision_tree = "auto"\nenable_glm = "auto"\nenable_glm_rapids = false\nenable_xgboost_gbm = "auto"\nenable_xgboost_rapids = false\nenable_rapids_cuml_models = false\nenable_rapids_models_dask = false\nuse_dask_for_1_gpu = false\ndask_retrials_allreduce_empty_issue = 5\nenable_xgboost_rf = "auto"\nenable_xgboost_gbm_dask = "auto"\nenable_lightgbm_dask = "auto"\nhyperopt_shift_leak = false\nhyperopt_shift_leak_per_column = false\nnum_inner_hyperopt_trials_prefinal = 0\nnum_inner_hyperopt_trials_final = 0\nnum_hyperopt_individuals_final = -1\noptuna_pruner = "MedianPruner"\noptuna_sampler = "TPESampler"\nenable_xgboost_hyperopt_callback = true\nenable_lightgbm_hyperopt_callback = true\nenable_xgboost_dart = "auto"\nenable_xgboost_dart_dask = "auto"\nenable_hyperopt_dask = "auto"\nnum_gpus_per_hyperopt_dask = -1\nuse_xgboost_xgbfi = false\nenable_lightgbm = "auto"\nenable_tensorflow = "auto"\nenable_grownet = "auto"\nenable_ftrl = "auto"\nenable_rulefit = "auto"\nenable_zero_inflated_models = "auto"\nenable_lightgbm_boosting_types = [ "gbdt",]\nenable_lightgbm_multiclass_balancing = "auto"\nenable_lightgbm_cat_support = false\nenable_lightgbm_linear_tree = false\nenable_lightgbm_extra_trees = false\nlightgbm_monotone_constraints_method = "intermediate"\nlightgbm_monotone_penalty = 0.0\nenable_lightgbm_cuda_support = false\nshow_constant_model = false\ndrop_constant_model_final_ensemble = true\nxgboost_rf_exact_threshold_num_rows_x_cols = 10000\nxgboost_reg_objectives = [ "reg:squarederror",]\nxgboost_reg_metrics = [ "rmse", 
"mae",]\nxgboost_binary_metrics = [ "logloss", "auc", "aucpr", "error",]\nlightgbm_reg_objectives = [ "mse", "mae",]\nlightgbm_reg_metrics = [ "rmse", "mse", "mae",]\nlightgbm_binary_objectives = [ "binary", "xentropy",]\nlightgbm_binary_metrics = [ "binary", "binary", "auc",]\nlightgbm_multi_metrics = [ "multiclass", "multi_error",]\ntweedie_variance_power_list = [ 1.5, 1.2, 1.9,]\nhuber_alpha_list = [ 0.9, 0.3, 0.5, 0.6, 0.7, 0.8, 0.1, 0.99,]\nfair_c_list = [ 1.0, 0.1, 0.5, 0.9,]\npoisson_max_delta_step_list = [ 0.7, 0.9, 0.5, 0.2,]\nquantile_alpha = [ 0.9, 0.95, 0.99, 0.6,]\nreg_lambda_glm_default = 0.0004\nlossguide_drop_factor = 4.0\nlossguide_max_depth_extend_factor = 8.0\nparams_tune_grow_policy_simple_trees = true\nmax_nestimators = 3000\nfixed_max_nestimators = -1\nn_estimators_list_no_early_stopping = [ 50, 100, 150, 200, 250, 300,]\nmin_learning_rate_final = 0.01\nmax_learning_rate_final = 0.05\nmax_nestimators_feature_evolution_factor = 0.2\nmin_learning_rate = 0.05\nmax_learning_rate = 0.5\ntune_learning_rate = false\nmax_epochs = 10\nmax_max_depth = 12\ndefault_max_bin = 256\ndefault_lightgbm_max_bin = 64\nmax_max_bin = 256\nmin_max_bin = 32\nscale_mem_for_max_bin = 10737418240\nfactor_rf = 1.25\ntensorflow_use_all_cores = true\ntensorflow_use_all_cores_even_if_reproducible_true = false\ntensorflow_disable_memory_optimization = true\ntensorflow_cores = 0\nrulefit_max_num_rules = -1\nrulefit_max_tree_depth = 6\nrulefit_max_num_trees = 100\none_hot_encoding_cardinality_threshold = 50\ntext_as_categorical_cardinality_threshold = 1000\none_hot_encoding_cardinality_limiter = true\nfixed_ensemble_level = -1\ncross_validate_single_final_model = true\nensemble_meta_learner = "blender"\ncross_validate_meta_learner = false\nparameter_tuning_num_models = -1\nvalidate_meta_learner = true\nvalidate_meta_learner_extra = false\nfixed_num_folds_evolution = -1\nfixed_num_folds = -1\nfixed_only_first_fold_model = "auto"\nnum_fold_ids_show = 
10\nfold_scores_instability_warning_threshold = 0.25\nfeature_evolution_data_size = 300000000\nfinal_pipeline_data_size = 1000000000\nmax_validation_to_training_size_ratio_for_final_ensemble = 2.0\nforce_stratified_splits_for_imbalanced_threshold_binary = 0.01\nstratify_for_regression = true\nimbalance_sampling_method = "off"\nimbalance_sampling_threshold_min_rows_original = 100000\nimbalance_ratio_sampling_threshold = 5\nheavy_imbalance_ratio_sampling_threshold = 25\nimbalance_ratio_multiclass_threshold = 5\nheavy_imbalance_ratio_multiclass_threshold = 25\nimbalance_sampling_number_of_bags = -1\nimbalance_sampling_max_number_of_bags = 10\nimbalance_sampling_max_number_of_bags_feature_evolution = 3\nimbalance_sampling_max_multiple_data_size = 1.0\nimbalance_sampling_rank_averaging = "auto"\nimbalance_sampling_target_minority_fraction = -1.0\nimbalance_ratio_notification_threshold = 2.0\nnbins_ftrl_list = [ 1000000, 10000000, 100000000,]\nftrl_max_interaction_terms_per_degree = 10000\nte_bin_list = [ 25, 10, 100, 250,]\nwoe_bin_list = [ 25, 10, 100, 250,]\nohe_bin_list = [ 10, 25, 50, 75, 100,]\ndrop_redundant_columns_limit = 1000\ndrop_constant_columns = true\ndrop_id_columns = true\nno_drop_features = false\nsample_cols_to_group_by = false\nagg_funcs_for_group_by = [ "mean", "sd", "min", "max", "count",]\nfolds_for_group_by = 5\nmutation_mode = "sample"\ndefault_knob_offset_accuracy = 0\ndefault_knob_offset_time = 0\ndefault_knob_offset_interpretability = 0\nshift_check_text = false\nuse_rf_for_shift_if_have_lgbm = true\nshift_key_features_varimp = 0.01\nshift_check_reduced_features = true\nshift_trees = 100\nshift_max_bin = 256\nshift_min_max_depth = 4\nshift_max_max_depth = 8\ndetect_features_distribution_shift_threshold_auc = 0.55\ndrop_features_distribution_shift_min_features = 1\nshift_high_notification_level = 0.8\nleakage_check_text = true\nleakage_key_features_varimp = 0.001\nleakage_key_features_varimp_if_no_early_stopping = 
0.05\nleakage_check_reduced_features = true\nuse_rf_for_leakage_if_have_lgbm = true\nleakage_trees = 100\nleakage_max_bin = 256\nleakage_min_max_depth = 6\nleakage_max_max_depth = 8\ndetect_features_leakage_threshold_auc = 0.95\ndetect_features_per_feature_leakage_threshold_auc = 0.8\ndrop_features_leakage_min_features = 1\nleakage_train_test_split = 0.25\ndetailed_traces = false\ndebug_log = false\ndebug_trace_lines = false\nlog_system_info_per_experiment = true\ncheck_system = true\nmax_debug_description_length = 1000\nabs_tol_for_perfect_score = 0.0001\ndata_ingest_timeout = 86400.0\ngpu_locking_type = "global"\ngpu_lock_data_size = 0\ngpu_lock_safety_factor = 3\ngpu_lock_delay = 10.0\ndbscand_frac_gpu_mem = 0.25\ngpu_locking_trust_pool_submission = true\ngpu_locking_free_dead = true\ngpu_lock_min_delay = 0.0\ndebug_gpu_lock = false\ndebug_gpu_lock2 = false\ntensorflow_allow_cpu_only = false\nfinal_vs_ga_progress_factor = 5\nbrain_inconsistent_asserts = false\ncheck_1_vs_N = false\ncheck_pred_contribs_sum = false\nlogit_value_unseen = -100\ndummy = 0\ndebug_ga = false\ndebug_indiv = false\nuniquify_indiv = true\ndebug_daimodel_level = 0\ndebug_debug_xgboost_splits = false\ndebug_pipeline_pickles = false\ndebug_ga_pickles = false\ntrace_lines_transformers = false\nallow_debug_context = false\nbrain_meta_info = false\nbrain_food_full = false\nmemory_ref = 137438953472\nrows_ref = 2000000\nmemory_score_factor_ref = 0.2\nuse_dummy_pool = false\nuse_dummy_pool_onetask = false\nuse_dummy_pool_detect_types = false\nuse_dummy_pool_capture_schema = false\nuse_dummy_pool_fs = false\nuse_dummy_pool_dl2 = false\nuse_dummy_pool_munging = false\nuse_dummy_pool_training = false\nuse_dummy_pool_xgb_fit = false\nuse_dummy_pool_lgbm_fit = false\nuse_dummy_pool_predict = false\nuse_dummy_pool_mutate = false\nuse_dummy_pool_check_indiv = false\nuse_dummy_pool_score = false\nuse_dummy_pool_final_munging = false\nuse_dummy_pool_final_training = false\nuse_dummy_pool_final_predict = 
false\nuse_dummy_pool_mojo = false\nforce_use_global_isolation_pool = false\nuse_global_isolation_pool = false\ndefault_global_isolation_pool = true\nlog_predict_info = true\nlog_fit_info = true\ncvte_debug_print = false\ncvte_average_folds_for_global_lookup = false\ncvte_alt = 0\ncvte_gini_tol = 0.01\ncvte_max_outer_cv = 5\ncvte_cv_in_cv_cvte = true\ncvte_cv_in_cv_woe = true\nstalled_time_nrows_ref = 1000000\nstalled_time_nrowscols_ref = 10000000\nstalled_time_ref = 240.0\nstalled_time_min = 120.0\nstalled_time_kill_ref = 440.0\nstalled_pool_cpu_percent_threshold_kill = 5.0\nstalled_pool_gpu_percent_threshold_kill = 5.0\nstalled_cpu_measure_interval = -1\nstalled_reference_high_threshold = 1000000\ncpu_percent_check_period_stalled = 0.3\ncpu_percent_check_period_gpulock_stalled = 1.0\ncpu_percent_per_experiment_check_period_logging = 0.1\ncpu_gpu_stall_max_count = 10\ncpu_stall_max_factor_interval = 20\nnum_cpu_sockets = 1\nnum_cpu_sockets_override = 0\nnum_gpus_override = -1\nshow_gpu_usage_only_if_locked = "auto"\nautodl_stall_sigusr1 = true\nautodl_stall_kill_segv = true\nshow_inapplicable_models_preview = false\nshow_inapplicable_transformers_preview = false\nshow_warnings_preview = false\nshow_warnings_preview_unused_map_features = true\nmax_cols_show_unused_features = 1000\nmax_cols_show_feature_transformer_mapping = 1000\nwarning_unused_feature_show_max = 3\nxgb_memory_pickled_estimate = false\ndebug_h2o4gpu_level = 0\nenable_h2o4gpu_kmeans = false\nenable_h2o4gpu_truncatedsvd = false\nxgboost_direct_datatable = false\ninteraction_finder_max_rows_x_cols = 200000.0\ninteraction_finder_search_limit = 20\ninteraction_finder_corr_threshold = 0.95\ninteraction_finder_max_pairwise_interactions = 100\ninteraction_finder_gini_rel_improvement_threshold = 0.5\ninteraction_finder_return_limit = 5\nenable_bootstrap = true\nmin_bootstrap_samples = 1\nmax_bootstrap_samples = 100\nmin_bootstrap_sample_size_factor = 1.0\nmax_bootstrap_sample_size_factor = 
10.0\nbootstrap_final_seed = -1\ngpu_small_data_size = 100000\nmax_rows_tuning = 1000\nmax_rows_for_early_stopping_threshold_scale = 1000\nbenford_mad_threshold_int = 0.03\nbenford_mad_threshold_real = 0.1\nstop_early_rel_std = 0.1\nstop_early_abs_std = 0.001\nvarimp_threshold_at_interpretability_10 = 0.01\nstabilize_varimp = true\nstabilize_fs = true\nstabilize_features = true\nenable_rapids_transformers = false\nenable_rapids_transformers_dask = false\ncuml_shapley_as_imp_sample_size = 100\ncuml_shapley_as_imp_min_sample_size_threshold = 30\nfraction_std_bootstrap_ladder_factor = 0.01\nbootstrap_ladder_samples_limit = 10\nlowest_nonzero_varimp = 1e-30\nfeatures_allowed_by_interpretability = "{1: 10000000, 2: 10000, 3: 1000, 4: 500, 5: 300, 6: 200, 7: 150, 8: 100, 9: 80, 10: 50, 11: 50, 12: 50, 13: 50}"\nnfeatures_max_threshold = 200\nrdelta_percent_score_penalty_per_feature_by_interpretability = "{1: 0.0, 2: 0.1, 3: 1.0, 4: 2.0, 5: 5.0, 6: 10.0, 7: 20.0, 8: 30.0, 9: 50.0, 10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}"\ndrop_low_meta_weights = true\nmeta_weight_allowed_by_interpretability = "{1: 1E-7, 2: 1E-5, 3: 1E-4, 4: 1E-3, 5: 1E-2, 6: 0.03, 7: 0.05, 8: 0.08, 9: 0.10, 10: 0.15, 11: 0.15, 12: 0.15, 13: 0.15}"\nmeta_weight_allowed_for_reference = 1.0\nfeature_cost_mean_interp_for_penalty = 5\nfeatures_cost_per_interp = 0.25\nvarimp_threshold_shift_report = 0.3\napply_featuregene_limits_after_tuning = true\nremove_scored_0gain_genes_in_postprocessing_above_interpretability = 13\nremove_scored_0gain_genes_in_postprocessing_above_interpretability_final_population = 2\nremove_scored_by_threshold_genes_in_postprocessing_above_interpretability_final_population = 7\nmerge_dup_raw_features = true\nremove_dup_genes = true\nshow_full_pipeline_details = false\nnum_transformed_features_per_pipeline_show = 10\nfs_interpretabilty_switch = 7\nfs_prune_by_genes = false\nfs_data_vary_for_interpretability = 7\nfs_data_frac = 0.5\nmany_columns_count = 
400\ncolumns_count_interpretable = 200\nvarimp_fspermute_factor = 1.0\nngenes_min = -1\nnfeatures_min = -1\nfeatures_per_gene = 1\nnfeatures_max_factor = 1.0\nuse_forkdatacontext = true\nstrict_gpu_non_overlap = false\nround_up_indivs_for_busy_gpus = true\nterminate_train_backend_tuning = true\nterminate_train_tuning = true\nterminate_train_feature_evolution = true\nkeep_fraction_default = 1.0\nreduce_by_fraction = 0.1\nreduce_count_max = 10\npreserve_oom_reduced_features = true\npreserve_varimp_reduced_features = false\npreserve_varimp_reduced_features_for_rescored = true\npreserve_reduced_features_for_final_model = true\nmutation_rate = -1\nextra_mutation_level = -1\ntuning_share_varimp_accuracy_switch = 5\ntuning_share_varimp = "best"\nfresh_indiv_accuracy_switch = 7\nenable_tensorflow_import = true\nenable_torch_import = true\ndetailed_tensorflow_import_error = false\nenable_amp_for_speed = false\nenable_tf32_for_speed = false\nenable_lightgbm_import = true\ncheck_timeout_per_gpu = 20\ncheck_gpu_trials = 3\ngpu_exit_if_fails = true\ncheck_java_timeout = 30\ncheck_graphviz_timeout = 60\nrequire_graphviz = true\nscorers_need_x = false\nprob_add_genes = 0.5\nprob_addbest_genes = 0.5\nprob_prune_genes = 0.5\nprob_perturb_xgb = 0.25\nprob_prune_by_features = 0.25\nfast_approx_max_num_trees_ever = 5\nfast_approx_num_trees = 250\nfast_approx_do_one_fold = true\nfast_approx_do_one_model = false\nfast_approx_contribs_num_trees = 50\nfast_approx_contribs_do_one_fold = true\nfast_approx_contribs_do_one_model = true\nperturb_xgb_depth_random = false\nperturb_xgb_depth_min = 3\nperturb_xgb_depth_max = 10\nuse_187_prob_logic = true\nexplore_more_unused_genes = true\nexplore_gene_anneal = true\nexplore_prob0 = 0.5\nexplore_anneal_factor = 0.9\nexplore_prob_lowest = 0.1\nexplore_grow_anneal = true\ngrow_prob0 = 0.8\ngrow_anneal_factor = 0.5\ngrow_prob_lowest = 0.05\ngrow_proboff = 0.5\nprob_tune_model_vs_features = 0.5\nmax_absolute_feature_expansion = 
1000\nexplore_model_anneal = true\nexplore_model_prob0 = 0.5\nexplore_model_anneal_factor = 0.9\nexplore_model_prob_lowest = 0.1\nxgboost_interpretability_switch = 10\nxgboost_accuracy_switch = 1\nbooster_for_fs_permute = "auto"\nmodel_class_name_for_fs_permute = "auto"\nswitch_from_tree_to_lgbm_if_can = true\ndefault_booster = "lightgbm"\ndefault_model_class_name = "LightGBMModel"\nthreshold_data_size_large_to_use_cpu_for_fs = 100000000\nlightgbm_interpretability_switch = 10\nlightgbm_accuracy_switch = 1\ndecision_tree_interpretability_switch = 7\ndecision_tree_accuracy_switch = 7\ntensorflow_interpretability_switch = 6\ntensorflow_accuracy_switch = 5\ntensorflow_num_classes_switch = 10\nprediction_intervals = true\nprediction_intervals_alpha = 0.9\npred_labels = true\ntensorflow_num_classes_switch_but_keep_lightgbm = 15\ntextlin_num_classes_switch = 5\ntext_gene_dim_reduction_choices = [ 50,]\ntext_gene_max_ngram = [ 1, 2,]\ntext_transformers_max_vocabulary_size = [ 1000, 5000,]\nnumber_of_texts_to_cache_in_bert_transformer = -1\nmax_abs_score_delta_train_valid = 0.0\nmax_rel_score_delta_train_valid = 0.0\nglm_do_lambda_search = true\nglm_do_lambda_search_by_eval_metric = false\nglm_lambda_early_stopping_rounds = 4\ngbm_early_stopping_rounds_min = 1\ngbm_early_stopping_rounds_max = 10000000000\nenable_early_stopping_threshold = true\nglm_optimal_refit = false\nglm_interpretability_switch = 6\nglm_accuracy_switch = 5\nfixup_nanpreds = true\nfixup_infX = true\nrulefit_interpretability_switch = 1\nrulefit_accuracy_switch = 8\nenable_cache_final_pipeline = false\nmax_varimp_to_save = 100\nmax_num_varimp_to_log = 10\nmax_num_varimp_shift_to_log = 10\nallow_any_call_to_skip_failures = true\nskip_transformer_failures = true\nskip_model_failures = true\ncan_skip_final_upper_layer_failures = true\ndetailed_skip_failure_messages_level = 1\nnotify_failures = true\nsize_search_stack = 100\nsize_sub_stack = 100\nconfig_overrides = ""\ndump_varimp_every_scored_indiv = 
false\ndump_modelparams_every_scored_indiv = true\ndump_modelparams_every_scored_indiv_feature_count = 3\ndump_modelparams_every_scored_indiv_mutation_count = 3\ndump_modelparams_separate_files = false\ndump_trans_timings = false\ndelete_preview_trans_timings = true\nunsupervised_clustering_min_clusters = 2\nunsupervised_clustering_max_clusters = 10\nuse_random_text_file = false\nruntime_estimation_train_frame = ""\nenable_bad_scorer = false\ndebug_col_dict_prefix = ""\nreturn_early_debug_col_dict_prefix = false\npreview_error_prefix = "Experiment Setup Error"\ntesting_coldict_error_msg = "TESTING_COL_DICT"\nauthentication_method = "unvalidated"\nadditional_authentication_methods = []\nauthentication_default_timeout_hours = 72.0\nauthentication_gui_polling_prolongs_session = false\nauth_openid_provider_base_uri = ""\nauth_openid_configuration_uri = ""\nauth_openid_auth_uri = ""\nauth_openid_token_uri = ""\nauth_openid_userinfo_uri = ""\nauth_openid_logout_uri = ""\nauth_openid_redirect_uri = ""\nauth_openid_grant_type = ""\nauth_openid_response_type = ""\nauth_openid_scope = ""\nauth_openid_urlencode_quote_via = "quote"\nauth_openid_access_token_expiry_key = "expires_in"\nauth_openid_refresh_token_expiry_key = "refresh_expires_in"\nauth_openid_token_expiration_secs = 3600\nauth_openid_use_objectpath_match = false\nauth_openid_use_objectpath_expression = ""\nauth_openid_token_introspection_url = ""\nauth_openid_end_session_endpoint_url = ""\nauth_openid_default_scopes = ""\napi_token_introspection_enabled = false\napi_token_introspection_method = "OAUTH2_TOKEN_INTROSPECTION"\napi_token_oauth2_scopes = ""\napi_token_oauth2_username_field_name = "username"\noauth2_client_tokens_enabled = false\noauth2_client_tokens_client_id = ""\noauth2_client_tokens_authorize_url = ""\noauth2_client_tokens_token_url = ""\noauth2_client_tokens_introspection_url = ""\noauth2_client_tokens_redirect_url = ""\noauth2_client_tokens_scope = "openid profile ai.h2o.storage"\nldap_server = 
""\nldap_port = ""\nldap_bind_dn = ""\nldap_tls_file = ""\nldap_use_ssl = false\nldap_search_base = ""\nldap_search_filter = ""\nldap_search_attributes = ""\nldap_user_name_attribute = ""\nldap_recipe = "0"\nldap_user_prefix = ""\nldap_search_user_id = ""\nldap_ou_dn = ""\nldap_dc = ""\nldap_base_dn = ""\nldap_base_filter = ""\nauth_tls_crl_file = ""\nauth_tls_subject_field = "CN"\nauth_tls_field_parse_regexp = "(?P<username>.*)"\nauth_tls_user_lookup = "REGEXP_ONLY"\nauth_tls_ldap_server = ""\nauth_tls_ldap_port = ""\nauth_tls_ldap_use_ssl = false\nauth_tls_ldap_tls_file = ""\nauth_tls_ldap_bind_dn = ""\nauth_tls_ldap_search_base = ""\nauth_tls_ldap_search_filter = ""\nauth_tls_ldap_username_attribute = ""\nauth_tls_ldap_authorization_lookup_filter = ""\nauth_tls_ldap_authorization_search_base = ""\nauth_jwt_token_source = "HEADER"\nauth_jwt_cookie_name = ""\nauth_jwt_header_name = ""\nauth_jwt_source_parse_regexp = "(?P<token>.*)"\nauth_jwt_username_claim_name = "sub"\nauth_jwt_verify = true\nauth_jwt_algorithm = "HS256"\nauth_jwt_exp_leeway_seconds = 0\nauth_jwt_required_audience = ""\nauth_jwt_required_issuer = ""\nautodoc_report_name = "report"\nautodoc_template = ""\nautodoc_additional_template_folder = ""\nautodoc_output_type = "docx"\nautodoc_subtemplate_type = "auto"\nautodoc_max_cm_size = 10\nautodoc_num_features = 50\nautodoc_min_relative_importance = 0.003\nautodoc_include_permutation_feature_importance = false\nautodoc_feature_importance_num_perm = 1\nautodoc_feature_importance_scorer = ""\nautodoc_pd_max_rows = 10000\nautodoc_pd_max_runtime = 45\nautodoc_pd_fast_approx = true\nautodoc_out_of_range = 3\nautodoc_num_rows = 0\nautodoc_population_stability_index = false\nautodoc_population_stability_index_n_quantiles = 10\nautodoc_prediction_stats = false\nautodoc_prediction_stats_n_quantiles = 20\nautodoc_response_rate = false\nautodoc_response_rate_n_quantiles = 10\nautodoc_gini_plot = false\nautodoc_enable_shapley_values = 
true\nautodoc_global_klime_num_features = 10\nautodoc_global_klime_num_tables = 1\nautodoc_data_summary_col_num = -1\nautodoc_list_all_config_settings = false\nautodoc_keras_summary_line_length = -1\nautodoc_transformer_architecture_max_lines = 30\nautodoc_full_architecture_in_appendix = false\nautodoc_coef_table_appendix_results_table = false\nautodoc_coef_table_num_models = 1\nautodoc_coef_table_num_folds = -1\nautodoc_coef_table_num_coef = 50\nautodoc_coef_table_num_classes = 9\nautodoc_num_histogram_plots = 10\npdp_max_threads = -1\nvis_server_ip = "127.0.0.1"\nvis_server_port = 12346\nautoviz_max_num_columns = 50\nautoviz_max_aggregated_rows = 500\nautoviz_enable_recommendations = true\nautoviz_enable_transformer_acceptance_tests = false\nenable_custom_recipes = true\nenable_custom_recipes_upload = true\nenable_custom_recipes_from_url = true\nenable_custom_recipes_from_zip = true\nmust_have_custom_transformers = false\nmust_have_custom_transformers_2 = false\nmust_have_custom_transformers_3 = false\nmust_have_custom_models = false\nmust_have_custom_scorers = false\nenable_recreate_custom_recipes_env = true\nextra_migration_custom_recipes_missing_modules = false\nget_url_data_sleep = 5\nget_url_data_trials = 5\ninclude_custom_recipes_by_default = false\nforce_include_custom_recipes_by_default = false\nenable_h2o_recipes = true\nh2o_recipes_url = "None"\nh2o_recipes_ip = "None"\nh2o_recipes_nthreads = 8\nh2o_recipes_log_level = "None"\nh2o_recipes_max_mem_size = "None"\nh2o_recipes_min_mem_size = "None"\nh2o_recipes_start_trials = 5\nh2o_recipes_start_sleep0 = 1\nh2o_recipes_start_sleep = 5\ncustom_recipes_lock_to_git_repo = false\ncustom_recipes_git_repo = "https://github.com/h2oai/driverlessai-recipes"\ncustom_recipes_git_branch = "None"\ncustom_recipes_excluded_filenames_from_repo_download = []\nallow_old_recipes_use_datadir_as_data_directory = true\nlast_recipe = "auto"\nrecipes_temporary_data_directory = "recipe_tmp"\nenable_custom_transformers = 
true\nenable_custom_pretransformers = true\nenable_custom_models = true\nenable_custom_scorers = true\nenable_custom_datas = true\nenable_custom_explainers = true\nrecipe_load_raise_on_first_error = false\nrecipe_load_raise_on_first_error_but_keep_accepted_recipes = false\nrecipe_load_raise_on_any_error = true\nraise_on_invalid_included_list = false\nwrite_recipes_to_experiment_folder = true\nwrite_recipes_to_experiment_logger = false\nprint_recipe_code_once_accepted = true\ncontrib_relative_directory = "contrib"\ncontrib_env_relative_directory = "contrib/env"\nignore_package_version = []\nallow_version_change_user_packages = false\npip_install_overall_retries = 2\npip_install_verbosity = 2\npip_install_timeout = 15\npip_install_retries = 5\npip_install_use_constraint = true\npip_install_options = []\nenable_basic_acceptance_tests = true\nenable_acceptance_tests = false\nskip_disabled_recipes = false\nacceptance_test_timeout = 20.0\nacceptance_tests_max_number_of_parameter_combinations = 20\ncontrib_reload_and_recheck_server_start = true\ncontrib_install_packages_server_start = true\nrecipe_test_import_after_load_package = false\ndata_recipe_preview_num_rows = 20\ndata_recipe_isolate = true\ndebug_custom = false\nserver_recipe_url = ""\nnum_rows_acceptance_test_custom_transformer = 200\nnum_rows_acceptance_test_custom_model = 100\nbenchmark_scorers = false\nnum_rows_acceptance_test_scorer = 1000000\nnum_cols_acceptance_test_scorer = 100\nenabled_file_systems = [ "upload", "file", "hdfs", "s3", "recipe_file", "recipe_url",]\nmax_files_listed = 100\nfile_hide_data_directory = true\nfile_path_filtering_enabled = false\nfile_path_filter_include = []\nhdfs_auth_type = "noauth"\nhdfs_app_principal_user = ""\nhdfs_app_login_user = ""\nhdfs_app_jvm_args = ""\nhdfs_app_classpath = ""\nhdfs_app_supported_schemes = [ "hdfs://", "maprfs://", "swift://",]\nhdfs_max_files_listed = 100\nhdfs_init_path = "hdfs://"\nenable_mapr_multi_user_mode = false\ndtap_auth_type = 
"noauth"\ndtap_config_path = ""\ndtap_key_tab_path = ""\ndtap_keytab_path = ""\ndtap_app_principal_user = ""\ndtap_app_login_user = ""\ndtap_app_jvm_args = ""\ndtap_app_classpath = ""\ndtap_init_path = "dtap://"\naws_role_arn = ""\naws_default_region = ""\naws_s3_endpoint_url = ""\naws_use_ec2_role_credentials = false\ns3_init_path = "s3://"\ns3_skip_cert_verification = false\ns3_connector_cert_location = ""\ngcs_init_path = "gs://"\nminio_skip_cert_verification = false\nminio_connector_cert_location = ""\nminio_init_path = "/"\nsnowflake_allow_stages = true\nsnowflake_batch_size = 10000\nkdb_hostname = ""\nkdb_port = ""\nkdb_app_classpath = ""\nkdb_app_jvm_args = ""\nazure_blob_init_path = "https://"\nazure_blob_use_access_token = false\nazure_blob_use_access_token_scopes = "https://storage.azure.com/.default"\nazure_blob_use_access_token_source = "SESSION"\nazure_blob_keycloak_aad_client_id = ""\nazure_blob_keycloak_aad_auth_uri = ""\nazure_blob_keycloak_broker_token_endpoint = ""\nazure_enable_token_auth_aad = false\nazure_ad_client_id = ""\nazure_ad_auth_uri = ""\nazure_ad_scopes = []\nazure_keycloak_idp_token_endpoint = ""\njdbc_app_configs = "{}"\njdbc_app_jvm_args = "-Xmx4g"\njdbc_app_classpath = ""\nhive_app_configs = "{}"\nhive_app_jvm_args = "-Xmx4g"\nhive_app_classpath = ""\nenable_artifacts_upload = false\nartifacts_store = "file_system"\nbitbucket_skip_cert_verification = false\nbitbucket_tmp_relative_dir = "local_git_tmp"\nartifacts_file_system_directory = "tmp"\nartifacts_s3_bucket = ""\nartifacts_git_user = "git"\nartifacts_git_repo = ""\nartifacts_git_branch = "dev"\nartifacts_git_ssh_private_key_file_location = ""\ndeployment_aws_bucket_name = ""\nallow_config_overrides_in_expert_page = true\nmax_cols_log_headtail = 1000\nmax_cols_gui_headtail = 1000\nsupported_file_types = [ "csv", "tsv", "txt", "dat", "tgz", "gz", "bz2", "zip", "xz", "xls", "xlsx", "jay", "feather", "bin", "arff", "parquet", "pkl", "orc", "avro",]\nrecipe_supported_file_types = 
[ "py", "pyc", "zip",]\nlist_files_without_extensions = false\nallow_localstorage = true\nallow_orig_cols_in_predictions = true\nallow_form_autocomplete = true\nenable_projects = true\napp_language = "en"\ndisablelogout = false\npython_client_path = ""\npython_client_url = "https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/driverlessai/latest_release_whl.html"\npython_client_verify_integrity = true\ngui_require_experiment_name = false\ngui_enable_deploy_button = true\nenable_gui_product_tour = true\nenable_dataset_downloading = true\nenable_experiment_export = true\nenable_experiment_import = true\nshow_all_filesystems = false\nenable_license_manager = false\nlicense_manager_address = "http://127.0.0.1:9999"\nlicense_manager_project_name = "default"\nlicense_manager_lease_duration = 3600000\nlicense_manager_worker_lease_duration = 21600000\nlicense_manager_ssl_certs = "true"\nlicense_manager_worker_startup_timeout = 3600000\nlicense_manager_dry_run_token = ""\nmli_lime_method = "k-LIME"\nmli_use_raw_features = true\nmli_sample = true\nmli_vars_to_pdp = 10\nmli_nfolds = 3\nmli_qbin_count = 0\nh2o_mli_nthreads = 4\nmli_enable_mojo_scorer = true\nmli_sample_above_for_scoring = 1000000\nmli_sample_above_for_training = 100000\nmli_sample_size = 100000\nmli_num_quantiles = 10\nmli_drf_num_trees = 100\nmli_fast_approx = true\nmli_interpreter_status_cache_size = 1000\nmli_drf_max_depth = 20\nmli_sample_training = true\nklime_lambda = [ 1e-6, 1e-8,]\nklime_alpha = 0.0\nmli_max_numeric_enum_cardinality = 25\nmli_max_number_cluster_vars = 6\nuse_all_columns_klime_kmeans = false\nmli_strict_version_check = true\nmli_ice_per_bin_strategy = false\nmli_dia_default_max_cardinality = 10\nmli_dia_default_min_cardinality = 2\nmli_shapley_sample_size = 100000\nenable_mli_keeper = true\nenable_mli_sa = true\nenable_mli_priority_queues = true\nmli_sequential_task_execution = true\nmli_dia_sample_size = 100000\nmli_pd_sample_size = 25000\nmli_pd_numcat_num_chart = 
true\nmli_pd_numcat_threshold = 11\nnew_mli_list_only_explainable_datasets = false\nenable_mli_async_api = true\nenable_mli_sa_main_chart_aggregator = true\nmli_sa_sampling_limit = 500000\nmli_sa_main_chart_aggregator_limit = 1000\nmli_predict_safe = false\nmli_max_surrogate_retries = 5\nenable_mli_symlinks = true\nh2o_mli_fraction_memory = 0.45\nmli_custom = ""\nexcluded_mli_explainers = []\nenable_ws_perfmon = false\nmli_kernel_explainer_workers = 4\nmli_run_kernel_explainer = false\nmli_kernel_explainer_sample = true\nmli_kernel_explainer_sample_size = 1000\nmli_kernel_explainer_nsamples = "auto"\nmli_kernel_explainer_l1_reg = "aic"\nmli_kernel_explainer_max_runtime = 900\nmli_nlp_tokenizer = "tfidf"\nmli_nlp_top_n = 20\nmli_nlp_sample_limit = 10000\nmli_nlp_min_df = 3\nmli_nlp_max_df = 0.9\nmli_nlp_min_ngram = 1\nmli_nlp_max_ngram = 1\nmli_nlp_min_token_mode = "top"\nmli_nlp_tokenizer_max_features = -1\nmli_nlp_loco_max_features = -1\nmli_nlp_surrogate_tokenizer = "Linear Model + TF-IDF"\nmli_nlp_surrogate_tokens = 100\nmli_nlp_use_stop_words = true\nmli_nlp_stop_words = "english"\nmli_nlp_append_to_english_stop_words = false\nmli_image_enable = true\nmli_batch_size = 10000\nmli_dt_nthreads = 4\ndask_scheduler_options = ""\ndask_cuda_scheduler_options = ""\ndask_worker_options = "--memory-limit 0.95"\ndask_cuda_worker_options = "--memory-limit 0.95"\ndask_protocol = "tcp"\ndask_server_port = 8786\ndask_dashboard_port = 8787\ndask_cuda_protocol = "tcp"\ndask_cuda_server_port = 8790\ndask_cuda_dashboard_port = 8791\ndask_worker_nprocs = 1\ndask_worker_nthreads = 1\ndask_cuda_worker_nthreads = -2\nlightgbm_listen_port = 12400\nallow_non_dai_dask_cluster_for_testing = false\nlocal_minio_port = 9001\nworker_local_processors = 32\nworker_priority_queues_processors = 4\nworker_priority_queues_time_check = 30\nworker_remote_processors_max_threads_reduction_factor = 0.7\nredis_result_queue_polling_interval = 100\nmain_server_redis_ping_timeout = 
5\nmain_server_minio_bucket_ping_timeout = 30\nworker_start_timeout = 30\nworker_no_main_server_wait_time = 1800\nworker_no_main_server_wait_time_with_hard_assert = 30\nworker_healthy_response_period = 300\nexpose_server_version = true\nenable_https = false\nssl_key_file = "/etc/dai/private_key.pem"\nssl_crt_file = "/etc/dai/cert.pem"\nssl_no_sslv2 = true\nssl_no_sslv3 = true\nssl_no_tlsv1 = true\nssl_no_tlsv1_1 = true\nssl_no_tlsv1_2 = false\nssl_no_tlsv1_3 = false\nssl_client_verify_mode = "CERT_NONE"\nssl_ca_file = ""\nssl_client_key_file = ""\nssl_client_crt_file = ""\nenable_xsrf_protection = true\nenable_secure_cookies = false\nverify_session_ip = false\nallow_concurrent_sessions = true\nenable_imputation = false\nverify_ipc_requests = true\nenable_advanced_features_experiment = false\nh2o_storage_address = ""\nh2o_storage_projects_enabled = false\nh2o_storage_tls_enabled = true\nh2o_storage_tls_ca_path = ""\nh2o_storage_tls_cert_path = ""\nh2o_storage_tls_key_path = ""\nh2o_storage_internal_default_project_id = ""\nh2o_storage_rpc_deadline_seconds = 60\nh2o_storage_rpc_bytestream_deadline_seconds = 7200\nh2o_storage_oauth2_scopes = ""\nh2o_storage_message_size_limit = 1048576000\nlog_level = 1\ncollect_server_logs_in_experiment_logs = false\nper_user_directories = true\ndata_import_ignore_file_names = [ "_SUCCESS",]\ndata_import_explode_list_type_columns_in_parquet = false\nfiles_without_extensions_expected_types = [ "parquet", "orc",]\ndatatable_separator = ""\nping_load_data_file = false\nping_sleep_period = 0.1\ndata_precision = "float32"\nhonor_datatable_precision = false\ntransformer_precision = "float32"\nulimit_up_to_hard_limit = true\ndisable_core_files = false\nlimit_nofile = 65535\nlimit_nproc = 16384\ncompute_correlation = false\nproduce_correlation_heatmap = false\nhigh_correlation_value_to_report = 0.95\nshow_developer_settings = false\nenable_benchmark_each_experiment = false\ndatatable_verbose_log = false\ndatatable_bom_csv = 
false\ndatatable_show_progress = false\ndatatable_allow_interruption = false\ndatatable_fread_anonymize = true\ndatatable_debug = false\ndatatable_strategy = "auto"\ndebug_print = false\ndebug_print_level = 0\nshow_killed = false\ndebug_print_server = false\nping_period_debug = 20\nhard_asserts = true\nhard_backtest_asserts = false\nstrict_compare_to_ga_trees = false\nreturn_quickly_autodl_testing = false\nreturn_before_final_model = false\ncheck_invalid_config_toml_keys = true\nenable_funnel = true\nclean_funnel = true\nquiet_funnel = false\nbroken_pool_sleeptime = 3\nallow_terminate = true\nallow_kill = true\nallow_reap_by_name_terminate = true\nallow_reap_by_name_kill = true\nmax_files_stack_search_when_broken = 5\nmax_lines_stack_search_when_broken = 10\nmax_length_when_broken = 256\nmax_length_log_anon = 10000\nmax_length_exception = 10000\nlightgbm_import_stall_trials = 3\ntensorflow_import_stall_trials = 3\nlightgbm_import_stall_timeout = 30\ntensorflow_import_stall_timeout = 30\ntask_trials = 2\npredict_safe_trials = 2\nfit_safe_trials = 2\nfinal_pipeline_trials = 2\nreap_trials = 3\nstats_trials = 5\npool_mark = "_"\nacquire_mark = "+"\ndeacquire_mark = "-"\ntask_sync_mark = "*"\nstats_mark = ":"\nping_mark = "_ping_"\npool_wait_mark = "|"\npool_submit_mark = "="\npool_busy_mark = "#"\ntask_sync_key_length = 6\nxgb_in_subprocess = true\nenable_preview_time_estimate = true\nenable_preview_mojo_size_estimate = true\nenable_preview_time_estimate_rough = false\nserver_proctitle = "DriverlessAI"\nserver_fork_proctitle = "DAI-fork"\nworker_proctitle = "DAI-worker"\nsystem_control_wait_time = 60\nallow_no_pid_host = true\nping_period_factor_ppc = 5\nping_check_period_memory = 5\nping_check_period_usage = 5\nping_check_period_files = 60\nping_check_server_health = true\nping_check_period_server_health = 1.0\nping_check_period_server_exists = 10.0\nping_check_period_server_health_hard_assert_limit = 60.0\nping_check_server_health_debug_duration_always_print = 
false\nping_check_period_server_health_info_logger = true\nserver_health_sigusr1 = false\nsystem_stats_procname = "ping_system_stats"\nserver_health_procname = "ping_server_health"\nserver_runtime_metrics = false\nping_check_period = 5\ngolden_fatal = false\nping_run_type = "fork"\nlock_logs = true\nlogger_parsable = true\nlock_logs_server = false\ndebug_feature_cache_files = false\ndebug_final_feature_cache_files = false\ngenerate_fresh_logger_every_log = true\nclose_main_logger_during_fork = false\nshutdown_logger_every_log = false\nget_system_stats_during_init = false\nlog_system_update_period = 1\nfinal_munging_memory_reduction_factor = 2\nmunging_memory_overhead_factor = 5\nmunging_memory_available = 0\nmunging_max_workers_verbose = false\nprob_segfault_ga_pipeline = 0.0\nprob_segfault_ga_model = 0.0\nprob_segfault_final_pipeline = 0.0\nprob_segfault_final_model = 0.0\nprob_segfault_fit_transform_ga = 0.0\nprob_segfault_transform_ga = 0.0\ndisallow_segfault_final = false\nprob_segfault_fit_transform_final = 0.0\nprob_segfault_transform_final = 0.0\nprob_segfault_lightgbm = 0.0\nprob_segfault_xgboost = 0.0\nprob_segfault_gblinear = 0.0\nprob_segfault_tensorflow = 0.0\nprob_segfault_rulefit = 0.0\ninject_nan_inf = false\nper_transformer_segfault_protection_ga = false\nper_transformer_segfault_protection_final = false\nprob_stall_ga_pipeline = 0.0\nprob_stall_fitmodel = 0.0\nprob_stall_final_pipeline = 0.0\nprob_stall_kill_ga_pipeline = 0.0\nprob_stall_kill_fitmodel = 0.0\nprob_stall_kill_final_pipeline = 0.0\nprob_finish_early = 0.0\nprob_abort_early = 0.0\nsubmit_resource_wait_period = 10\nstall_subprocess_submission_cpu_threshold_pct = 100\nstall_subprocess_submission_dai_fork_threshold_pct = -1.0\nstall_subprocess_submission_experiment_fork_threshold_pct = -1.0\nuse_active_forks = true\ndo_check_if_pool_reaped = false\ndo_check_if_reaped = false\nstall_by_memory = true\nstall_by_cores = true\nstall_by_disk = true\nrestrict_initpool_by_memory = 
true\nrestrict_initpool_by_cores = false\nbase_fork_count = 2\nexperiment_nice_level = 100\ndill_for_model = false\ngzip_for_model = true\nbzip_for_model = false\ncompression_level_for_model = 5\ncheck_nvidia_smi_during_experiment = false\nsmall_pool_sleeptouse = 0.005\nlarge_pool_sleeptouse = 0.01\nonetask_sleeptouse = 0.01\nonetask_fast_sleeptouse = 0.0001\npool_default_sleeptouse = 0.001\ntrace_detect_types = false\ntrace_fit_transform = false\ntrace_final_fit_transform = false\ntrace_final_transform = false\nterminate_experiment_if_server_lost = false\nuse_uuids = true\ntest_username = "h2oai"\ntest_username2 = "test-user"\ntesting_experiment_id = "testing_experiment_id"\nstartup_username = "startup_job_user"\nsys_username = "sys"\nscoring_username = "scoring_user"\nscoring_data_directory = "tmp"\nscoring_id = "scoring_id"\nacceptance_id = "acceptance_id"\ndata_recipe_id = "data_recipe_id"\npreview_id = "preview_id"\ndata_id = "data_id"\nreproducible = true\ncuda_visible_devices = "None"\nfast_import = false\nprotect_base_env = false\ncheck_cuda_context_testing = false\ntest_scoring_simple = true\nnum_models_for_resume_graph = 1000\nlast_exclusive_mode = "safe"\nmojo_acceptance_test_errors_fatal = true\nmojo_acceptance_test_errors_shap_fatal = true\nmojo_acceptance_test_orig_shap = true\nmake_mojo_scoring_pipeline_for_features_only = false\nmojo_replace_target_encoding_with_grouped_input_cols = false\nenable_single_instance_db_access = true\ntime_series_recipe = true\ntime_series_causal_split_recipe = false\nuse_lags_if_causal_recipe = false\ntime_series_leaderboard_mode = "diverse"\ntime_series_leaderboard_periods_per_model = 1\ntime_series_merge_splits = true\nmerge_splits_max_valid_ratio = -1.0\nfixed_size_train_timespan = false\ntime_series_validation_fold_split_datetime_boundaries = ""\ntime_series_validation_splits = -1\ntime_series_splits_max_overlap = 0.5\nmin_ymd_timestamp = 19000101\nmax_ymd_timestamp = 21000101\nmax_rows_datetime_format_detection = 
100000\ndisallowed_datetime_formats = [ "%y",]\nuse_datetime_cache = true\ndatetime_cache_min_rows = 10000\nholiday_features = true\nholiday_country = ""\nholiday_countries = [ "UnitedStates", "UnitedKingdom", "EuropeanCentralBank", "Germany", "Mexico", "Japan",]\nmax_time_series_properties_sample_size = 250000\nmax_lag_sizes = 30\nmin_lag_autocorrelation = 0.1\nmax_signal_lag_sizes = 100\nsample_lag_sizes = false\nmax_sampled_lag_sizes = -1\noverride_lag_sizes = []\noverride_ufapt_lag_sizes = []\noverride_non_ufapt_lag_sizes = []\nmin_lag_size = -1\nallow_time_column_as_feature = true\nallow_time_column_as_numeric_feature = false\ndatetime_funcs = [ "year", "quarter", "month", "week", "weekday", "day", "dayofyear", "hour", "minute", "second",]\nfilter_datetime_funcs = true\nallow_tgc_as_features = false\nallowed_coltypes_for_tgc_as_features = [ "numeric", "categorical", "ohe_categorical", "datetime", "date", "text",]\nenable_time_unaware_transformers = "auto"\ntgc_only_use_all_groups = true\ntgc_allow_target_encoding = false\ntime_series_holdout_preds = true\ntime_series_max_holdout_splits = -1\nsingle_model_vs_cv_score_reldiff = 0.05\nsingle_model_vs_cv_score_reldiff2 = 0.0\nblend_in_link_space = true\nmli_ts_fast_approx = false\nmli_ts_fast_approx_contribs = true\nmli_ts_holdout_contribs = true\ntime_series_min_interpretability = 5\nlags_dropout = "dependent"\nprob_lag_non_targets = 0.1\nrolling_test_method = "tta"\nfast_tta_internal = true\nfast_tta_test = true\nprob_default_lags = 0.2\nprob_lagsinteraction = 0.2\nprob_lagsaggregates = 0.2\nts_target_trafo = "none"\nts_target_trafo_epidemic_target = "I"\nts_lag_target_trafo = "none"\nts_target_trafo_lag_size = -1\ntgc_via_ui_max_ncols = 10\ntgc_dup_tolerance = 0.01\ntimeseries_split_suggestion_timeout = 30.0\nuser_config_directory = ""\nprocsy_ip = "127.0.0.1"\nprocsy_port = 12347\nh2o_ip = "127.0.0.1"\nh2o_port = 12348\nport_range = []\nstrict_version_check = true\nmax_file_upload_size = 
104857600000\ndata_upload_directory = "uploads"\ndata_connectors_logs_directory = "./tmp"\nmapr_tickets_directory = "./tmp/mapr-tickets"\nremove_temp_files_server_start = false\nremove_temp_files_aborted_experiments = true\nusage_stats_opt_in = true\ncore_site_xml_path = ""\nhdfs_config_path = ""\nkey_tab_path = ""\nhdfs_keytab_path = ""\npreview_cache_upon_server_exit = true\nall_tasks_visible_to_users = true\nenable_health_api = true\nnotification_url = "https://s3.amazonaws.com/ai.h2o.notifications/dai_notifications_prod.json"\nbase_url = "/"\nlisteners_experiment_start = ""\nlisteners_experiment_done = ""\nlisteners_mojo_done = ""\nlisteners_autodoc_done = ""\nlisteners_scoring_pipeline_done = ""\nlisteners_experiment_artifacts_done = ""\nenable_quick_benchmark = true\nenable_extended_benchmark = false\nextended_benchmark_scale_num_rows = 0.1\nextended_benchmark_num_cols = 20\nbenchmark_memory_timeout = 2\nbenchmark_memory_vm_fraction = 0.25\nbenchmark_memory_rows = 1183747\nbenchmark_memory_max_cols = 1500\nenable_startup_checks = true\napplication_id = "dai_30602"\ndebug = true\nserver_db_checkpoint_period = 10\nserver_wal_mode = true\nserver_wal_db_checkpoint_period = 360\nserver_timeout_flush_rpc = 10.0\nserver_timeout_flush_rpc_min = 10.0\nserver_timeout_flush_rpc_max = 120.0\nenable_gpu_usage_in_gui = true\ngpu_info_timeout = 5\nenable_google_analytics_tracker = true\nenable_google_analytics_exception_tracker = true\nenable_google_analytics_autodl_messages_tracker = true\ngoogle_analytics_product_id = "UA-102179536-6"\naudit_log_retention_period = 5\nassert_slow_procedures = false\n\n[monotonicity_constraints_dict]\n\n[optuna_pruner_kwargs]\nn_startup_trials = 5\nn_warmup_steps = 20\ninterval_steps = 20\npercentile = 25.0\nmin_resource = "auto"\nreduction_factor = 4\nmin_early_stopping_rate = 0\nn_brackets = 4\nmin_early_stopping_rate_low = 0\nupper = 1.0\nlower = 
0.0\n\n[optuna_sampler_kwargs]\n\n[params_lightgbm]\n\n[params_xgboost]\n\n[params_dart]\n\n[params_tensorflow]\n\n[params_gblinear]\n\n[params_decision_tree]\n\n[params_rulefit]\n\n[params_ftrl]\n\n[params_grownet]\n\n[params_tune_lightgbm]\n\n[params_tune_xgboost]\n\n[params_tune_dart]\n\n[params_tune_tensorflow]\n\n[params_tune_gblinear]\n\n[params_tune_rulefit]\n\n[params_tune_ftrl]\n\n[params_tune_grownet]\n\n[autoviz_recommended_transformation]\n\n[h2o_recipes_kwargs]\n\n[recipe_dict]\n\n[recipe_activation]\ntransformers = []\nmodels = []\nscorers = []\ndata = []\n\n[dask_cuda_cluster_kwargs]\nscheduler_port = 0\ndashboard_address = ":0"\nprotocol = "tcp"\n\n[dask_cluster_kwargs]\nn_workers = 1\nprocesses = true\nthreads_per_worker = 1\nscheduler_port = 0\ndashboard_address = ":0"\nprotocol = "tcp"\n\n[dask_scheduler_env]\n\n[dask_cuda_scheduler_env]\n\n[dask_worker_env]\nNCCL_P2P_DISABLE = "1"\nNCCL_DEBUG = "WARN"\n\n[dask_cuda_worker_env]\n\n[extra_http_headers]\n\n[ts_target_trafo_epidemic_params_dict]\n
#> parameters.custom_features parameters.is_image
#> 1 NULL NA
#> log_file_path
#> 1 h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/h2oai_experiment_logs_eb7c76ea-3356-11ec-9574-3417ebc6f56b.zip
#> pickle_path
#> 1 h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/best_individual.pickle
#> summary_path
#> 1 h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/h2oai_experiment_summary_eb7c76ea-3356-11ec-9574-3417ebc6f56b.zip
#> train_predictions_path
#> 1 h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/train_preds.csv
#> valid_predictions_path
#> 1
#> test_predictions_path
#> 1 h2oai/h2oai_experiment_eb7c76ea-3356-11ec-9574-3417ebc6f56b/test_preds.csv
#> progress status training_duration score_f_name score test_score
#> 1 1 0 143.0201 AUC 0.7681882 0.7802183
#> deprecated model_file_size diagnostic_keys remote scoring_pipeline_size
#> 1 FALSE 1120586823 NULL FALSE 995608930
#> mojo_pipeline_size created
#> 1 103837359 1634920895
Similarly to dai.get_frame, you can obtain an instance of DAIModel by calling dai.get_model:
dai.get_model(models$key[1])
#> Experiment: lodehesa (eb7c76ea-3356-11ec-9574-3417ebc6f56b)
#> Version: 1.10.1+local_dev-placeholder-2113-g80efeacb3336-dirty, 2021-10-22 12:43
#> Settings: 1/1/10, seed=25, GPUs disabled
#> Train data: train (19199, 25)
#> Validation data: N/A
#> Test data: [Test] (4800, 24)
#> Target column: DEFAULT_PAYMENT_NEXT_MONTH (binary, 22.366% target class)
#> System specs: Linux, 31 GB, 8 CPU cores, 0/0 GPU
#> Max memory usage: 0.501 GB, 0 GB GPU
#> Recipe: AutoDL (5 iterations, 2 individuals)
#> Validation scheme: stratified, 1 internal holdout
#> Feature engineering: 23 features scored (21 selected)
#> Timing: MOJO latency 0.0250 millis (48.5kB), Python latency 64.4146 millis (40.0kB)
#> Data preparation: 13.92 secs
#> Shift/Leakage detection: 2.66 secs
#> Model and feature tuning: 25.75 secs (6 models trained)
#> Feature evolution: 1.77 secs (0 of 4 model trained)
#> Final pipeline training: 41.74 secs (7 models trained)
#> Python / MOJO scorer building: 31.33 secs / 14.32 secs
#> Validation score: AUC = 0.5 (constant preds of -1.244)
#> Validation score: AUC = 0.7739478 +/- 0.007726961 (baseline)
#> Validation score: AUC = 0.7681882 +/- 0.004076626 (final pipeline)
#> Test score: AUC = 0.7802183 +/- 0.008590495 (final pipeline)
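When several models are listed, you may want to choose the key by score rather than by position. The following is only a sketch: the data frame is a hypothetical stand-in for the listing shown above, restricted to the columns key, score_f_name and test_score, and its second row is invented for illustration. It assumes a scorer where higher is better, such as AUC.

```r
# Hypothetical stand-in for the model listing; the second row is made up.
models <- data.frame(
  key = c("eb7c76ea-3356-11ec-9574-3417ebc6f56b", "hypothetical-key"),
  score_f_name = c("AUC", "AUC"),
  test_score = c(0.7802183, 0.75),
  stringsAsFactors = FALSE
)

# Pick the key of the row with the highest test score (AUC: higher is better).
best_key <- models$key[which.max(models$test_score)]
print(best_key)

# With a live connection, the key could then be passed on:
# model <- dai.get_model(best_key)
```

For scorers where lower is better (for example an error metric), which.min would be the appropriate choice instead.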
Finally, datasets and models can be removed with dai.rm:
dai.rm(model, creditcard, splits$train, splits$test, iris_dai)
#> Model eb7c76ea-3356-11ec-9574-3417ebc6f56b removed
#> Dataset e5a3901e-3356-11ec-9574-3417ebc6f56b removed
#> Dataset e91e984c-3356-11ec-9574-3417ebc6f56b removed
#> Dataset e91ef882-3356-11ec-9574-3417ebc6f56b removed
#> Dataset e78c6b94-3356-11ec-9574-3417ebc6f56b removed
By default, the function dai.rm deletes the objects both from the server and from the R session. If you wish to remove them only from the server, set from_session=FALSE. Please note that only objects referred to by a variable name can be removed from the session. In the example above, splits$train and splits$test are removed from the server but remain in the R session, because these arguments are function calls rather than variable names (recall that $ is a function).
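The distinction between a variable name and a function call can be illustrated in base R. The helper classify_args below is hypothetical and is not part of the dai package; it only sketches one way a function can inspect its unevaluated arguments, and makes no claim about how dai.rm is implemented.

```r
# Hypothetical helper: reports whether each argument was passed as a plain
# variable name or as a call such as splits$train.
classify_args <- function(...) {
  # Capture the unevaluated argument expressions, dropping the leading `list`.
  exprs <- as.list(substitute(list(...)))[-1]
  vapply(exprs, function(e) if (is.name(e)) "name" else "call", character(1))
}

# Stand-in values; no Driverless AI connection is needed for this sketch.
model <- "some model"
splits <- list(train = 1, test = 2)

kinds <- classify_args(model, splits$train, splits$test)
print(kinds)
```

Only the first argument is a name that base rm could delete from an environment; splits$train and splits$test are calls to $, so there is no single variable to remove. To clear them from the session as well, you could remove the whole splits object or assign NULL to its elements.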