System configuration
exclusive_mode
Exclusive level of access to node resources (String) (Expert Setting)
Default value 'safe'
safe: assume another experiment might be running on the same node.
moderate: assume no other experiments or tasks are running on the same node, but still only use the physical core count.
max: assume nothing else is running on the node at all except the experiment.
If multinode is enabled, this option has no effect unless worker_remote_processors=1, in which case it is still applied. Each exclusive mode can be chosen and then fine-tuned using the individual expert settings. Changing the exclusive mode resets all exclusive-mode-related options back to their defaults and then re-applies the specific rules for the new mode, which undoes any fine-tuning of expert options that are part of the exclusive mode rules. If you choose to run a new/continued/refitted/retrained experiment from a parent experiment, the mode rules are not re-applied and any fine-tuning is preserved. To reset the mode behavior, switch between 'safe' and the desired mode; the new child experiment will then use the default system resources for the chosen mode.
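For example, to tell DAI it owns the entire node, one might set (a config.toml sketch; the value is illustrative):

exclusive_mode = "max"

As described above, switching back through 'safe' resets any fine-tuned exclusive-mode expert options before a child experiment.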
max_cores
Number of cores to use (0 = all) (Number) (Expert Setting)
Default value 0
Max number of CPU cores to use for the whole system. Set to <= 0 to use all (physical) cores. If worker_remote_processors is set to a value >= 3, the number of cores is reduced by the ratio (worker_remote_processors_max_threads_reduction_factor * worker_remote_processors) to avoid overloading the system when too many remote tasks are processed at once.
One can also set environment variable ‘OMP_NUM_THREADS’ to number of cores to use for OpenMP
(e.g., in bash: ‘export OMP_NUM_THREADS=32’ and ‘export OPENBLAS_NUM_THREADS=32’).
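As a config.toml sketch mirroring the bash example above (32 is an illustrative value):

# Limit the whole system to 32 physical cores (0 = use all cores)
max_cores = 32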
small_data_recipe_work
Small data work (String) (Expert Setting)
Default value 'auto'
Whether to treat data as a small-data recipe in terms of work, by spreading many small tasks across many cores instead of forcing GPUs, for models that support it via the static var _use_single_core_if_many. 'auto' decides based on _use_single_core_if_many for models and on data size; 'on' forces this behavior; 'off' disables it.
max_fit_cores
Maximum number of cores to use for model fit (Number) (Expert Setting)
Default value 10
Control maximum number of cores to use for a model's fit call (0 = all physical cores; >= 1 uses that count). See also tensorflow_model_max_cores to further limit TensorFlow main models.
parallel_score_max_workers
Maximum number of cores to use for model parallel scoring (Number) (Expert Setting)
Default value 0
Control maximum number of cores to use for scoring across all chosen scorers (0 = auto).
use_dask_cluster
If full dask cluster is enabled, use full cluster (Boolean) (Expert Setting)
Default value True
Whether to use full multinode distributed cluster (True) or single-node dask (False). In some cases, using entire cluster can be inefficient. E.g. several DGX nodes can be more efficient if used one DGX at a time for medium-sized data.
max_predict_cores
Maximum number of cores to use for model predict (Number) (Expert Setting)
Default value 0
Control maximum number of cores to use for a model's predict call (0 = all physical cores; >= 1 uses that count).
max_predict_cores_in_dai
Maximum number of cores to use for model transform and predict when doing MLI and AutoDoc. (Number) (Expert Setting)
Default value -1
Control maximum number of cores to use for a model's transform and predict calls when doing operations inside the DAI-MLI GUI and the R/Py clients. The main experiment and other tasks like MLI and AutoDoc have separate queues. Main experiments run at most worker_remote_processors tasks at once (limited by cores if in auto mode), while other tasks run at most worker_local_processors tasks at once (limited by cores if in auto mode), so many small tasks can add up. To prevent overloading the system, the defaults are conservative. However, if most of the activity involves AutoDoc or MLI, and no model experiments are running, it may be safe to increase this value to something larger than 4. -1: auto mode, up to physical cores divided by 4, to a maximum of 10. 0: all physical cores. >= 1: that count.
batch_cpu_tuning_max_workers
Tuning workers per batch for CPU (Number) (Expert Setting)
Default value 0
Control number of workers used in CPU mode for tuning (0 = socket count, -1 = all physical cores, >= 1 = that count). More workers are more parallel, but models learn less from each other.
cpu_max_workers
Num. workers for CPU training (Number) (Expert Setting)
Default value 0
Control number of workers used in CPU mode for training (0 = socket count, -1 = all physical cores, >= 1 = that count).
assumed_simultaneous_dt_forks_munging
Assumed/Expected number of munging forks (Number) (Expert Setting)
Default value 3
Expected maximum number of forks, used to ensure datatable doesn't overload the system. If actual use goes beyond this value, the system will start to slow down.
max_max_dt_threads_munging
Max. threads for datatable munging (Number) (Expert Setting)
Default value 4
Maximum number of threads for datatable for munging.
max_max_dt_threads_readwrite
Max. threads for datatable reading/writing (Number) (Expert Setting)
Default value 4
Maximum number of threads for datatable for reading/writing files.
max_workers_final_base_models
Max. workers for final model building (Number) (Expert Setting)
Default value 0
Maximum parallel workers for final model building. 0 means automatic; >= 1 limits to no more than that number of parallel jobs. Can be required if some transformer or model uses more than the expected amount of memory. Ways to reduce final model building memory usage (set one or more of these and retrain the final model): 1) increase munging_memory_overhead_factor to 10, 2) increase final_munging_memory_reduction_factor to 10, 3) lower max_workers_final_munging to 1, 4) lower max_workers_final_base_models to 1, 5) lower max_cores to, e.g., 1/2 or 1/4 of physical cores.
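Combining several of these suggestions into a config.toml sketch (the values are the ones suggested above):

munging_memory_overhead_factor = 10
final_munging_memory_reduction_factor = 10
max_workers_final_munging = 1
max_workers_final_base_models = 1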
max_workers_final_munging
Max. workers for final per-model munging (Number) (Expert Setting)
Default value 0
Maximum parallel workers for final per-model munging. 0 means automatic, >=1 means limit to no more than that number of parallel jobs. Can be required if some transformer uses more than the expected amount of memory.
min_dt_threads_munging
min_dt_threads_munging (Number) (Expert Setting)
Default value 1
Minimum number of threads for datatable (and OpenMP) during data munging (per process). datatable is the main data munging tool used within Driverless AI (source: https://github.com/h2oai/datatable).
min_dt_threads_final_munging
min_dt_threads_final_munging (Number) (Expert Setting)
Default value 1
Like min_dt_threads_munging, but for final pipeline munging.
max_dt_threads_munging
Max. Num. of threads to use for datatable and openblas for munging and model training (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable during data munging (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
max_dt_threads_readwrite
Max. Num. of threads to use for datatable read and write of files (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable during data reading and writing (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
max_dt_threads_stats_openblas
Max. Num. of threads to use for datatable stats and openblas (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable stats and openblas (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
num_gpus_per_experiment
#GPUs/Experiment (-1 = autodetect or all) (Number) (Expert Setting)
Default value -1
Number of GPUs to use per experiment for training tasks. Set to -1 for all GPUs. An experiment will generate many different models. Currently num_gpus_per_experiment != -1 disables GPU locking, so it is only recommended for single experiments and single users. Ignored if GPUs are disabled or there are no GPUs on the system. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation In a multinode context when using dask, this refers to the per-node value. For ImageAutoModel, this refers to the total number of GPUs used for that entire model type, since there is only one model type for the entire experiment. E.g. if the system has 4 GPUs and you want 2 ImageAuto experiments to run on 2 GPUs each, set num_gpus_per_experiment to 2 for each experiment; the 2 experiments then share the 4 GPUs, each using only 2.
min_num_cores_per_gpu
Num Cores/GPU (Number) (Expert Setting)
Default value -2
Number of CPU cores per GPU. Limits the number of GPUs used in order to have sufficient cores per GPU. Set to -1 to disable, -2 for auto mode. In auto mode, if lightgbm_use_gpu is 'auto' or 'off', then min_num_cores_per_gpu=1; otherwise min_num_cores_per_gpu=2, because LightGBM requires more cores even when using GPUs.
num_gpus_per_model
#GPUs/Model (-1 = all) (Number) (Expert Setting)
Default value 1
Number of GPUs to use per model training task. Set to -1 for all GPUs. For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model. Only applicable currently to image auto pipeline building recipe or Dask models with more than one GPU or more than one node. Ignored if GPUs disabled or no GPUs on system. For ImageAutoModel, the maximum of num_gpus_per_model and num_gpus_per_experiment (all GPUs if -1) is taken. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation In multinode context when using Dask, this refers to the per-node value.
num_gpus_for_prediction
Num. of GPUs for isolated prediction/transform (Number) (Expert Setting)
Default value 0
Number of GPUs to use for predict for models and transform for transformers when running outside of fit/fit_transform. -1 means all, 0 means no GPUs, >1 means that many GPUs up to visible limit. If predict/transform are called in same process as fit/fit_transform, number of GPUs will match, while new processes will use this count for number of GPUs for applicable models/transformers. Exception: TensorFlow, PyTorch models/transformers, and RAPIDS predict on GPU always if GPUs exist. RAPIDS requires python scoring package be used also on GPUs. In multinode context when using Dask, this refers to the per-node value.
gpu_id_start
GPU starting ID (0..visible #GPUs - 1) (Number) (Expert Setting)
Default value -1
Which gpu_id to start with. -1: auto mode. E.g. 2 experiments can each set num_gpus_per_experiment to 2 and use 4 GPUs. If using CUDA_VISIBLE_DEVICES=... to control GPUs (preferred method), gpu_id=0 is the first in that restricted list of devices. E.g. if CUDA_VISIBLE_DEVICES='4,5', then gpu_id_start=0 refers to device #4.
E.g. from expert mode, to run 2 experiments, each on a distinct GPU out of 2 GPUs: Experiment #1: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=0; Experiment #2: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=1.
E.g. from expert mode, to run 2 experiments, each on 4 distinct GPUs out of 8: Experiment #1: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=0; Experiment #2: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=4.
E.g. like just above, but now running each model on all 4 GPUs: Experiment #1: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=0; Experiment #2: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=4.
If num_gpus_per_model != 1, global GPU locking is disabled (because the underlying algorithms don't support arbitrary GPU ids, only sequential ids), so the above must be set up correctly to avoid overlap across all experiments by all users. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation Note that GPU selection does not wrap, so gpu_id_start + num_gpus_per_model must be less than the number of visible GPUs.
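Restated as config.toml fragments, the 8-GPU example above (one experiment per 4 GPUs) would be:

# Experiment #1
num_gpus_per_model = 1
num_gpus_per_experiment = 4
gpu_id_start = 0

# Experiment #2
num_gpus_per_model = 1
num_gpus_per_experiment = 4
gpu_id_start = 4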
allow_reduce_features_when_failure
Whether to reduce features when model fails (String) (Expert Setting)
Default value 'auto'
Whether to reduce features until model does not fail. Currently for non-dask XGBoost models (i.e. GLMModel, XGBoostGBMModel, XGBoostDartModel, XGBoostRFModel), during normal fit or when using Optuna. Primarily useful for GPU OOM. If XGBoost runs out of GPU memory, this is detected, and (regardless of setting of skip_model_failures), we perform feature selection using XGBoost on subsets of features. The dataset is progressively reduced by factor of 2 with more models to cover all features. This splitting continues until no failure occurs. Then all sub-models are used to estimate variable importance by absolute information gain, in order to decide which features to include. Finally, a single model with the most important features is built using the feature count that did not lead to OOM. For ‘auto’, this option is set to ‘off’ when reproducible experiment is enabled, because the condition of running OOM can change for same experiment seed. Reduction is only done on features and not on rows for the feature selection step.
reduce_repeats_when_failure
Number of repeats for models used for feature selection during failure recovery. (Number) (Expert Setting)
Default value 1
With allow_reduce_features_when_failure, this controls how many repeats of sub-models used for feature selection. A single repeat only has each sub-model consider a single sub-set of features, while repeats shuffle which features are considered allowing more chance to find important interactions. More repeats can lead to higher accuracy. The cost of this option is proportional to the repeat count.
fraction_anchor_reduce_features_when_failure
Fraction of features treated as anchor for feature selection during failure recovery. (Float) (Expert Setting)
Default value 0.1
With allow_reduce_features_when_failure, this controls the fraction of features treated as an anchor that are fixed for all sub-models. Each repeat gets new anchors. For tuning and evolution, the probability depends upon any prior importance (if present) from other individuals, while final model uses uniform probability for anchor features.
xgboost_reduce_on_errors_list
Errors from XGBoost that trigger reduction of features (List) (Expert Setting)
Default value ['Memory allocation error on worker', 'out of memory', 'XGBDefaultDeviceAllocatorImpl', 'invalid configuration argument', 'Requested memory']
Error strings from XGBoost that are used to trigger re-fit on reduced sub-models. See allow_reduce_features_when_failure.
lightgbm_reduce_on_errors_list
Errors from LightGBM that trigger reduction of features (List) (Expert Setting)
Default value ['Out of Host Memory']
Error strings from LightGBM that are used to trigger re-fit on reduced sub-models. See allow_reduce_features_when_failure.
lightgbm_use_gpu
Whether to use GPUs for LightGBM (String) (Expert Setting)
Default value 'auto'
LightGBM does not significantly benefit from GPUs, unlike other tools like XGBoost or BERT/Image models. Each experiment will try to use all GPUs, and on systems with many cores and GPUs this leads to many experiments running at once, all trying to lock the GPU for use, leaving the cores heavily under-utilized. So by default, DAI always uses CPU for LightGBM, unless 'on' is specified.
num_gpus_per_hyperopt_dask
#GPUs/HyperOptDask (-1 = all) (Number) (Expert Setting)
Default value -1
Number of GPUs to use per model hyperopt training task. Set to -1 for all GPUs. For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model across a Dask cluster. Ignored if GPUs disabled or no GPUs on system. In multinode context, this refers to the per-node value.
detailed_traces
Enable detailed traces (Boolean) (Expert Setting)
Default value False
Whether to enable detailed traces (in GUI Trace)
debug_log
Enable debug log level (Boolean) (Expert Setting)
Default value False
Whether to enable debug log level (in log files)
log_system_info_per_experiment
Enable logging of system information for each experiment (Boolean) (Expert Setting)
Default value True
Whether to add logging of system information such as CPU, GPU, disk space at the start of each experiment log. Same information is already logged in system logs.
default_experiments_quota_per_user
default_experiments_quota_per_user (Number)
Default value -1
Default the upper bound number of experiments owned per user. Negative value means infinite quota.
override_experiments_quota_for_users
override_experiments_quota_for_users (Dict)
Default value {}
Dictionary of key:list of experiments quota values for users, overrides above defaults with specified set of users
e.g.: override_experiments_quota_for_users="{'user1':10,'user2':20,'user3':30}"
to set user1 with 10 experiments quota,
user2 with 20 experiments quota and user3 with 30 experiments quota.
enable_h2o_recipes
Enable h2o recipes server (Boolean) (Expert Setting)
Default value True
Whether to enable use of the H2O recipe server. In some cases, the recipe server (started at DAI startup) may enter an unstable state, and this might affect other experiments. One can then avoid triggering use of the recipe server by setting this to false.
enable_projects
Enable Projects workspace (Boolean)
Default value True
Enable Projects workspace (alpha version, for evaluation)
enable_license_manager
enable_license_manager (Boolean)
Default value False
Switches Driverless AI to use H2O.ai License Management Server to manage licenses/permission to use software
license_manager_address
license_manager_address (String)
Default value 'http://127.0.0.1:9999'
Address at which to communicate with the H2O.ai License Management Server. Requires enable_license_manager to be set to True. Format: {http/https}://{ip address}:{port number}
license_manager_project_name
license_manager_project_name (String)
Default value 'default'
Name of license manager project that Driverless AI will attempt to retrieve leases from. NOTE: requires an active license within the License Manager Server to function properly
license_manager_lease_duration
license_manager_lease_duration (Number)
Default value 3600000
Number of milliseconds a lease for users is expected to last, if using the H2O.ai License Manager server, before the lease requires renewal.
Default: 3600000 (1 hour) = 60 min/hour * 60 sec/min * 1000 ms/sec
license_manager_worker_lease_duration
license_manager_worker_lease_duration (Number)
Default value 21600000
Number of milliseconds a lease for Driverless AI worker nodes is expected to last, if using the H2O.ai License Manager server, before the lease requires renewal. Default: 21600000 (6 hours) = 6 hours * 60 min/hour * 60 sec/min * 1000 ms/sec
license_manager_ssl_certs
license_manager_ssl_certs (String)
Default value 'true'
To be used only if the License Manager server is started with HTTPS. Accepts a boolean (true/false) or a path to a file/directory. Denotes whether to attempt SSL certificate verification when making a request to the License Manager server. true: attempt SSL certificate verification; will fail if certificates are self-signed. false: skip SSL certificate verification. /path/to/cert/directory: load certificates <cert.pem> in the directory and use those for certificate verification. Behaves in the same manner as the python requests package: https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification
license_manager_worker_startup_timeout
license_manager_worker_startup_timeout (Number)
Default value 3600000
Amount of time that Driverless AI workers will keep retrying to start up and obtain a lease from the license manager before timing out. A timeout causes worker startup to fail.
license_manager_dry_run_token
license_manager_dry_run_token (String)
Default value ''
Emergency setting that allows Driverless AI to run even if there are issues communicating with, or obtaining leases from, the License Manager server.
This is an encoded string that can be obtained from either the license manager UI or the logs of the license manager server.
start_dask_worker
Start dask workers for given multinode worker (Boolean)
Default value True
Whether to start dask workers on this multinode worker.
dask_cuda_scheduler_env
Set dask cuda scheduler env. (Dict)
Default value {}
Set dask scheduler env. See https://docs.dask.org/en/latest/setup/cli.html
dask_scheduler_options
Set dask scheduler command-line options. (String)
Default value ''
Set dask scheduler options. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_scheduler_options
Set dask cuda scheduler command-line options. (String)
Default value ''
Set dask cuda scheduler options. See https://docs.dask.org/en/latest/setup/cli.html
dask_worker_options
Set dask worker command-line options. (String)
Default value '--memory-limit 0.95'
Set dask worker options. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_worker_options
Set dask cuda worker options. (String)
Default value '--memory-limit 0.95'
Set dask cuda worker options. Similar options as dask_cuda_cluster_kwargs. See https://dask-cuda.readthedocs.io/en/latest/ucx.html#launching-scheduler-workers-and-clients-separately "--rmm-pool-size 1GB" can be set to give 1GB to RMM for more efficient RAPIDS operations.
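For example, to add the RMM pool mentioned above on top of the default memory limit (a sketch):

dask_cuda_worker_options = "--memory-limit 0.95 --rmm-pool-size 1GB"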
dask_protocol
Protocol to use for dask communications. (String)
Default value 'tcp'
See https://docs.dask.org/en/latest/setup/cli.html. E.g. ucx is optimal, while tcp is most reliable.
dask_server_port
Port used by the server for dask communications. (Number)
Default value 8786
dask_dashboard_port
Dask dashboard port for dask diagnostics. (Number)
Default value 8787
dask_cuda_protocol
Protocol to use for dask cuda communications. (String)
Default value 'tcp'
See https://docs.dask.org/en/latest/setup/cli.html. E.g. ucx is optimal, while tcp is most reliable.
dask_cuda_server_port
Port used by the server for dask cuda communications. (Number)
Default value 8790
See https://docs.dask.org/en/latest/setup/cli.html. Port + 1 is used for the dask dashboard.
dask_cuda_dashboard_port
Dask dashboard port for dask_cuda diagnostics. (Number)
Default value 8791
dask_server_ip
IP address used by the server for dask and dask cuda communications. (String)
Default value ''
If empty string, auto-detect IP capable of reaching network. Required to be set if using worker_mode=multinode.
dask_worker_nprocs
Number of processes per dask worker. (Number)
Default value 1
Number of processes per dask (not cuda-GPU) worker. If -1, uses the dask default of cpu count + 1 + nprocs. If -2, uses the DAI default of the total number of physical cores; recommended for heavy feature engineering. If 1, assumes tasks are mostly multi-threaded and can use the entire node per task; recommended for heavy multinode model training. Only applicable to dask (not dask_cuda) workers.
dask_worker_nthreads
Number of threads per process for dask. (Number)
Default value 1
Number of threads per process for dask workers
dask_cuda_worker_nthreads
Number of threads per process for dask_cuda. (Number)
Default value -2
Number of threads per process for dask_cuda workers. If -2, uses the DAI default of physical cores per GPU, since there must be only 1 worker per GPU.
lightgbm_listen_port
LightGBM local listen port when using dask with lightgbm (Number)
Default value 12400
enable_jupyter_server
enable_jupyter_server (Boolean)
Default value False
Whether to enable jupyter server
jupyter_server_port
jupyter_server_port (Number)
Default value 8889
Port for jupyter server
enable_jupyter_server_browser
enable_jupyter_server_browser (Boolean)
Default value False
Whether to enable jupyter server browser
enable_jupyter_server_browser_root
enable_jupyter_server_browser_root (Boolean)
Default value False
Whether to allow root access to the jupyter server browser.
triton_log_level
triton_log_level (Number)
Default value 0
triton_model_reload_on_startup_count
triton_model_reload_on_startup_count (Number)
Default value 0
triton_clean_up_temp_python_env_on_startup
triton_clean_up_temp_python_env_on_startup (Boolean)
Default value True
multinode_enable_strict_queue_policy
multinode_enable_strict_queue_policy (Boolean)
Default value False
When set to true, CPU executors will strictly run just CPU tasks.
multinode_enable_cpu_tasks_on_gpu_machines
multinode_enable_cpu_tasks_on_gpu_machines (Boolean)
Default value True
Controls whether CPU tasks can run on GPU machines.
multinode_storage_medium
multinode_storage_medium (String)
Default value 'minio'
Storage medium to be used to exchange data between main server and remote worker nodes.
worker_mode
worker_mode (String)
Default value 'singlenode'
How the long-running tasks are scheduled.
multiprocessing: forks the current process immediately. singlenode: shares the task through redis and needs a worker running. multinode: same as singlenode, and also shares the data through minio and allows the worker to run on a different machine.
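A minimal multinode sketch combining this setting with the Redis, MinIO, and dask settings documented elsewhere in this section (addresses are illustrative):

worker_mode = "multinode"
redis_ip = "10.10.0.1"                        # main server's Redis
main_server_minio_address = "10.10.0.1:9001"  # main server's MinIO
dask_server_ip = "10.10.0.1"                  # required when worker_mode=multinode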
redis_ip
redis_ip (String)
Default value '127.0.0.1'
Redis settings
redis_port
redis_port (Number)
Default value 6379
Redis settings
redis_db
redis_db (Number)
Default value 0
Redis database. Each DAI instance running on the redis server should have a unique integer.
main_server_redis_password
main_server_redis_password (String)
Default value 'PlWUjvEJSiWu9j0aopOyL5KwqnrKtyWVoZHunqxr'
Redis password. Will be randomly generated at main server startup, and by default it will show up in the config file uncommented. If you are running more than one Driverless AI instance per system, make sure each and every instance is connected to its own redis queue.
redis_encrypt_config
redis_encrypt_config (Boolean)
Default value False
If set to true, the config will get encrypted before it gets saved into the Redis database.
local_minio_port
local_minio_port (Number)
Default value 9001
The port that Minio will listen on. This only takes effect if the current system is a multinode main server.
main_server_minio_address
main_server_minio_address (String)
Default value '127.0.0.1:9001'
Location of main server’s minio server.
main_server_minio_access_key_id
main_server_minio_access_key_id (String)
Default value 'GMCSE2K2T3RV6YEHJUYW'
Access key of main server’s minio server.
main_server_minio_secret_access_key
main_server_minio_secret_access_key (String)
Default value 'JFxmXvE/W1AaqwgyPxAUFsJZRnDWUaeQciZJUe9H'
Secret access key of main server’s minio server.
main_server_minio_bucket
main_server_minio_bucket (String)
Default value 'h2oai'
Name of minio bucket used for file synchronization.
main_server_s3_access_key_id
main_server_s3_access_key_id (String)
Default value 'access_key'
S3 global access key.
main_server_s3_secret_access_key
main_server_s3_secret_access_key (String)
Default value 'secret_access_key'
S3 global secret access key
main_server_s3_bucket
main_server_s3_bucket (String)
Default value 'h2oai-multinode-tests'
S3 bucket.
worker_local_processors
worker_local_processors (Number)
Default value 32
Maximum number of local tasks processed at once, limited to no more than total number of physical (not virtual) cores divided by two (minimum of 1).
worker_priority_queues_processors
worker_priority_queues_processors (Number)
Default value 4
A concurrency limit for the 3 priority queues, only enabled when worker_remote_processors is greater than 0.
worker_priority_queues_time_check
worker_priority_queues_time_check (Number)
Default value 30
A timeout before which a scheduled task is bumped up in priority
worker_remote_processors
worker_remote_processors (Number)
Default value -1
Maximum number of remote tasks processed at once, if value is set to -1 the system will automatically pick a reasonable limit depending on the number of available virtual CPU cores.
worker_remote_processors_max_threads_reduction_factor
worker_remote_processors_max_threads_reduction_factor (Float)
Default value 0.7
If worker_remote_processors >= 3, factor by which each task reduces threads, used by various packages like datatable, lightgbm, xgboost, etc.
multinode_tmpfs
multinode_tmpfs (String)
Default value ''
Temporary file system location for multinode data transfer. This has to be an absolute path with equivalent configuration on both the main server and remote workers.
multinode_store_datasets_in_tmpfs
multinode_store_datasets_in_tmpfs (Boolean)
Default value False
When set to true, will use the ‘multinode_tmpfs’ as datasets store.
redis_result_queue_polling_interval
redis_result_queue_polling_interval (Number)
Default value 100
How often the server should extract results from redis queue in milliseconds.
worker_sleep
worker_sleep (Float)
Default value 0.1
Sleep time for worker loop.
main_server_minio_bucket_ping_timeout
main_server_minio_bucket_ping_timeout (Number)
Default value 180
How many seconds the worker should wait for the main server minio bucket before it fails.
worker_start_timeout
worker_start_timeout (Number)
Default value 30
How long the worker should wait on redis db initialization in seconds.
worker_no_main_server_wait_time
worker_no_main_server_wait_time (Number)
Default value 1800
worker_no_main_server_wait_time_with_hard_assert
worker_no_main_server_wait_time_with_hard_assert (Number)
Default value 30
worker_healthy_response_period
worker_healthy_response_period (Number)
Default value 300
For how many seconds the worker may fail to respond before being marked unhealthy.
enable_experiments_priority_queue
Enable using priority queue to schedule experiments (Boolean)
Default value False
Whether to enable priority queue for worker nodes to schedule experiments.
expose_server_version
expose_server_version (Boolean)
Default value True
Exposes the DriverlessAI base version when enabled.
enable_https
enable_https (Boolean)
Default value False
https settings
You can make a self-signed certificate for testing with the following commands:
sudo openssl req -x509 -newkey rsa:4096 -keyout private_key.pem -out cert.pem -days 3650 -nodes -subj '/O=Driverless AI'
sudo chown dai:dai cert.pem private_key.pem
sudo chmod 600 cert.pem private_key.pem
sudo mv cert.pem private_key.pem /etc/dai
ssl_key_file
ssl_key_file (String)
Default value '/etc/dai/private_key.pem'
https settings
See enable_https above for how to create a self-signed certificate for testing.
ssl_crt_file
ssl_crt_file (String)
Default value '/etc/dai/cert.pem'
https settings
See enable_https above for how to create a self-signed certificate for testing.
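Combined, a minimal HTTPS setup in config.toml might look like this (a sketch using the paths produced by the commands under enable_https):

enable_https = true
ssl_key_file = "/etc/dai/private_key.pem"
ssl_crt_file = "/etc/dai/cert.pem"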
ssl_key_passphrase
ssl_key_passphrase (String)
Default value ''
https settings
Passphrase for the ssl_key_file, either use this setting or ssl_key_passphrase_file, or neither if no passphrase is used.
ssl_key_passphrase_file
ssl_key_passphrase_file (String)
Default value ''
https settings
Passphrase file for the ssl_key_file, either use this setting or ssl_key_passphrase, or neither if no passphrase is used.
ssl_no_sslv2
ssl_no_sslv2 (Boolean)
Default value True
SSL TLS
ssl_no_sslv3
ssl_no_sslv3 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1
ssl_no_tlsv1 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1_1
ssl_no_tlsv1_1 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1_2
ssl_no_tlsv1_2 (Boolean)
Default value False
SSL TLS
ssl_no_tlsv1_3
ssl_no_tlsv1_3 (Boolean)
Default value False
SSL TLS
ssl_client_verify_mode
ssl_client_verify_mode (String)
Default value 'CERT_NONE'
https settings
Sets the client verification mode.
- CERT_NONE: The client does not need to provide a certificate, and if it does, any verification errors are ignored.
- CERT_OPTIONAL: The client does not need to provide a certificate; if it does, the certificate is verified against the configured CA chains.
- CERT_REQUIRED: The client needs to provide a certificate, and the certificate is verified. When this mode is selected, you'll need to set 'ssl_client_key_file' and 'ssl_client_crt_file' so that Driverless AI can verify its own callback requests.
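For example, to require and verify client certificates (a sketch; the paths are illustrative):

ssl_client_verify_mode = "CERT_REQUIRED"
ssl_ca_file = "/etc/dai/ca.pem"                  # CA used to verify client certificates
ssl_client_key_file = "/etc/dai/client_key.pem"  # so DAI can verify its own callback requests
ssl_client_crt_file = "/etc/dai/client_cert.pem"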
ssl_ca_file
ssl_ca_file (String)
Default value ''
https settings
Path to the Certification Authority certificate file. This certificate will be used to verify the client certificate when client authentication is turned on.
If this is not set, clients are verified using default system certificates.
ssl_client_key_file
ssl_client_key_file (String)
Default value ''
https settings
Path to the private key that Driverless AI will use to authenticate itself when CERT_REQUIRED mode is set.
ssl_client_crt_file
ssl_client_crt_file (String)
Default value ''
https settings
Path to the client certificate that Driverless AI will use to authenticate itself when CERT_REQUIRED mode is set.
enable_xsrf_protection
Enable XSRF Webserver protection (Boolean)
Default value True
If enabled, webserver will serve xsrf cookies and verify their validity upon every POST request
xsrf_cookie_samesite
SameSite Attribute for XSRF Cookie (String)
Default value ''
Sets the SameSite attribute for the _xsrf cookie; options are “Lax”, “Strict”, or “”.
enable_secure_cookies
Enable secure flag on HTTP cookies (Boolean)
Default value False
verify_session_ip
When enabled, webserver verifies session and request IP address (Boolean)
Default value False
When enabled, each authenticated access will be verified by comparing the IP address of the session initiator with the IP address of the current request.
custom_recipe_security_analysis_enabled
custom_recipe_security_analysis_enabled (Boolean)
Default value False
Enables automatic detection of forbidden/dangerous constructs in custom recipes.
custom_recipe_import_allowlist
custom_recipe_import_allowlist (List)
Default value []
List of modules that can be imported in custom recipes. Default empty list means all modules are allowed except for banlisted ones
custom_recipe_import_banlist
custom_recipe_import_banlist (List)
Default value ['shlex', 'plumbum', 'pexpect', 'envoy', 'commands', 'fabric', 'subprocess', 'os.system', 'system']
List of modules that cannot be imported in custom recipes
custom_recipe_method_call_allowlist
custom_recipe_method_call_allowlist (List)
Default value []
Regex pattern list of calls which are allowed in custom recipes.
An empty list means everything (except for the banlist) is allowed. E.g. if only os.path.* is in the allowlist, a custom recipe can only call methods from the os.path module and built-in ones.
custom_recipe_method_call_banlist
custom_recipe_method_call_banlist (List)
Default value ['os\\.system', 'socket\\..*', 'subprocess.*', 'os.spawn.*']
Regex pattern list of calls which are rejected in custom recipes.
E.g. if os.system is in the banlist, a custom recipe cannot call os.system(). If socket.* is in the banlist, a recipe cannot call any method of the socket module, such as socket.socket() or any socket.a.b.c().
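For instance, to extend the default banlist with one more pattern (shutil\.rmtree is an illustrative addition; TOML literal strings keep the regex backslashes intact):

custom_recipe_method_call_banlist = ['os\.system', 'socket\..*', 'subprocess.*', 'os.spawn.*', 'shutil\.rmtree']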
custom_recipe_dangerous_patterns
custom_recipe_dangerous_patterns (List)
Default value ['rm -rf', 'rm -fr']
List of regex patterns representing dangerous sequences/constructs which could be harmful to the whole system and should be banned from code.
allow_concurrent_sessions
Enable concurrent session for same user (Boolean)
Default value True
If enabled, user can log in from 2 browsers (scripts) at the same time
extra_http_headers
extra_http_headers (Dict)
Default value {}
Extra HTTP headers.
http_cookie_attributes
Extra HTTP cookie flags (Dict)
Default value {'samesite': 'Lax'}
By default Driverless AI issues cookies with the HTTPOnly and Secure attributes (morsels) enabled. In addition, the SameSite attribute is set to 'Lax', as is the default in modern browsers. This config overrides the default key/value pairs (morsels).
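Following the same string-encoded dict convention shown for override_experiments_quota_for_users, one might tighten SameSite like so (a sketch; the exact set of accepted morsel keys is an assumption):

http_cookie_attributes = "{'samesite': 'Strict'}"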
h2o_storage_mode
h2o_storage_mode (String)
Default value 'h2o-storage'
Specifies whether Driverless AI uses H2O Storage or H2O Entity Server for a shared entities backend.
h2o-storage: uses legacy H2O Storage. entity-server: uses the new HAIC Entity Server.
h2o_storage_address
h2o_storage_address (String)
Default value ''
Address of the H2O Storage endpoint. Keep empty to use the local storage only.
h2o_storage_projects_enabled
h2o_storage_projects_enabled (Boolean)
Default value False
Whether to use remote projects stored in H2O Storage instead of local projects.
h2o_storage_tls_enabled
h2o_storage_tls_enabled (Boolean)
Default value True
Whether the channel to the storage should be encrypted.
h2o_storage_tls_ca_path
h2o_storage_tls_ca_path (String)
Default value ''
Path to the certification authority certificate that H2O Storage server identity will be checked against.
h2o_storage_tls_cert_path
h2o_storage_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with H2O Storage server
h2o_storage_tls_key_path
h2o_storage_tls_key_path (String)
Default value ''
Path to the client key to authenticate with H2O Storage server
h2o_storage_internal_default_project_id
h2o_storage_internal_default_project_id (String)
Default value ''
UUID of a Storage project to use instead of the remote HOME folder.
h2o_storage_rpc_deadline_seconds
h2o_storage_rpc_deadline_seconds (Number)
Default value 60
Deadline for RPC calls with H2O Storage in seconds. Sets maximum number of seconds that Driverless waits for RPC call to complete before it cancels it.
h2o_storage_rpc_bytestream_deadline_seconds
h2o_storage_rpc_bytestream_deadline_seconds (Number)
Default value 7200
Deadline for RPC bytestream calls with H2O Storage in seconds. Sets the maximum number of seconds that Driverless waits for an RPC call to complete before it cancels it. This value is used for uploading and downloading artifacts.
h2o_storage_oauth2_scopes
h2o_storage_oauth2_scopes (String)
Default value ''
The Storage client manages its own access tokens, derived from the refresh token received on user login. When this option is set, an access token with the scopes defined here is requested (space-separated list).
h2o_storage_message_size_limit
h2o_storage_message_size_limit (Number)
Default value 1048576000
Maximum size of message size of RPC request in bytes. Requests larger than this limit will fail.
h2o_secure_store_endpoint_url
h2o_secure_store_endpoint_url (String)
Default value ''
H2O Secure Store server endpoint URL
h2o_secure_store_enable_tls
h2o_secure_store_enable_tls (Boolean)
Default value True
Enable TLS communication between DAI and the H2O Secure Store server
h2o_secure_store_tls_cert_path
h2o_secure_store_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with the H2O Secure Store server. This is only effective when h2o_secure_store_enable_tls=True.
keystore_file
keystore_file (String)
Default value ''
Keystore file that contains secure config.toml items like passwords, secret keys etc. Keystore is managed by h2oai.keystore tool.
log_level
log_level (Number)
Default value 1
Verbosity of logging.
0: quiet (CRITICAL, ERROR, WARNING). 1: default (CRITICAL, ERROR, WARNING, INFO, DATA). 2: verbose (CRITICAL, ERROR, WARNING, INFO, DATA, DEBUG). Affects the server and all experiments.
collect_server_logs_in_experiment_logs
collect_server_logs_in_experiment_logs (Boolean)
Default value False
Whether to collect relevant server logs (h2oai_server.log, dai.log from systemctl or docker, and the h2o log). Useful when sending logs to H2O.ai.
migrate_all_entities_to_user
migrate_all_entities_to_user (String)
Default value ''
When set, all user entities will be migrated to the defined user upon startup; this is mostly useful during instance migration via H2O's AIEM/Steam.
per_user_directories
per_user_directories (Boolean)
Default value True
Whether to have all user content isolated into a directory for each user. If set to False, all users content is common to single directory, recipes are shared, and brain folder for restart/refit is shared. If set to True, each user has separate folder for all user tasks, recipes are isolated to each user, and brain folder for restart/refit is only for the specific user. Migration from False to True or back to False is allowed for all experiment content accessible by GUI or python client, all recipes, and starting experiment with same settings, restart, or refit. However, if switch to per-user mode, the common brain folder is no longer used.
data_import_ignore_file_names
data_import_ignore_file_names (List)
Default value ['_SUCCESS']
List of file names to ignore during dataset import. Any files with the listed names will be skipped when DAI creates a dataset. Example: a directory contains 3 files: [data_1.csv, data_2.csv, _SUCCESS]. DAI will only attempt to create a dataset using data_1.csv and data_2.csv; the _SUCCESS file will be ignored. The default is to ignore _SUCCESS files, which are commonly created when exporting data from Hadoop.
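For example, to also skip macOS metadata files during import ('.DS_Store' is an illustrative addition to the default):

data_import_ignore_file_names = ['_SUCCESS', '.DS_Store']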
data_import_upcast_multi_file
data_import_upcast_multi_file (Boolean)
Default value False
For data import from a directory (multiple files), allow column types to differ and perform upcast during import.
data_import_explode_list_type_columns_in_parquet
data_import_explode_list_type_columns_in_parquet (Boolean)
Default value False
If set to true, will explode columns with list data type when importing parquet files.
files_without_extensions_expected_types
files_without_extensions_expected_types (List)
Default value ['parquet', 'orc']
List of file types that Driverless AI should attempt to import data as IF no file extension exists in the file name. If no file extension is provided, Driverless AI will attempt to import the data starting with the first type in the defined list. Default ['parquet', 'orc']. Example: 'test.csv' (file extension exists) vs 'test' (file extension DOES NOT exist).
NOTE: see the supported_file_types configuration option for more details on supported file types.
do_not_log_list
do_not_log_list (List)
Default value ['cols_to_drop', 'cols_to_drop_sanitized', 'cols_to_group_by', 'cols_to_group_by_sanitized', 'cols_to_force_in', 'cols_to_force_in_sanitized', 'do_not_log_list', 'do_not_store_list', 'pytorch_nlp_pretrained_s3_access_key_id', 'pytorch_nlp_pretrained_s3_secret_access_key', 'auth_openid_end_session_endpoint_url']
do_not_log_list: add configurations that you do not wish to be recorded in logs here. They will still be stored in experiment information so child experiments can behave consistently.
do_not_store_list
do_not_store_list (List)
Default value ['artifacts_git_password', 'auth_jwt_secret', 'auth_openid_client_id', 'auth_openid_client_secret', 'auth_openid_userinfo_auth_key', 'auth_openid_userinfo_auth_value', 'auth_openid_userinfo_username_key', 'auth_tls_ldap_bind_password', 'aws_access_key_id', 'aws_secret_access_key', 'azure_blob_account_key', 'azure_blob_account_name', 'azure_connection_string', 'azure_sas_token', 'deployment_aws_access_key_id', 'deployment_aws_secret_access_key', 'gcs_path_to_service_account_json', 'gcs_service_account_json', 'kaggle_key', 'kaggle_username', 'kdb_password', 'kdb_user', 'ldap_bind_password', 'ldap_search_password', 'local_htpasswd_file', 'main_server_minio_access_key_id', 'main_server_minio_secret_access_key', 'main_server_redis_password', 'minio_access_key_id', 'minio_endpoint_url', 'minio_secret_access_key', 'main_server_s3_access_key_id', 'main_server_s3_secret_access_key', 'snowflake_account', 'snowflake_password', 'snowflake_authenticator', 'snowflake_url', 'snowflake_user', 'custom_recipe_security_analysis_enabled', 'custom_recipe_import_allowlist', 'custom_recipe_import_banlist', 'custom_recipe_method_call_allowlist', 'custom_recipe_method_call_banlist', 'custom_recipe_dangerous_patterns', 'azure_ad_client_secret', 'azure_blob_keycloak_aad_client_secret', 'artifacts_azure_blob_account_name', 'artifacts_azure_blob_account_key', 'artifacts_azure_connection_string', 'artifacts_azure_sas_token', 'tensorflow_nlp_pretrained_s3_access_key_id', 'tensorflow_nlp_pretrained_s3_secret_access_key', 'ssl_key_passphrase', 'jdbc_app_configs', 'openai_api_secret_key', 'h2ogpt_key']
do_not_store_list: add configurations that you do not wish to be stored at all here. They will not be remembered across experiments, so this is not applicable to data-science-related items that could be controlled by a user. These items are automatically not logged.
ping_sleep_period
ping_sleep_period (Float)
Default value 0.5
Period between checking DAI status. Should be small enough to avoid slowing the parent process that stops the ping process.
data_precision
data_precision (String)
Default value 'float32'
Precision of how data is stored. 'datatable' keeps the original datatable storage types (i.e. bool, int, float32, float64) (experimental). 'float32' is best for speed, 'float64' is best for accuracy or very large input values, and 'datatable' is best for memory. 'float32' allows numbers up to about +-3E38 with a relative error of about 1E-7; 'float64' allows numbers up to about +-1E308 with a relative error of about 1E-16. Some calculations, like the GLM standardization, can only handle up to sqrt() of these maximums for data values, so GLM with 32-bit precision can only handle values up to about 1E19 before standardization generates inf values. If you see "Best individual has invalid score" you may require higher precision.
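For example, if very large input values trigger "Best individual has invalid score", one might raise precision at some cost in speed and memory (a sketch):

data_precision = "float64"
transformer_precision = "float64"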
transformer_precision
transformer_precision (String)
Default value 'float32'
Precision of most data transformers (same options and notes as data_precision). Useful for higher precision in transformers with numerous operations that can accumulate error. Also useful if want faster performance for transformers but otherwise want data stored in high precision.
ulimit_up_to_hard_limit
ulimit_up_to_hard_limit (Boolean)
Default value True
Whether to change ulimit soft limits up to hard limits (for DAI server app, which is not a generic user app). Prevents resource limit problems in some cases. Restricted to no more than limit_nofile and limit_nproc for those resources.
disable_core_files
Whether to disable core files if debug_log=true. If debug_log=false, core file creation is always disabled. (Boolean)
Default value False
limit_nofile
limit_nofile (Number)
Default value 131071
Open-file (nofile) resource limit. Should be consistent with start-dai.sh.
limit_nproc
limit_nproc (Number)
Default value 16384
Thread count (nproc) resource limit. Should be consistent with start-dai.sh.
produce_correlation_heatmap
produce_correlation_heatmap (Boolean)
Default value False
Whether to dump a correlation heatmap to disk.
restart_experiments_after_shutdown
restart_experiments_after_shutdown (Boolean)
Default value False
If True, experiments aborted by server restart will automatically restart and continue upon user login
any_env_overrides
any_env_overrides (Boolean)
Default value False
When an environment variable is set for a toml value, consider that an override of any toml value. Experiments remember toml values for scoring, and this treats any environment setting as equivalent to putting OVERRIDE_ in front of the environment key.
debug_print
Enable debug prints to console (Boolean) (Expert Setting)
Default value False
Whether to enable debug prints (to console/stdout/stderr), e.g. showing up in dai*.log or dai*.txt type files.
debug_print_level
Level of debug to print (Number) (Expert Setting)
Default value 0
Level (0-4) for debug prints (to console/stdout/stderr), e.g. showing up in dai*.log or dai*.txt type files. 1-2 is normal, 4 would lead to highly excessive debug and is not recommended in production.
return_quickly_autodl_testing
return_quickly_autodl_testing (Boolean)
Default value False
return_quickly_autodl_testing2
return_quickly_autodl_testing2 (Boolean)
Default value False
return_before_final_model
return_before_final_model (Boolean)
Default value False
enable_autodl_system_insights
Whether to enable autodl system insights. (Boolean)
Default value True
enable_deleting_autodl_system_insights_finished_experiments
Whether to enable autodl system insights finished experiments. (Boolean)
Default value True
main_logger_with_experiment_ids
main_logger_with_experiment_ids (Boolean)
Default value True
final_munging_memory_reduction_factor
Factor to reduce estimated memory usage by (Number) (Expert Setting)
Default value 2
Reduce memory usage during final ensemble feature engineering (1 uses most memory, larger values use less memory)
munging_memory_overhead_factor
Memory use per transformer per input data size (Number) (Expert Setting)
Default value 5
How much more memory a typical transformer needs than the input data. Can be increased if, e.g., final model munging uses too much memory due to parallel operations.
per_transformer_segfault_protection_ga
Whether to have per-transformer segfault protection when munging data into transformed features during tuning and evolution. Can lead to significant slowdown for cases when large data but data is sampled, leaving large objects in parent fork, leading to slow fork time for each transformer. (Boolean)
Default value False
per_transformer_segfault_protection_final
Whether to have per-transformer segfault protection when munging data into transformed features during final model fitting and scoring. Can lead to significant slowdown for cases when large data but data is sampled, leaving large objects in parent fork, leading to slow fork time for each transformer. (Boolean)
Default value False
submit_resource_wait_period
submit_resource_wait_period (Number)
Default value 10
How often to check resources (disk, memory, cpu) to see if submission needs to be stalled.
stall_subprocess_submission_cpu_threshold_pct
stall_subprocess_submission_cpu_threshold_pct (Number)
Default value 100
Stall submission of subprocesses if system CPU usage is higher than this threshold in percent (set to 100 to disable). A reasonable number is 90.0 if activated
stall_subprocess_submission_dai_fork_threshold_pct
stall_subprocess_submission_dai_fork_threshold_pct (Float)
Default value -1.0
Restrict/stall submission of subprocesses if the DAI fork count (across all experiments) per unit ulimit nproc soft limit is higher than this threshold in percent (set to -1 to disable, 0 for minimal forking). A reasonable number is 90.0 if activated.
stall_subprocess_submission_experiment_fork_threshold_pct
stall_subprocess_submission_experiment_fork_threshold_pct (Float)
Default value -1.0
Restrict/Stall submission of subprocesses if experiment fork count (across all experiments) per unit ulimit nproc soft limit is higher than this threshold in percent (set to -1 to disable, 0 for minimal forking). A reasonable number is 90.0 if activated. For small data leads to overhead of about 0.1s per task submitted due to checks, so for scoring can slow things down for tests.
restrict_initpool_by_memory
restrict_initpool_by_memory (Boolean)
Default value True
Whether to restrict pool workers, even if not used, by reducing the number of pool workers available. Good if there is a really huge number of experiments; otherwise, it is best to have all pool workers ready and only stall submission of tasks, so the system can adapt dynamically to a multi-experiment environment.
users_disk_usage_quota
users_disk_usage_quota (Float)
Default value 1.0
A fraction, with valid values between 0.1 and 1.0, that determines the disk usage quota for a user. This quota is checked during dataset import and experiment runs.
scoring_data_directory
scoring_data_directory (String)
Default value 'tmp'
Path to use for the scoring directory, relative to the run path.
num_models_for_resume_graph
num_models_for_resume_graph (Number)
Default value 1000
mojo_acceptance_test_errors_fatal
mojo_acceptance_test_errors_fatal (Boolean)
Default value True
mojo_acceptance_test_errors_shap_fatal
mojo_acceptance_test_errors_shap_fatal (Boolean)
Default value True
mojo_acceptance_test_orig_shap
mojo_acceptance_test_orig_shap (Boolean)
Default value True
enable_single_instance_db_access
enable_single_instance_db_access (Boolean)
Default value True
If set to true, will make sure only current instance can access its database
dcgm_daemon_address
DCGM daemon address (String)
Default value '127.0.0.1'
DCGM daemon address, DCGM has to be in standalone mode in remote/local host.
enable_pytorch_nlp
enable_pytorch_nlp (String)
Default value 'auto'
Deprecated - maps to enable_pytorch_nlp_transformer and enable_pytorch_nlp_model in 1.10.2+
check_timeout_per_gpu
check_timeout_per_gpu (Number)
Default value 20
How long to wait per GPU for tensorflow/torch to run during system checks.
gpu_exit_if_fails
gpu_exit_if_fails (Boolean)
Default value True
Whether to fail start-up if cannot successfully run GPU checks
how_started
how_started (String)
Default value ''
wizard_state
wizard_state (String)
Default value ''
enable_telemetry
enable_telemetry (Boolean)
Default value False
Whether to enable pushing telemetry events to a configured telemetry receiver in ‘telemetry_plugins_dir’.
telemetry_plugins_dir
telemetry_plugins_dir (String)
Default value './telemetry_plugins'
Directory to scan for telemetry recipes.
h2o_telemetry_tls_enabled
h2o_telemetry_tls_enabled (Boolean)
Default value False
Whether to enable TLS to communicate to H2O.ai Telemetry Service.
h2o_telemetry_rpc_deadline_seconds
h2o_telemetry_rpc_deadline_seconds (Number)
Default value 60
Timeout value when communicating to H2O.ai Telemetry Service.
h2o_telemetry_address
h2o_telemetry_address (String)
Default value ''
H2O.ai Telemetry Service address in H2O.ai Cloud.
h2o_telemetry_service_token_location
h2o_telemetry_service_token_location (String)
Default value ''
H2O.ai Telemetry Service access token file location.
h2o_telemetry_tls_ca_path
h2o_telemetry_tls_ca_path (String)
Default value ''
TLS CA path when communicating to H2O.ai Telemetry Service.
h2o_telemetry_tls_cert_path
h2o_telemetry_tls_cert_path (String)
Default value ''
TLS certificate path when communicating to H2O.ai Telemetry Service.
h2o_telemetry_tls_key_path
h2o_telemetry_tls_key_path (String)
Default value ''
TLS key path when communicating to H2O.ai Telemetry Service.
user_config_directory
user_config_directory (String)
Default value ''
Every *.toml file in this directory is read and processed the same way as the main config file.
procsy_ip
procsy_ip (String)
Default value '127.0.0.1'
IP address for the procsy process.
procsy_port
procsy_port (Number)
Default value 12347
Port for the procsy process.
procsy_timeout
procsy_timeout (Number)
Default value 3600
Request timeout (in seconds) for the procsy process.
h2o_ip
h2o_ip (String)
Default value '127.0.0.1'
IP address for use by MLI.
h2o_port
h2o_port (Number)
Default value 12348
Port of H2O instance for use by MLI. Each H2O node has an internal port (web port+1, so by default port 12349) for internal node-to-node communication
ip
ip (String)
Default value '127.0.0.1'
IP address for the Driverless AI HTTP server.
port
port (Number)
Default value 12345
Port for the Driverless AI HTTP server.
port_range
port_range (List)
Default value []
A list of two integers indicating the port range to search over, and dynamically find an open port to bind to (e.g., [11111,20000]).
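For example, using the range suggested above:

port_range = [11111, 20000]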
strict_version_check
strict_version_check (Boolean)
Default value True
Strict version check for DAI
max_file_upload_size
max_file_upload_size (Number)
Default value 104857600000
File upload limit (default 100GB)
data_directory
data_directory (String)
Default value './tmp'
Data directory. All application data and files related to datasets and experiments are stored in this directory.
db_path
db_path (String)
Default value ''
Sets a custom path for the master.db. Use this to store the database outside the data directory, which can improve performance if the data directory is on a slow drive.
datasets_directory
datasets_directory (String)
Default value ''
Datasets directory. If set, it denotes the location from which all datasets will be read and into which they will be written; typically this location is configured to be on an external file system, to allow more granular control over just the datasets volume. If empty, defaults to data_directory.
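A sketch separating application data, the database, and datasets (the paths are illustrative):

data_directory = "./tmp"
db_path = "/fast-ssd/dai/master.db"          # keep master.db off a slow data drive
datasets_directory = "/mnt/shared/datasets"  # external volume just for datasets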
data_connectors_logs_directory
data_connectors_logs_directory (String)
Default value './tmp'
Path to the directory where the logs of HDFS, Hive, JDBC, and KDB+ data connectors will be saved.
server_logs_sub_directory
server_logs_sub_directory (String)
Default value 'server_logs'
Subdirectory within data_directory to store server logs.
pid_sub_directory
pid_sub_directory (String)
Default value 'pids'
Subdirectory within data_directory to store pid files for controlling kill/stop of DAI servers.
mapr_tickets_directory
mapr_tickets_directory (String)
Default value './tmp/mapr-tickets'
Path to the directory which will be used to save MapR tickets when MapR multi-user mode is enabled. This is applicable only when enable_mapr_multi_user_mode is set to true.
mapr_tickets_duration_minutes
mapr_tickets_duration_minutes (Number)
Default value -1
MapR ticket duration in minutes. If set to -1, the default value is used (not specified in the maprlogin command); otherwise the specified value is used, but no less than one day.
remove_uploads_temp_files_server_start
remove_uploads_temp_files_server_start (Boolean)
Default value True
Whether at server start to delete all temporary uploaded files, left over from failed uploads.
remove_temp_files_server_start
remove_temp_files_server_start (Boolean)
Default value False
Whether to run through the entire data directory and remove all temporary files. Can lead to slow start-up if there is a large number (much greater than 100) of experiments.
remove_temp_files_aborted_experiments
remove_temp_files_aborted_experiments (Boolean)
Default value True
Whether to delete temporary files after experiment is aborted/cancelled.
usage_stats_opt_in
usage_stats_opt_in (Boolean)
Default value True
Whether to opt in to usage statistics and bug reporting
core_site_xml_path
core_site_xml_path (String)
Default value ''
Configuration for an HDFS data source. Path of the HDFS core-site.xml. core_site_xml_path is deprecated; please use hdfs_config_path.
hdfs_config_path
hdfs_config_path (String)
Default value ''
(Required) HDFS config folder path. Can contain multiple config files.
key_tab_path
key_tab_path (String)
Default value ''
Path of the principal keytab file. Required when hdfs_auth_type='principal'. key_tab_path is deprecated; please use hdfs_keytab_path.
hdfs_keytab_path
hdfs_keytab_path (String)
Default value ''
Path of the principal keytab file. Required when hdfs_auth_type='principal'.
preview_cache_upon_server_exit
preview_cache_upon_server_exit (Boolean)
Default value True
Whether to delete preview cache on server exit
enable_health_api
Enable Health API (Boolean)
Default value True
When enabled, server exposes Health API at /apis/health/v1, which provides system overview and utilization statistics
listeners_inherit_env_variables
listeners_inherit_env_variables (Boolean)
Default value False
When enabled, the notification scripts will inherit the parent process's (Driverless AI) environment variables.
listeners_experiment_start
listeners_experiment_start (String)
Default value ''
Notification scripts: each variable points to the location of a script which is executed at the given event in the experiment lifecycle. The script should have the executable flag enabled; use of an absolute path is suggested.
The on-experiment-start notification script location.
listeners_experiment_done
listeners_experiment_done (String)
Default value ''
The on experiment finished notification script location
listeners_experiment_import_done
listeners_experiment_import_done (String)
Default value ''
The on experiment import notification script location
listeners_mojo_done
listeners_mojo_done (String)
Default value ''
Notification script triggered when building of MOJO pipeline for experiment is finished. The value should be an absolute path to executable script.
listeners_autodoc_done
listeners_autodoc_done (String)
Default value ''
Notification script triggered when rendering of AutoDoc for experiment is finished. The value should be an absolute path to executable script.
listeners_scoring_pipeline_done
listeners_scoring_pipeline_done (String)
Default value ''
Notification script triggered when building of python scoring pipeline for experiment is finished. The value should be an absolute path to executable script.
listeners_experiment_artifacts_done
listeners_experiment_artifacts_done (String)
Default value ''
Notification script triggered when experiment and all its artifacts selected at the beginning of experiment are finished building. The value should be an absolute path to executable script.
enable_quick_benchmark
enable_quick_benchmark (Boolean)
Default value True
Whether to run quick performance benchmark at start of application
enable_extended_benchmark
enable_extended_benchmark (Boolean)
Default value False
Whether to run extended performance benchmark at start of application
extended_benchmark_scale_num_rows
extended_benchmark_scale_num_rows (Float)
Default value 0.1
Scaling factor for number of rows for extended performance benchmark. For rigorous performance benchmarking, values of 1 or larger are recommended.
extended_benchmark_num_cols
extended_benchmark_num_cols (Number)
Default value 20
Number of columns for extended performance benchmark.
benchmark_memory_timeout
benchmark_memory_timeout (Number)
Default value 2
Seconds to allow for testing memory bandwidth by generating numpy frames
benchmark_memory_vm_fraction
benchmark_memory_vm_fraction (Float)
Default value 0.25
Maximum portion of vm total to use for numpy memory benchmark
benchmark_memory_max_cols
benchmark_memory_max_cols (Number)
Default value 1500
Maximum number of columns to use for numpy memory benchmark
enable_startup_checks
enable_startup_checks (Boolean)
Default value True
Whether to run quick startup checks at start of application
application_id
application_id (String)
Default value ''
Application ID override, which should uniquely identify the instance
main_server_fork_timeout
main_server_fork_timeout (Float)
Default value 10.0
After how many seconds to abort the MLI recipe execution plan or recipe compatibility checks. These block the main server from all activities, so a long timeout is not desired, especially in case of hanging processes, while a short timeout can too often lead to abortions on a busy system.
audit_log_retention_period
audit_log_retention_period (Number)
Default value 5
After how many days the audit log records are removed. Set equal to 0 to disable removal of old records.
dataset_tmp_upload_file_retention_time_min
dataset_tmp_upload_file_retention_time_min (Number)
Default value 5
Time to wait after performing a cleanup of temporary files for in-browser dataset upload.