System configuration¶
exclusive_mode
¶
Exclusive level of access to node resources (String) (Expert Setting)
Default value 'safe'
- safe: assume another experiment might be running on the same node.
- moderate: assume no other experiments or tasks are running on the same node, but still use only the physical core count.
- max: assume nothing else is running on the node at all except the experiment.
If multinode is enabled, this option has no effect, unless worker_remote_processors=1, in which case it is still applied. Each exclusive mode can be chosen and then fine-tuned using the individual expert settings. Changing the exclusive mode resets all exclusive-mode-related options to their defaults and then re-applies the specific rules for the new mode, which undoes any fine-tuning of expert options that are part of the exclusive mode rules. If you start a new, continued, refitted, or retrained experiment from a parent experiment, the mode rules are not re-applied and any fine-tuning is preserved. To reset mode behavior, switch between ‘safe’ and the desired mode. This way the new child experiment will use the default system resources for the chosen mode.
max_cores
¶
Number of cores to use (0 = all) (Number) (Expert Setting)
Default value 0
Max number of CPU cores to use for the whole system. Set to <= 0 to use all (physical) cores.
If worker_remote_processors is set to a value >= 3, the number of cores will be reduced by the ratio (worker_remote_processors_max_threads_reduction_factor * worker_remote_processors) to avoid overloading the system when too many remote tasks are processed at once.
One can also set the environment variable ‘OMP_NUM_THREADS’ to the number of cores to use for OpenMP (e.g., in bash: ‘export OMP_NUM_THREADS=32’ and ‘export OPENBLAS_NUM_THREADS=32’).
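The same two environment variables can also be set from Python for the current process and its children. The sketch below is illustrative only: the helper name limit_native_threads is hypothetical, and 32 is simply the value from the bash example. Native libraries typically read these variables when they are loaded, so the call should happen before importing them.

```python
import os


def limit_native_threads(num_threads: int) -> dict:
    """Set OpenMP/OpenBLAS thread limits for this process (hypothetical helper)."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    settings = {
        "OMP_NUM_THREADS": str(num_threads),
        "OPENBLAS_NUM_THREADS": str(num_threads),
    }
    # Child processes inherit these values from os.environ.
    os.environ.update(settings)
    return settings


limits = limit_native_threads(32)
```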
small_data_recipe_work
¶
Small data work (String) (Expert Setting)
Default value 'auto'
Whether to treat data as small recipe in terms of work, by spreading many small tasks across many cores instead of forcing GPUs, for models that support it via static var _use_single_core_if_many. ‘auto’ looks at _use_single_core_if_many for models and data size, ‘on’ forces, ‘off’ disables.
max_fit_cores
¶
Maximum number of cores to use for model fit (Number) (Expert Setting)
Default value 10
Control maximum number of cores to use for a model’s fit call (0 = all physical cores, >= 1 = that count). See also tensorflow_model_max_cores to further limit TensorFlow main models.
parallel_score_max_workers
¶
Maximum number of cores to use for model parallel scoring (Number) (Expert Setting)
Default value 0
Control maximum number of cores to use for scoring across all chosen scorers (0 = auto).
use_dask_cluster
¶
If full dask cluster is enabled, use full cluster (Boolean) (Expert Setting)
Default value True
Whether to use full multinode distributed cluster (True) or single-node dask (False). In some cases, using entire cluster can be inefficient. E.g. several DGX nodes can be more efficient if used one DGX at a time for medium-sized data.
max_predict_cores
¶
Maximum number of cores to use for model predict (Number) (Expert Setting)
Default value 0
Control maximum number of cores to use for a model’s predict call (0 = all physical cores, >= 1 = that count).
max_predict_cores_in_dai
¶
Maximum number of cores to use for model transform and predict when doing MLI and AutoDoc. (Number) (Expert Setting)
Default value -1
Control maximum number of cores to use for a model’s transform and predict call when doing operations inside DAI-MLI GUI and R/Py client.
The main experiment and other tasks like MLI and autoreport have separate queues. Main experiments run at most worker_remote_processors tasks (limited by cores if auto mode), while other tasks run at most worker_local_processors tasks (limited by cores if auto mode) at the same time, so many small tasks can add up. To prevent overloading the system, the defaults are conservative. However, if most of the activity involves autoreport or MLI, and no model experiments are running, it may be safe to increase this value to something larger than 4.
-1 : Auto mode. Up to physical cores divided by 4, up to maximum of 10.
0 : all physical cores.
>= 1 : that count.
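As a rough illustration of the three cases, the sketch below resolves the setting to a core count. It is not the product's implementation: the function name is hypothetical, and the use of floor division, a lower bound of 1, and capping explicit counts at the physical core count are assumptions.

```python
def resolve_predict_cores_in_dai(setting: int, physical_cores: int) -> int:
    """Resolve max_predict_cores_in_dai to a concrete core count (illustrative)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be >= 1")
    if setting == -1:
        # Auto mode: physical cores divided by 4, capped at 10.
        # Floor division and the minimum of 1 are assumptions.
        return max(1, min(10, physical_cores // 4))
    if setting == 0:
        return physical_cores
    if setting >= 1:
        # Assumption: never hand out more cores than physically exist.
        return min(setting, physical_cores)
    raise ValueError(f"unsupported setting: {setting}")
```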
batch_cpu_tuning_max_workers
¶
Tuning workers per batch for CPU (Number) (Expert Setting)
Default value 0
Control number of workers used in CPU mode for tuning (0 = socket count, -1 = all physical cores, >= 1 = that count). More workers will be more parallel but models learn less from each other.
cpu_max_workers
¶
Num. workers for CPU training (Number) (Expert Setting)
Default value 0
Control number of workers used in CPU mode for training (0 = socket count, -1 = all physical cores, >= 1 = that count).
assumed_simultaneous_dt_forks_munging
¶
Assumed/Expected number of munging forks (Number) (Expert Setting)
Default value 3
Expected maximum number of forks, used to ensure datatable doesn’t overload the system. For actual use beyond this value, the system will start to slow down.
max_max_dt_threads_munging
¶
Max. threads for datatable munging (Number) (Expert Setting)
Default value 4
Maximum number of threads for datatable during munging.
max_max_dt_threads_readwrite
¶
Max. threads for datatable reading/writing (Number) (Expert Setting)
Default value 4
Maximum number of threads for datatable when reading/writing files.
max_workers_final_base_models
¶
Max. workers for final model building (Number) (Expert Setting)
Default value 0
Maximum parallel workers for final model building. 0 means automatic, >=1 means limit to no more than that number of parallel jobs. Can be required if some transformer or model uses more than the expected amount of memory. To reduce final model building memory usage, set one or more of the following and retrain the final model:
1) Increase munging_memory_overhead_factor to 10
2) Increase final_munging_memory_reduction_factor to 10
3) Lower max_workers_final_munging to 1
4) Lower max_workers_final_base_models to 1
5) Lower max_cores to, e.g., 1/2 or 1/4 of physical cores.
max_workers_final_munging
¶
Max. workers for final per-model munging (Number) (Expert Setting)
Default value 0
Maximum parallel workers for final per-model munging. 0 means automatic, >=1 means limit to no more than that number of parallel jobs. Can be required if some transformer uses more than the expected amount of memory.
min_dt_threads_munging
¶
min_dt_threads_munging (Number) (Expert Setting)
Default value 1
Minimum number of threads for datatable (and OpenMP) during data munging (per process). datatable is the main data munging tool used within Driverless AI (source: https://github.com/h2oai/datatable).
min_dt_threads_final_munging
¶
min_dt_threads_final_munging (Number) (Expert Setting)
Default value 1
Like min_dt_threads_munging, but for final pipeline munging.
max_dt_threads_munging
¶
Max. Num. of threads to use for datatable and openblas for munging and model training (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable during data munging (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
max_dt_threads_readwrite
¶
Max. Num. of threads to use for datatable read and write of files (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable during data reading and writing (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
max_dt_threads_stats_openblas
¶
Max. Num. of threads to use for datatable stats and openblas (0 = all, -1 = auto) (Number) (Expert Setting)
Default value -1
Maximum number of threads for datatable stats and openblas (per process) (0 = all, -1 = auto). If multiple forks, threads are distributed across forks.
num_gpus_per_experiment
¶
#GPUs/Experiment (-1 = autodetect or all) (Number) (Expert Setting)
Default value -1
Number of GPUs to use per experiment for training task. Set to -1 for all GPUs. An experiment will generate many different models. Currently, num_gpus_per_experiment!=-1 disables GPU locking, so it is only recommended for single experiments and single users. Ignored if GPUs are disabled or there are no GPUs on the system. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation
In a multinode context when using dask, this refers to the per-node value. For ImageAutoModel, this refers to the total number of GPUs used for that entire model type, since there is only one model type for the entire experiment. E.g. if you have 4 GPUs and want 2 ImageAuto experiments to run on 2 GPUs each, set num_gpus_per_experiment to 2 for each experiment, and each of the 4 GPUs will be used one at a time by the 2 experiments, each using only 2 GPUs.
min_num_cores_per_gpu
¶
Num Cores/GPU (Number) (Expert Setting)
Default value -2
Number of CPU cores per GPU. Limits number of GPUs in order to have sufficient cores per GPU.
Set to -1 to disable, -2 for auto mode. In auto mode, if lightgbm_use_gpu is ‘auto’ or ‘off’, then min_num_cores_per_gpu=1, else min_num_cores_per_gpu=2, due to lightgbm requiring more cores even when using GPUs.
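The auto-mode rule can be sketched as follows. The first function mirrors the rule stated above; the second is an assumption about how such a limit could translate into a GPU count (integer division of physical cores by cores per GPU) and is not taken from the source. Both function names are hypothetical.

```python
def resolve_min_num_cores_per_gpu(setting: int, lightgbm_use_gpu: str) -> int:
    """Resolve -2 (auto) to 1 or 2 depending on lightgbm_use_gpu."""
    if setting == -2:
        return 1 if lightgbm_use_gpu in ("auto", "off") else 2
    return setting


def limit_gpus_by_cores(num_gpus: int, physical_cores: int, cores_per_gpu: int) -> int:
    """Cap the GPU count so each GPU gets enough cores (assumed formula)."""
    if cores_per_gpu <= 0:
        # -1 disables the limit; other non-positive values are treated the same here.
        return num_gpus
    return min(num_gpus, physical_cores // cores_per_gpu)
```

For example, with 8 GPUs, 8 physical cores, and lightgbm_use_gpu set to ‘on’, this sketch would use at most 4 GPUs.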
num_gpus_per_model
¶
#GPUs/Model (-1 = all) (Number) (Expert Setting)
Default value 1
Number of GPUs to use per model training task. Set to -1 for all GPUs. For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model. Only applicable currently to image auto pipeline building recipe or Dask models with more than one GPU or more than one node. Ignored if GPUs disabled or no GPUs on system. For ImageAutoModel, the maximum of num_gpus_per_model and num_gpus_per_experiment (all GPUs if -1) is taken. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation In multinode context when using Dask, this refers to the per-node value.
num_gpus_for_prediction
¶
Num. of GPUs for isolated prediction/transform (Number) (Expert Setting)
Default value 0
Number of GPUs to use for predict for models and transform for transformers when running outside of fit/fit_transform. -1 means all, 0 means no GPUs, >1 means that many GPUs up to visible limit. If predict/transform are called in same process as fit/fit_transform, number of GPUs will match, while new processes will use this count for number of GPUs for applicable models/transformers. Exception: TensorFlow, PyTorch models/transformers, and RAPIDS predict on GPU always if GPUs exist. RAPIDS requires python scoring package be used also on GPUs. In multinode context when using Dask, this refers to the per-node value.
gpu_id_start
¶
GPU starting ID (0..visible #GPUs - 1) (Number) (Expert Setting)
Default value -1
Which gpu_id to start with.
-1 : auto-mode. E.g. 2 experiments can each set num_gpus_per_experiment to 2 and use 4 GPUs.
If using CUDA_VISIBLE_DEVICES=… to control GPUs (preferred method), gpu_id=0 is the first in that restricted list of devices. E.g. if CUDA_VISIBLE_DEVICES=‘4,5’ then gpu_id_start=0 will refer to device #4.
E.g. from expert mode, to run 2 experiments, each on a distinct GPU out of 2 GPUs:
Experiment#1: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=0
Experiment#2: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=1
E.g. from expert mode, to run 2 experiments, each on a distinct set of 4 GPUs out of 8 GPUs:
Experiment#1: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=0
Experiment#2: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=4
E.g. like just above, but now run on all 4 GPUs/model:
Experiment#1: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=0
Experiment#2: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=4
If num_gpus_per_model!=1, global GPU locking is disabled (because underlying algorithms don’t support arbitrary gpu ids, only sequential ids), so the above must be set up correctly to avoid overlap across all experiments by all users. More info at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation
Note that GPU selection does not wrap, so gpu_id_start + num_gpus_per_model must not exceed the number of visible GPUs.
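The index mapping and the no-wrap constraint can be checked before launching experiments. The helpers below are a hypothetical sketch, not part of Driverless AI: they read CUDA_VISIBLE_DEVICES, translate a (gpu_id_start, count) pair into the underlying device ids, and test two experiments for overlap. Treating a sum equal to the number of visible GPUs as valid is an assumption based on the 8-GPU example.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device ids listed in CUDA_VISIBLE_DEVICES, in order."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]


def physical_devices_for(gpu_id_start, num_gpus, visible):
    """Map a sequential GPU range onto the visible device list (no wrapping)."""
    if gpu_id_start < 0 or num_gpus < 1:
        raise ValueError("gpu_id_start must be >= 0 and num_gpus >= 1")
    end = gpu_id_start + num_gpus
    if end > len(visible):
        raise ValueError("GPU selection does not wrap past the visible GPUs")
    return visible[gpu_id_start:end]


def ranges_overlap(a_start, a_count, b_start, b_count):
    """True if two sequential GPU ranges share at least one GPU."""
    return a_start < b_start + b_count and b_start < a_start + a_count
```

With CUDA_VISIBLE_DEVICES=‘4,5’, physical_devices_for(0, 1, ["4", "5"]) selects device 4, matching the example in the text.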
allow_reduce_features_when_failure
¶
Whether to reduce features when model fails (String) (Expert Setting)
Default value 'auto'
Whether to reduce features until model does not fail. Currently for non-dask XGBoost models (i.e. GLMModel, XGBoostGBMModel, XGBoostDartModel, XGBoostRFModel), during normal fit or when using Optuna. Primarily useful for GPU OOM. If XGBoost runs out of GPU memory, this is detected, and (regardless of setting of skip_model_failures), we perform feature selection using XGBoost on subsets of features. The dataset is progressively reduced by factor of 2 with more models to cover all features. This splitting continues until no failure occurs. Then all sub-models are used to estimate variable importance by absolute information gain, in order to decide which features to include. Finally, a single model with the most important features is built using the feature count that did not lead to OOM. For ‘auto’, this option is set to ‘off’ when reproducible experiment is enabled, because the condition of running OOM can change for same experiment seed. Reduction is only done on features and not on rows for the feature selection step.
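The progressive splitting step can be sketched as below. This is a simplified illustration under stated assumptions, not the actual recovery code: try_fit is a hypothetical callback standing in for "fit a sub-model and report whether it avoided OOM", features are split into contiguous groups, and the later steps (importance ranking and the final single model) are not shown.

```python
from typing import Callable, List, Sequence


def split_into_chunks(features: Sequence[str], num_chunks: int) -> List[List[str]]:
    """Split features into num_chunks contiguous, nearly equal groups."""
    size, extra = divmod(len(features), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(features[start:end]))
        start = end
    return chunks


def find_fitting_split(
    features: Sequence[str], try_fit: Callable[[List[str]], bool]
) -> List[List[str]]:
    """Double the number of sub-models until every feature subset fits."""
    num_chunks = 1
    while True:
        chunks = split_into_chunks(features, num_chunks)
        if all(try_fit(chunk) for chunk in chunks):
            return chunks
        if num_chunks >= len(features):
            raise RuntimeError("no split of the features avoids the failure")
        # Halve the features per sub-model, covering all features with more models.
        num_chunks = min(num_chunks * 2, len(features))
```

In this sketch, a try_fit that only succeeds for 3 or fewer features would split 10 features into 4 sub-models.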
reduce_repeats_when_failure
¶
Number of repeats for models used for feature selection during failure recovery. (Number) (Expert Setting)
Default value 1
With allow_reduce_features_when_failure, this controls how many repeats of sub-models used for feature selection. A single repeat only has each sub-model consider a single sub-set of features, while repeats shuffle which features are considered allowing more chance to find important interactions. More repeats can lead to higher accuracy. The cost of this option is proportional to the repeat count.
fraction_anchor_reduce_features_when_failure
¶
Fraction of features treated as anchor for feature selection during failure recovery. (Float) (Expert Setting)
Default value 0.1
With allow_reduce_features_when_failure, this controls the fraction of features treated as an anchor that are fixed for all sub-models. Each repeat gets new anchors. For tuning and evolution, the probability depends upon any prior importance (if present) from other individuals, while final model uses uniform probability for anchor features.
xgboost_reduce_on_errors_list
¶
Errors from XGBoost that trigger reduction of features (List) (Expert Setting)
Default value ['Memory allocation error on worker', 'out of memory', 'XGBDefaultDeviceAllocatorImpl', 'invalid configuration argument', 'Requested memory']
Error strings from XGBoost that are used to trigger re-fit on reduced sub-models. See allow_reduce_features_when_failure.
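One plausible way such a list is applied is a substring test against the error message. The sketch below assumes case-sensitive substring matching, which the source does not specify; the function name is hypothetical and the list is copied from the default value above.

```python
XGBOOST_REDUCE_ON_ERRORS = [
    "Memory allocation error on worker",
    "out of memory",
    "XGBDefaultDeviceAllocatorImpl",
    "invalid configuration argument",
    "Requested memory",
]


def should_reduce_features(error_message, triggers=XGBOOST_REDUCE_ON_ERRORS):
    """True if any trigger string occurs in the error message (assumed semantics)."""
    return any(trigger in error_message for trigger in triggers)
```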
lightgbm_reduce_on_errors_list
¶
Errors from LightGBM that trigger reduction of features (List) (Expert Setting)
Default value ['Out of Host Memory']
Error strings from LightGBM that are used to trigger re-fit on reduced sub-models. See allow_reduce_features_when_failure.
lightgbm_use_gpu
¶
Whether to use GPUs for LightGBM (String) (Expert Setting)
Default value 'auto'
LightGBM does not significantly benefit from GPUs, unlike other tools like XGBoost or Bert/Image Models.
Each experiment will try to use all GPUs, and on systems with many cores and GPUs, this leads to many experiments running at once, all trying to lock the GPU for use, leaving the cores heavily under-utilized. So by default, DAI always uses CPU for LightGBM, unless ‘on’ is specified.
num_gpus_per_hyperopt_dask
¶
#GPUs/HyperOptDask (-1 = all) (Number) (Expert Setting)
Default value -1
Number of GPUs to use per model hyperopt training task. Set to -1 for all GPUs. For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model across a Dask cluster. Ignored if GPUs disabled or no GPUs on system. In multinode context, this refers to the per-node value.
detailed_traces
¶
Enable detailed traces (Boolean) (Expert Setting)
Default value False
Whether to enable detailed traces (in GUI Trace)
debug_log
¶
Enable debug log level (Boolean) (Expert Setting)
Default value False
Whether to enable debug log level (in log files)
log_system_info_per_experiment
¶
Enable logging of system information for each experiment (Boolean) (Expert Setting)
Default value True
Whether to add logging of system information such as CPU, GPU, disk space at the start of each experiment log. Same information is already logged in system logs.
enable_h2o_recipes
¶
Enable h2o recipes server (Boolean) (Expert Setting)
Default value True
Whether to enable use of the H2O recipe server. In some cases, the recipe server (started at DAI startup) may enter an unstable state, and this might affect other experiments. One can then avoid triggering use of the recipe server by setting this to false.
enable_projects
¶
Enable Projects workspace (Boolean)
Default value True
Enable Projects workspace (alpha version, for evaluation)
enable_license_manager
¶
enable_license_manager (Boolean)
Default value False
Switches Driverless AI to use H2O.ai License Management Server to manage licenses/permission to use software
license_manager_address
¶
license_manager_address (String)
Default value 'http://127.0.0.1:9999'
Address at which to communicate with H2O.ai License Management Server. Requires enable_license_manager (above) to be set to True. Format: {http/https}://{ip address}:{port number}
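A value can be checked against this format before it is written to the configuration. The helper below is a hypothetical sketch using only the Python standard library; it requires an http or https scheme, a host, and an explicit port.

```python
from urllib.parse import urlsplit


def is_valid_license_manager_address(address: str) -> bool:
    """Check for {http/https}://{host}:{port} (illustrative validation only)."""
    try:
        parts = urlsplit(address)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname) and port is not None
```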
license_manager_project_name
¶
license_manager_project_name (String)
Default value 'default'
Name of license manager project that Driverless AI will attempt to retrieve leases from. NOTE: requires an active license within the License Manager Server to function properly
license_manager_lease_duration
¶
license_manager_lease_duration (Number)
Default value 3600000
Number of milliseconds a lease for users will be expected to last, if using the H2O.ai License Manager server, before the lease REQUIRES renewal.
Default: 3600000 (1 hour) = 1 hour * 60 min / hour * 60 sec / min * 1000 milliseconds / sec
license_manager_worker_lease_duration
¶
license_manager_worker_lease_duration (Number)
Default value 21600000
Number of milliseconds a lease for Driverless AI worker nodes will be expected to last, if using the H2O.ai License Manager server, before the lease REQUIRES renewal.
Default: 21600000 (6 hours) = 6 hour * 60 min / hour * 60 sec / min * 1000 milliseconds / sec
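The millisecond values follow directly from the arithmetic shown in the two defaults. A small helper (the name is hypothetical) makes it easy to derive other durations:

```python
MS_PER_HOUR = 60 * 60 * 1000  # 60 min/hour * 60 sec/min * 1000 ms/sec


def hours_to_lease_ms(hours: float) -> int:
    """Convert a lease duration in hours to milliseconds."""
    return int(hours * MS_PER_HOUR)
```

Here hours_to_lease_ms(1) and hours_to_lease_ms(6) reproduce the user and worker defaults, respectively.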
license_manager_ssl_certs
¶
license_manager_ssl_certs (String)
Default value 'true'
To be used only if the License Manager server is started with HTTPS. Accepts a boolean (true/false) or a path to a file/directory. Denotes whether or not to attempt SSL certificate verification when making a request to the License Manager server.
- True: attempt SSL certificate verification; will fail if certificates are self-signed.
- False: skip SSL certificate verification.
- /path/to/cert/directory: load certificates <cert.pem> in the directory and use those for certificate verification.
Behaves in the same manner as the Python requests package: https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification
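Because the setting is a string that may hold either a boolean or a path, it has to be converted before it can be passed as the verify argument of a requests call. The conversion below is a sketch; the function name is hypothetical and the case-insensitive handling of ‘true’/‘false’ is an assumption.

```python
from typing import Union


def to_requests_verify(value: str) -> Union[bool, str]:
    """Turn the config string into a value usable as requests' `verify` argument."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    # Anything else is treated as a path to a certificate file or directory.
    return value.strip()
```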
license_manager_worker_startup_timeout
¶
license_manager_worker_startup_timeout (Number)
Default value 3600000
Amount of time that Driverless AI workers will keep retrying to startup and obtain a lease from the license manager before timing out. Time out will cause worker startup to fail.
license_manager_dry_run_token
¶
license_manager_dry_run_token (String)
Default value ''
Emergency setting that will allow Driverless AI to run even if there are issues communicating with, or obtaining leases from, the License Manager server.
This is an encoded string that can be obtained from either the License Manager UI or the logs of the License Manager server.
start_dask_worker
¶
Start dask workers for given multinode worker (Boolean)
Default value True
Whether to start dask workers on this multinode worker.
dask_cuda_scheduler_env
¶
Set dask cuda scheduler env. (Dict)
Default value {}
Set dask scheduler env. See https://docs.dask.org/en/latest/setup/cli.html
dask_scheduler_options
¶
Set dask scheduler command-line options. (String)
Default value ''
Set dask scheduler options. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_scheduler_options
¶
Set dask cuda scheduler command-line options. (String)
Default value ''
Set dask cuda scheduler options. See https://docs.dask.org/en/latest/setup/cli.html
dask_worker_options
¶
Set dask worker command-line options. (String)
Default value '--memory-limit 0.95'
Set dask worker options. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_worker_options
¶
Set dask cuda worker options. (String)
Default value '--memory-limit 0.95'
Set dask cuda worker options. Similar options as dask_cuda_cluster_kwargs. See https://dask-cuda.readthedocs.io/en/latest/ucx.html#launching-scheduler-workers-and-clients-separately
“--rmm-pool-size 1GB” can be set to give 1GB to RMM for more efficient RAPIDS.
dask_protocol
¶
Protocol used for dask communications. (String)
Default value 'tcp'
E.g. ucx is optimal, while tcp is most reliable. See https://docs.dask.org/en/latest/setup/cli.html
dask_server_port
¶
Port used by server for dask communications. (Number)
Default value 8786
dask_dashboard_port
¶
Dask dashboard port for dask diagnostics. (Number)
Default value 8787
dask_cuda_protocol
¶
Protocol used for dask cuda communications. (String)
Default value 'tcp'
E.g. ucx is optimal, while tcp is most reliable. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_server_port
¶
Port used by server for dask cuda communications. (Number)
Default value 8790
Port + 1 is used for the dask dashboard. See https://docs.dask.org/en/latest/setup/cli.html
dask_cuda_dashboard_port
¶
Dask dashboard port for dask_cuda diagnostics. (Number)
Default value 8791
dask_server_ip
¶
IP address used by server for dask and dask cuda communications. (String)
Default value ''
If empty string, auto-detect IP capable of reaching network. Required to be set if using worker_mode=multinode.
dask_worker_nprocs
¶
Number of processes per dask worker. (Number)
Default value 1
Number of processes per dask (not cuda-GPU) worker. If -1, uses dask default of cpu count + 1 + nprocs. If -2, uses DAI default of total number of physical cores. Recommended for heavy feature engineering. If 1, assumes tasks are mostly multi-threaded and can use entire node per task. Recommended for heavy multinode model training. Only applicable to dask (not dask_cuda) workers.
dask_worker_nthreads
¶
Number of threads per process for dask. (Number)
Default value 1
Number of threads per process for dask workers
dask_cuda_worker_nthreads
¶
Number of threads per process for dask_cuda. (Number)
Default value -2
Number of threads per process for dask_cuda workers. If -2, uses DAI default of physical cores per GPU, since there must be only 1 worker per GPU.
lightgbm_listen_port
¶
LightGBM local listen port when using dask with lightgbm (Number)
Default value 12400
enable_jupyter_server
¶
enable_jupyter_server (Boolean)
Default value False
Whether to enable jupyter server
jupyter_server_port
¶
jupyter_server_port (Number)
Default value 8889
Port for jupyter server
enable_jupyter_server_browser
¶
enable_jupyter_server_browser (Boolean)
Default value False
Whether to enable jupyter server browser
enable_jupyter_server_browser_root
¶
enable_jupyter_server_browser_root (Boolean)
Default value False
Whether to allow root access to the jupyter server browser
triton_log_level
¶
triton_log_level (Number)
Default value 0
triton_model_reload_on_startup_count
¶
triton_model_reload_on_startup_count (Number)
Default value 0
triton_clean_up_temp_python_env_on_startup
¶
triton_clean_up_temp_python_env_on_startup (Boolean)
Default value True
multinode_enable_strict_queue_policy
¶
multinode_enable_strict_queue_policy (Boolean)
Default value False
When set to true, CPU executors will strictly run just CPU tasks.
multinode_enable_cpu_tasks_on_gpu_machines
¶
multinode_enable_cpu_tasks_on_gpu_machines (Boolean)
Default value True
Controls whether CPU tasks can run on GPU machines.
multinode_storage_medium
¶
multinode_storage_medium (String)
Default value 'minio'
Storage medium to be used to exchange data between main server and remote worker nodes.
worker_mode
¶
worker_mode (String)
Default value 'singlenode'
How the long running tasks are scheduled.
- multiprocessing: forks the current process immediately.
- singlenode: shares the task through redis and needs a worker running.
- multinode: same as singlenode, and also shares the data through minio and allows the worker to run on a different machine.
redis_ip
¶
redis_ip (String)
Default value '127.0.0.1'
Redis settings
redis_port
¶
redis_port (Number)
Default value 6379
Redis settings
redis_db
¶
redis_db (Number)
Default value 0
Redis database. Each DAI instance running on the redis server should have a unique integer.
main_server_redis_password
¶
main_server_redis_password (String)
Default value 'PlWUjvEJSiWu9j0aopOyL5KwqnrKtyWVoZHunqxr'
Redis password. It is randomly generated at main server startup, and by default it will show up uncommented in the config file. If you are running more than one Driverless AI instance per system, make sure each and every instance is connected to its own redis queue.
redis_encrypt_config
¶
redis_encrypt_config (Boolean)
Default value False
If set to true, the config will get encrypted before it gets saved into the Redis database.
local_minio_port
¶
local_minio_port (Number)
Default value 9001
The port that Minio will listen on. This only takes effect if the current system is a multinode main server.
main_server_minio_address
¶
main_server_minio_address (String)
Default value '127.0.0.1:9001'
Location of main server’s minio server.
main_server_minio_access_key_id
¶
main_server_minio_access_key_id (String)
Default value 'GMCSE2K2T3RV6YEHJUYW'
Access key of main server’s minio server.
main_server_minio_secret_access_key
¶
main_server_minio_secret_access_key (String)
Default value 'JFxmXvE/W1AaqwgyPxAUFsJZRnDWUaeQciZJUe9H'
Secret access key of main server’s minio server.
main_server_minio_bucket
¶
main_server_minio_bucket (String)
Default value 'h2oai'
Name of minio bucket used for file synchronization.
main_server_s3_access_key_id
¶
main_server_s3_access_key_id (String)
Default value 'access_key'
S3 global access key.
main_server_s3_secret_access_key
¶
main_server_s3_secret_access_key (String)
Default value 'secret_access_key'
S3 global secret access key
main_server_s3_bucket
¶
main_server_s3_bucket (String)
Default value 'h2oai-multinode-tests'
S3 bucket.
worker_local_processors
¶
worker_local_processors (Number)
Default value 32
Maximum number of local tasks processed at once, limited to no more than total number of physical (not virtual) cores divided by two (minimum of 1).
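The cap described here can be written as a one-line rule. The sketch below is an illustration with a hypothetical function name; integer division for "divided by two" is an assumption.

```python
def effective_local_processors(setting: int, physical_cores: int) -> int:
    """Limit worker_local_processors to half the physical cores (minimum of 1)."""
    core_limit = max(1, physical_cores // 2)
    return min(setting, core_limit)
```

With the default of 32 on a machine with 16 physical cores, this sketch would allow 8 local tasks at once.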
worker_priority_queues_processors
¶
worker_priority_queues_processors (Number)
Default value 4
A concurrency limit for the 3 priority queues, only enabled when worker_remote_processors is greater than 0.
worker_priority_queues_time_check
¶
worker_priority_queues_time_check (Number)
Default value 30
A timeout before which a scheduled task is bumped up in priority
worker_remote_processors
¶
worker_remote_processors (Number)
Default value -1
Maximum number of remote tasks processed at once. If the value is set to -1, the system will automatically pick a reasonable limit depending on the number of available virtual CPU cores.
worker_remote_processors_max_threads_reduction_factor
¶
worker_remote_processors_max_threads_reduction_factor (Float)
Default value 0.7
If worker_remote_processors >= 3, factor by which each task reduces threads, used by various packages like datatable, lightgbm, xgboost, etc.
multinode_tmpfs
¶
multinode_tmpfs (String)
Default value ''
Temporary file system location for multinode data transfer. This has to be an absolute path with equivalent configuration on both the main server and remote workers.
multinode_store_datasets_in_tmpfs
¶
multinode_store_datasets_in_tmpfs (Boolean)
Default value False
When set to true, will use the ‘multinode_tmpfs’ as datasets store.
redis_result_queue_polling_interval
¶
redis_result_queue_polling_interval (Number)
Default value 100
How often the server should extract results from redis queue in milliseconds.
worker_sleep
¶
worker_sleep (Float)
Default value 0.1
Sleep time for worker loop.
main_server_minio_bucket_ping_timeout
¶
main_server_minio_bucket_ping_timeout (Number)
Default value 180
For how many seconds worker should wait for main server minio bucket before it fails
worker_start_timeout
¶
worker_start_timeout (Number)
Default value 30
How long the worker should wait on redis db initialization in seconds.
worker_no_main_server_wait_time
¶
worker_no_main_server_wait_time (Number)
Default value 1800
worker_no_main_server_wait_time_with_hard_assert
¶
worker_no_main_server_wait_time_with_hard_assert (Number)
Default value 30
worker_healthy_response_period
¶
worker_healthy_response_period (Number)
Default value 300
Number of seconds without a response after which the worker is marked unhealthy.
expose_server_version
¶
expose_server_version (Boolean)
Default value True
Exposes the DriverlessAI base version when enabled.
enable_https
¶
enable_https (Boolean)
Default value False
https settings
You can make a self-signed certificate for testing with the following commands:
sudo openssl req -x509 -newkey rsa:4096 -keyout private_key.pem -out cert.pem -days 3650 -nodes -subj '/O=Driverless AI'
sudo chown dai:dai cert.pem private_key.pem
sudo chmod 600 cert.pem private_key.pem
sudo mv cert.pem private_key.pem /etc/dai
ssl_key_file
¶
ssl_key_file (String)
Default value '/etc/dai/private_key.pem'
https settings
ssl_crt_file
¶
ssl_crt_file (String)
Default value '/etc/dai/cert.pem'
https settings
ssl_key_passphrase
¶
ssl_key_passphrase (String)
Default value ''
https settings
Passphrase for the ssl_key_file. Use either this setting or ssl_key_passphrase_file, or neither if no passphrase is used.
ssl_key_passphrase_file
¶
ssl_key_passphrase_file (String)
Default value ''
https settings
Passphrase file for the ssl_key_file. Use either this setting or ssl_key_passphrase, or neither if no passphrase is used.
ssl_no_sslv2
¶
ssl_no_sslv2 (Boolean)
Default value True
SSL TLS
ssl_no_sslv3
¶
ssl_no_sslv3 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1
¶
ssl_no_tlsv1 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1_1
¶
ssl_no_tlsv1_1 (Boolean)
Default value True
SSL TLS
ssl_no_tlsv1_2
¶
ssl_no_tlsv1_2 (Boolean)
Default value False
SSL TLS
ssl_no_tlsv1_3
¶
ssl_no_tlsv1_3 (Boolean)
Default value False
SSL TLS
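Taken together, the six ssl_no_* flags determine which protocol versions remain available. The sketch below assumes, from the setting names only, that a value of True disables the corresponding protocol; the dictionary of defaults is copied from the values listed above and the function name is hypothetical.

```python
# Defaults copied from the ssl_no_* settings above.
DEFAULT_SSL_FLAGS = {
    "ssl_no_sslv2": True,
    "ssl_no_sslv3": True,
    "ssl_no_tlsv1": True,
    "ssl_no_tlsv1_1": True,
    "ssl_no_tlsv1_2": False,
    "ssl_no_tlsv1_3": False,
}

PROTOCOL_NAMES = {
    "ssl_no_sslv2": "SSLv2",
    "ssl_no_sslv3": "SSLv3",
    "ssl_no_tlsv1": "TLSv1",
    "ssl_no_tlsv1_1": "TLSv1.1",
    "ssl_no_tlsv1_2": "TLSv1.2",
    "ssl_no_tlsv1_3": "TLSv1.3",
}


def enabled_protocols(flags):
    """List protocols whose ssl_no_* flag is False (assumed meaning of the flags)."""
    return [name for key, name in PROTOCOL_NAMES.items() if not flags[key]]
```

Under that assumption, the defaults leave only TLSv1.2 and TLSv1.3 enabled.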
ssl_client_verify_mode
¶
ssl_client_verify_mode (String)
Default value 'CERT_NONE'
https settings
Sets the client verification mode.
- CERT_NONE: Client does not need to provide the certificate, and if it does, any verification errors are ignored.
- CERT_OPTIONAL: Client does not need to provide the certificate, and if it does, the certificate is verified against the configured CA chains.
- CERT_REQUIRED: Client needs to provide a certificate, and the certificate is verified. When this mode is selected, you’ll need to set ‘ssl_client_key_file’ and ‘ssl_client_crt_file’ so that Driverless AI is able to verify its own callback requests.
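The three mode names match the constants of Python's standard ssl module. Assuming the setting is meant to correspond to those constants (the source does not say so explicitly), a config string could be converted like this; the function name is hypothetical.

```python
import ssl


def to_verify_mode(name: str) -> ssl.VerifyMode:
    """Map 'CERT_NONE', 'CERT_OPTIONAL', or 'CERT_REQUIRED' to ssl.VerifyMode."""
    try:
        return ssl.VerifyMode[name]
    except KeyError:
        raise ValueError(f"unknown ssl_client_verify_mode: {name!r}") from None
```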
ssl_ca_file
¶
ssl_ca_file (String)
Default value ''
https settings
Path to the Certification Authority certificate file. This certificate will be used to verify the client certificate when client authentication is turned on.
If this is not set, clients are verified using default system certificates.
ssl_client_key_file
¶
ssl_client_key_file (String)
Default value ''
https settings
Path to the private key that Driverless will use to authenticate itself when CERT_REQUIRED mode is set.
ssl_client_crt_file
¶
ssl_client_crt_file (String)
Default value ''
https settings
Path to the client certificate that Driverless will use to authenticate itself when CERT_REQUIRED mode is set.
enable_xsrf_protection
¶
Enable XSRF Webserver protection (Boolean)
Default value True
If enabled, webserver will serve xsrf cookies and verify their validity upon every POST request
enable_secure_cookies
¶
Enable secure flag on HTTP cookies (Boolean)
Default value False
verify_session_ip
¶
When enabled, webserver verifies session and request IP address (Boolean)
Default value False
When enabled, each authenticated request is verified by comparing the IP address that initiated the session with the IP address of the current request.
custom_recipe_security_analysis_enabled
¶
custom_recipe_security_analysis_enabled (Boolean)
Default value False
Enables automatic detection for forbidden/dangerous constructs in custom recipe
custom_recipe_import_allowlist
¶
custom_recipe_import_allowlist (List)
Default value []
List of modules that can be imported in custom recipes. Default empty list means all modules are allowed except for banlisted ones
custom_recipe_import_banlist
¶
custom_recipe_import_banlist (List)
Default value ['shlex', 'plumbum', 'pexpect', 'envoy', 'commands', 'fabric', 'subprocess', 'os.system', 'system']
List of modules that cannot be imported in custom recipes
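As an illustration of what an import banlist check can look like, the sketch below parses recipe source with Python's ast module and reports imports whose dotted name appears in the default banlist. The helper names and the exact-name matching rule are assumptions for this example. The actual analysis performed by Driverless AI is not described here and may differ.

```python
import ast

# Default banlist copied from custom_recipe_import_banlist.
IMPORT_BANLIST = ['shlex', 'plumbum', 'pexpect', 'envoy', 'commands',
                  'fabric', 'subprocess', 'os.system', 'system']


def imported_names(source):
    """Return the dotted names that a piece of Python source imports."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" contributes "a.b"
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from a import b" contributes "a" and "a.b"
            names.append(node.module)
            names.extend(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def banned_imports(source, banlist=IMPORT_BANLIST):
    """Return imported names that match a banlist entry exactly (assumed rule)."""
    return [name for name in imported_names(source) if name in banlist]


recipe = "import numpy\nimport subprocess\nfrom os import system\n"
print(banned_imports(recipe))
```

For this sample recipe, the sketch flags subprocess and os.system and leaves numpy alone.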
custom_recipe_method_call_allowlist
¶
custom_recipe_method_call_allowlist (List)
Default value []
- Regex pattern list of calls which are allowed in custom recipes.
An empty list means everything (except for the banlist) is allowed. E.g., if only os.path.* is in the allowlist, a custom recipe can only call methods from the os.path module and the built-in ones.
custom_recipe_method_call_banlist
¶
custom_recipe_method_call_banlist (List)
Default value ['os\\.system', 'socket\\..*', 'subprocess.*', 'os.spawn.*']
- Regex pattern list of calls which need to be rejected in custom recipes.
E.g., if os.system is in the banlist, a custom recipe cannot call os.system(). If socket.* is in the banlist, a recipe cannot call any method of the socket module, such as socket.socket() or any socket.a.b.c().
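The interaction between the call allowlist and banlist can be sketched with Python's re module. This example assumes that each pattern must match the whole dotted call name and that the banlist takes precedence over the allowlist. Both rules, and the function name, are assumptions for illustration. The special handling of built-in calls mentioned above is not modeled.

```python
import re

# Default patterns copied from custom_recipe_method_call_banlist.
CALL_BANLIST = ['os\\.system', 'socket\\..*', 'subprocess.*', 'os.spawn.*']


def is_call_allowed(call_name, allowlist=(), banlist=CALL_BANLIST):
    """Assumed rule: banlist wins; an empty allowlist permits everything else."""
    if any(re.fullmatch(pattern, call_name) for pattern in banlist):
        return False
    if allowlist:
        return any(re.fullmatch(pattern, call_name) for pattern in allowlist)
    return True


for name in ["os.system", "socket.socket", "socket.a.b.c", "os.path.join"]:
    print(name, is_call_allowed(name))

# With an allowlist, only calls matching one of its patterns pass.
print(is_call_allowed("json.loads", allowlist=[r"os\.path\..*"]))
```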
custom_recipe_dangerous_patterns
¶
custom_recipe_dangerous_patterns (List)
Default value ['rm -rf', 'rm -fr']
- List of regex patterns representing dangerous sequences/constructs
which could be harmful to whole system and should be banned from code
allow_concurrent_sessions
¶
Enable concurrent session for same user (Boolean)
Default value True
If enabled, user can log in from 2 browsers (scripts) at the same time
extra_http_headers
¶
extra_http_headers (Dict)
Default value {}
Extra HTTP headers.
http_cookie_attributes
¶
Extra HTTP cookie flags (Dict)
Default value {'samesite': 'Lax'}
By default, DriverlessAI issues cookies with the HTTPOnly and Secure attributes (morsels) enabled. In addition, the SameSite attribute is set to ‘Lax’, as it is the default in modern browsers. This setting overrides the default key/value pairs (morsels).
h2o_storage_mode
¶
h2o_storage_mode (String)
Default value 'h2o-storage'
- Specifies whether DriverlessAI uses H2O Storage or H2O Entity Server for
a shared entities backend.
h2o-storage: Uses legacy H2O Storage.
entity-server: Uses the new HAIC Entity Server.
h2o_storage_address
¶
h2o_storage_address (String)
Default value ''
Address of the H2O Storage endpoint. Keep empty to use the local storage only.
h2o_storage_projects_enabled
¶
h2o_storage_projects_enabled (Boolean)
Default value False
Whether to use remote projects stored in H2O Storage instead of local projects.
h2o_storage_tls_enabled
¶
h2o_storage_tls_enabled (Boolean)
Default value True
Whether the channel to the storage should be encrypted.
h2o_storage_tls_ca_path
¶
h2o_storage_tls_ca_path (String)
Default value ''
Path to the certification authority certificate that H2O Storage server identity will be checked against.
h2o_storage_tls_cert_path
¶
h2o_storage_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with H2O Storage server
h2o_storage_tls_key_path
¶
h2o_storage_tls_key_path (String)
Default value ''
Path to the client key to authenticate with H2O Storage server
h2o_storage_internal_default_project_id
¶
h2o_storage_internal_default_project_id (String)
Default value ''
UUID of a Storage project to use instead of the remote HOME folder.
h2o_storage_rpc_deadline_seconds
¶
h2o_storage_rpc_deadline_seconds (Number)
Default value 60
Deadline for RPC calls with H2O Storage in seconds. Sets maximum number of seconds that Driverless waits for RPC call to complete before it cancels it.
h2o_storage_rpc_bytestream_deadline_seconds
¶
h2o_storage_rpc_bytestream_deadline_seconds (Number)
Default value 7200
Deadline for RPC bytestream calls with H2O Storage in seconds. Sets maximum number of seconds that Driverless waits for RPC call to complete before it cancels it. This value is used for uploading and downloading artifacts.
h2o_storage_oauth2_scopes
¶
h2o_storage_oauth2_scopes (String)
Default value ''
Storage client manages its own access tokens derived from the refresh token received on user login. When this option is set, an access token with the scopes defined here is requested (space-separated list).
h2o_storage_message_size_limit
¶
h2o_storage_message_size_limit (Number)
Default value 1048576000
Maximum size of an RPC request message in bytes. Requests larger than this limit will fail.
h2o_secure_store_endpoint_url
¶
h2o_secure_store_endpoint_url (String)
Default value ''
H2O Secure Store server endpoint URL
h2o_secure_store_enable_tls
¶
h2o_secure_store_enable_tls (Boolean)
Default value True
Enable TLS communication between DAI and the H2O Secure Store server
h2o_secure_store_tls_cert_path
¶
h2o_secure_store_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with the H2O Secure Store server. This is only effective when h2o_secure_store_enable_tls=True.
keystore_file
¶
keystore_file (String)
Default value ''
Keystore file that contains secure config.toml items like passwords, secret keys etc. Keystore is managed by h2oai.keystore tool.
log_level
¶
log_level (Number)
Default value 1
- Verbosity of logging
0: quiet (CRITICAL, ERROR, WARNING)
1: default (CRITICAL, ERROR, WARNING, INFO, DATA)
2: verbose (CRITICAL, ERROR, WARNING, INFO, DATA, DEBUG)
Affects server and all experiments.
collect_server_logs_in_experiment_logs
¶
collect_server_logs_in_experiment_logs (Boolean)
Default value False
Whether to collect relevant server logs (h2oai_server.log, dai.log from systemctl or docker, and h2o log). Useful when sending logs to H2O.ai.
migrate_all_entities_to_user
¶
migrate_all_entities_to_user (String)
Default value ''
When set, will migrate all user entities to the defined user upon startup. This is mostly useful during instance migration via H2O’s AIEM/Steam.
per_user_directories
¶
per_user_directories (Boolean)
Default value True
Whether to have all user content isolated into a directory for each user. If set to False, all users’ content shares a single directory, recipes are shared, and the brain folder for restart/refit is shared. If set to True, each user has a separate folder for all user tasks, recipes are isolated to each user, and the brain folder for restart/refit is only for the specific user. Migration from False to True or back to False is allowed for all experiment content accessible by GUI or python client, all recipes, and starting an experiment with the same settings, restart, or refit. However, after switching to per-user mode, the common brain folder is no longer used.
data_import_ignore_file_names
¶
data_import_ignore_file_names (List)
Default value ['_SUCCESS']
List of file names to ignore during dataset import. Any files with names in this list will be skipped when DAI creates a dataset. For example, if a directory contains 3 files, [data_1.csv, data_2.csv, _SUCCESS], DAI will only attempt to create a dataset using files data_1.csv and data_2.csv, and the _SUCCESS file will be ignored. The default is to ignore _SUCCESS files, which are commonly created when exporting data from Hadoop.
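The directory example above can be sketched in a few lines of Python. The function name is hypothetical, and comparing the base name of each path exactly against the list is an assumption about the matching rule.

```python
import os

# Default value of data_import_ignore_file_names.
DATA_IMPORT_IGNORE_FILE_NAMES = ['_SUCCESS']


def files_to_import(paths, ignore=DATA_IMPORT_IGNORE_FILE_NAMES):
    """Keep only the paths whose base name is not in the ignore list."""
    return [p for p in paths if os.path.basename(p) not in ignore]


directory_listing = ['export/data_1.csv', 'export/data_2.csv', 'export/_SUCCESS']
print(files_to_import(directory_listing))
```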
data_import_upcast_multi_file
¶
data_import_upcast_multi_file (Boolean)
Default value False
For data import from a directory (multiple files), allow column types to differ and perform upcast during import.
data_import_explode_list_type_columns_in_parquet
¶
data_import_explode_list_type_columns_in_parquet (Boolean)
Default value False
If set to true, will explode columns with list data type when importing parquet files.
files_without_extensions_expected_types
¶
files_without_extensions_expected_types (List)
Default value ['parquet', 'orc']
List of file types that Driverless AI should attempt to import data as if no file extension exists in the file name. If no file extension is provided, Driverless AI will attempt to import the data starting with the first type in the defined list. Default: [“parquet”, “orc”]. Example: ‘test.csv’ (file extension exists) vs ‘test’ (file extension does not exist).
NOTE: see supported_file_types configuration option for more details on supported file types
do_not_log_list
¶
do_not_log_list (List)
Default value ['cols_to_drop', 'cols_to_drop_sanitized', 'cols_to_group_by', 'cols_to_group_by_sanitized', 'cols_to_force_in', 'cols_to_force_in_sanitized', 'do_not_log_list', 'do_not_store_list', 'pytorch_nlp_pretrained_s3_access_key_id', 'pytorch_nlp_pretrained_s3_secret_access_key', 'auth_openid_end_session_endpoint_url']
do_not_log_list: add configurations that you do not wish to be recorded in logs here. They will still be stored in experiment information so child experiments can behave consistently.
do_not_store_list
¶
do_not_store_list (List)
Default value ['artifacts_git_password', 'auth_jwt_secret', 'auth_openid_client_id', 'auth_openid_client_secret', 'auth_openid_userinfo_auth_key', 'auth_openid_userinfo_auth_value', 'auth_openid_userinfo_username_key', 'auth_tls_ldap_bind_password', 'aws_access_key_id', 'aws_secret_access_key', 'azure_blob_account_key', 'azure_blob_account_name', 'azure_connection_string', 'deployment_aws_access_key_id', 'deployment_aws_secret_access_key', 'gcs_path_to_service_account_json', 'gcs_service_account_json', 'kaggle_key', 'kaggle_username', 'kdb_password', 'kdb_user', 'ldap_bind_password', 'ldap_search_password', 'local_htpasswd_file', 'main_server_minio_access_key_id', 'main_server_minio_secret_access_key', 'main_server_redis_password', 'minio_access_key_id', 'minio_endpoint_url', 'minio_secret_access_key', 'main_server_s3_access_key_id', 'main_server_s3_secret_access_key', 'snowflake_account', 'snowflake_password', 'snowflake_authenticator', 'snowflake_url', 'snowflake_user', 'custom_recipe_security_analysis_enabled', 'custom_recipe_import_allowlist', 'custom_recipe_import_banlist', 'custom_recipe_method_call_allowlist', 'custom_recipe_method_call_banlist', 'custom_recipe_dangerous_patterns', 'azure_ad_client_secret', 'azure_blob_keycloak_aad_client_secret', 'artifacts_azure_blob_account_name', 'artifacts_azure_blob_account_key', 'artifacts_azure_connection_string', 'artifacts_azure_sas_token', 'tensorflow_nlp_pretrained_s3_access_key_id', 'tensorflow_nlp_pretrained_s3_secret_access_key', 'ssl_key_passphrase', 'jdbc_app_configs', 'openai_api_secret_key']
do_not_store_list: add configurations that you do not wish to be stored at all here. They will not be remembered across experiments, so this is not applicable to data science related items that could be controlled by a user. These items are automatically not logged.
ping_sleep_period
¶
ping_sleep_period (Float)
Default value 0.5
Period between checking DAI status. Should be small enough to avoid slowing parent who stops ping process.
data_precision
¶
data_precision (String)
Default value 'float32'
Precision of how data is stored.
‘datatable’ keeps original datatable storage types (i.e. bool, int, float32, float64) (experimental).
‘float32’ is best for speed, ‘float64’ is best for accuracy or very large input values, and ‘datatable’ is best for memory.
‘float32’ allows numbers up to about +-3E38 with relative error of about 1E-7.
‘float64’ allows numbers up to about +-1E308 with relative error of about 1E-16.
Some calculations, like the GLM standardization, can only handle up to sqrt() of these maximums for data values, so GLM with 32-bit precision can only handle up to about a value of 1E19 before standardization generates inf values.
If you see “Best individual has invalid score” you may require higher precision.
transformer_precision
¶
transformer_precision (String)
Default value 'float32'
Precision of most data transformers (same options and notes as data_precision). Useful for higher precision in transformers with numerous operations that can accumulate error. Also useful if you want faster performance for transformers but otherwise want data stored in high precision.
ulimit_up_to_hard_limit
¶
ulimit_up_to_hard_limit (Boolean)
Default value True
Whether to change ulimit soft limits up to hard limits (for DAI server app, which is not a generic user app). Prevents resource limit problems in some cases. Restricted to no more than limit_nofile and limit_nproc for those resources.
disable_core_files
¶
Whether to disable core files if debug_log=true. If debug_log=false, core file creation is always disabled. (Boolean)
Default value False
limit_nofile
¶
limit_nofile (Number)
Default value 131071
Limit on the number of open files. Should be consistent with start-dai.sh.
limit_nproc
¶
limit_nproc (Number)
Default value 16384
Limit on the number of threads. Should be consistent with start-dai.sh.
produce_correlation_heatmap
¶
produce_correlation_heatmap (Boolean)
Default value False
Whether to dump to disk a correlation heatmap
restart_experiments_after_shutdown
¶
restart_experiments_after_shutdown (Boolean)
Default value False
If True, experiments aborted by server restart will automatically restart and continue upon user login
any_env_overrides
¶
any_env_overrides (Boolean)
Default value False
When an environment variable is set to a toml value, consider that an override of any toml value. Experiments remember toml values for scoring, and this treats any environment variable that is set as equivalent to putting OVERRIDE_ in front of the environment key.
debug_print
¶
Enable debug prints to console (Boolean) (Expert Setting)
Default value False
Whether to enable debug prints (to console/stdout/stderr), e.g. showing up in dai*.log or dai*.txt type files.
debug_print_level
¶
Level of debug to print (Number) (Expert Setting)
Default value 0
Level (0-4) for debug prints (to console/stdout/stderr), e.g. showing up in dai*.log or dai*.txt type files. 1-2 is normal, 4 would lead to highly excessive debug and is not recommended in production.
return_quickly_autodl_testing
¶
return_quickly_autodl_testing (Boolean)
Default value False
return_quickly_autodl_testing2
¶
return_quickly_autodl_testing2 (Boolean)
Default value False
return_before_final_model
¶
return_before_final_model (Boolean)
Default value False
main_logger_with_experiment_ids
¶
main_logger_with_experiment_ids (Boolean)
Default value True
final_munging_memory_reduction_factor
¶
Factor to reduce estimated memory usage by (Number) (Expert Setting)
Default value 2
Reduce memory usage during final ensemble feature engineering (1 uses most memory, larger values use less memory)
munging_memory_overhead_factor
¶
Memory use per transformer per input data size (Number) (Expert Setting)
Default value 5
- How much more memory a typical transformer needs than the input data.
Can be increased if, e.g., final model munging uses too much memory due to parallel operations.
per_transformer_segfault_protection_ga
¶
Whether to have per-transformer segfault protection when munging data into transformed features during tuning and evolution. Can lead to significant slowdown for cases when large data but data is sampled, leaving large objects in parent fork, leading to slow fork time for each transformer. (Boolean)
Default value False
per_transformer_segfault_protection_final
¶
Whether to have per-transformer segfault protection when munging data into transformed features during final model fitting and scoring. Can lead to significant slowdown for cases when large data but data is sampled, leaving large objects in parent fork, leading to slow fork time for each transformer. (Boolean)
Default value False
submit_resource_wait_period
¶
submit_resource_wait_period (Number)
Default value 10
How often to check resources (disk, memory, cpu) to see if need to stall submission.
stall_subprocess_submission_cpu_threshold_pct
¶
stall_subprocess_submission_cpu_threshold_pct (Number)
Default value 100
Stall submission of subprocesses if system CPU usage is higher than this threshold in percent (set to 100 to disable). A reasonable number is 90.0 if activated
stall_subprocess_submission_dai_fork_threshold_pct
¶
stall_subprocess_submission_dai_fork_threshold_pct (Float)
Default value -1.0
Restrict/Stall submission of subprocesses if DAI fork count (across all experiments) per unit ulimit nproc soft limit is higher than this threshold in percent (set to -1 to disable, 0 for minimal forking). A reasonable number is 90.0 if activated.
stall_subprocess_submission_experiment_fork_threshold_pct
¶
stall_subprocess_submission_experiment_fork_threshold_pct (Float)
Default value -1.0
Restrict/Stall submission of subprocesses if experiment fork count (across all experiments) per unit ulimit nproc soft limit is higher than this threshold in percent (set to -1 to disable, 0 for minimal forking). A reasonable number is 90.0 if activated. For small data leads to overhead of about 0.1s per task submitted due to checks, so for scoring can slow things down for tests.
restrict_initpool_by_memory
¶
restrict_initpool_by_memory (Boolean)
Default value True
Whether to restrict pool workers even if not used, by reducing the number of pool workers available. Good if there is a really huge number of experiments, but otherwise it is best to have all pool workers ready and only stall submission of tasks, so the system can be dynamic in a multi-experiment environment.
users_disk_usage_quota
¶
users_disk_usage_quota (Float)
Default value 1.0
A fraction, with valid values between 0.1 and 1.0, that determines the disk usage quota for a user. This quota is checked during dataset import and experiment runs.
scoring_data_directory
¶
scoring_data_directory (String)
Default value 'tmp'
Path of the scoring directory, relative to the run path.
num_models_for_resume_graph
¶
num_models_for_resume_graph (Number)
Default value 1000
mojo_acceptance_test_errors_fatal
¶
mojo_acceptance_test_errors_fatal (Boolean)
Default value True
mojo_acceptance_test_errors_shap_fatal
¶
mojo_acceptance_test_errors_shap_fatal (Boolean)
Default value True
mojo_acceptance_test_orig_shap
¶
mojo_acceptance_test_orig_shap (Boolean)
Default value True
enable_single_instance_db_access
¶
enable_single_instance_db_access (Boolean)
Default value True
If set to true, will make sure only current instance can access its database
enable_pytorch_nlp
¶
enable_pytorch_nlp (String)
Default value 'auto'
Deprecated - maps to enable_pytorch_nlp_transformer and enable_pytorch_nlp_model in 1.10.2+
check_timeout_per_gpu
¶
check_timeout_per_gpu (Number)
Default value 20
How long to wait per GPU for tensorflow/torch to run during system checks.
gpu_exit_if_fails
¶
gpu_exit_if_fails (Boolean)
Default value True
Whether to fail start-up if cannot successfully run GPU checks
how_started
¶
how_started (String)
Default value ''
wizard_state
¶
wizard_state (String)
Default value ''
enable_telemetry
¶
enable_telemetry (Boolean)
Default value False
Whether to enable pushing telemetry events to a configured telemetry receiver in ‘telemetry_plugins_dir’.
telemetry_plugins_dir
¶
telemetry_plugins_dir (String)
Default value './telemetry_plugins'
Directory to scan for telemetry recipes.
h2o_telemetry_tls_enabled
¶
h2o_telemetry_tls_enabled (Boolean)
Default value False
Whether to enable TLS to communicate to H2O.ai Telemetry Service.
h2o_telemetry_rpc_deadline_seconds
¶
h2o_telemetry_rpc_deadline_seconds (Number)
Default value 60
Timeout value when communicating to H2O.ai Telemetry Service.
h2o_telemetry_address
¶
h2o_telemetry_address (String)
Default value ''
H2O.ai Telemetry Service address in H2O.ai Cloud.
h2o_telemetry_service_token_location
¶
h2o_telemetry_service_token_location (String)
Default value ''
H2O.ai Telemetry Service access token file location.
h2o_telemetry_tls_ca_path
¶
h2o_telemetry_tls_ca_path (String)
Default value ''
TLS CA path when communicating to H2O.ai Telemetry Service.
h2o_telemetry_tls_cert_path
¶
h2o_telemetry_tls_cert_path (String)
Default value ''
TLS certificate path when communicating to H2O.ai Telemetry Service.
h2o_telemetry_tls_key_path
¶
h2o_telemetry_tls_key_path (String)
Default value ''
TLS key path when communicating to H2O.ai Telemetry Service.
user_config_directory
¶
user_config_directory (String)
Default value ''
Every *.toml file in this directory is read and processed the same way as the main config file.
procsy_ip
¶
procsy_ip (String)
Default value '127.0.0.1'
IP address for the procsy process.
procsy_port
¶
procsy_port (Number)
Default value 12347
Port for the procsy process.
procsy_timeout
¶
procsy_timeout (Number)
Default value 3600
Request timeout (in seconds) for the procsy process.
h2o_ip
¶
h2o_ip (String)
Default value '127.0.0.1'
IP address for use by MLI.
h2o_port
¶
h2o_port (Number)
Default value 12348
Port of H2O instance for use by MLI. Each H2O node has an internal port (web port+1, so by default port 12349) for internal node-to-node communication
ip
¶
ip (String)
Default value '127.0.0.1'
IP address and port for Driverless AI HTTP server.
port
¶
port (Number)
Default value 12345
IP address and port for Driverless AI HTTP server.
port_range
¶
port_range (List)
Default value []
A list of two integers indicating the port range to search over, and dynamically find an open port to bind to (e.g., [11111,20000]).
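As a rough illustration of searching a port range for an open port, the sketch below tries to bind a TCP socket to each port in turn and returns the first one that succeeds. The function name, the inclusive upper bound, and the first-fit strategy are assumptions for this example. The search that Driverless AI performs may work differently.

```python
import socket


def find_open_port(port_range, host="127.0.0.1"):
    """Return the first port in [low, high] that can be bound, or None."""
    low, high = port_range
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    return None


print(find_open_port([11111, 20000]))
```

The probe socket is closed before the function returns, so another process could still claim the port between the check and the real bind.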
strict_version_check
¶
strict_version_check (Boolean)
Default value True
Strict version check for DAI
max_file_upload_size
¶
max_file_upload_size (Number)
Default value 104857600000
File upload limit (default 100GB)
data_directory
¶
data_directory (String)
Default value './tmp'
- Data directory. All application data and files related to datasets and
experiments are stored in this directory.
datasets_directory
¶
datasets_directory (String)
Default value ''
- Datasets directory. If set, it denotes the location from which all
datasets are read and into which they are written. Typically this location is configured on an external file system to allow more granular control over just the datasets volume. If empty, it defaults to data_directory.
data_connectors_logs_directory
¶
data_connectors_logs_directory (String)
Default value './tmp'
Path to the directory where the logs of HDFS, Hive, JDBC, and KDB+ data connectors will be saved.
server_logs_sub_directory
¶
server_logs_sub_directory (String)
Default value 'server_logs'
Subdirectory within data_directory to store server logs.
pid_sub_directory
¶
pid_sub_directory (String)
Default value 'pids'
Subdirectory within data_directory to store pid files for controlling kill/stop of DAI servers.
mapr_tickets_directory
¶
mapr_tickets_directory (String)
Default value './tmp/mapr-tickets'
Path to the directory which will be used to save MapR tickets when MapR multi-user mode is enabled. This is applicable only when enable_mapr_multi_user_mode is set to true.
mapr_tickets_duration_minutes
¶
mapr_tickets_duration_minutes (Number)
Default value -1
MapR ticket duration in minutes. If set to -1, the default value is used (the duration is not specified in the maprlogin command); otherwise the configured value is used, but no less than one day.
remove_uploads_temp_files_server_start
¶
remove_uploads_temp_files_server_start (Boolean)
Default value True
Whether at server start to delete all temporary uploaded files, left over from failed uploads.
remove_temp_files_server_start
¶
remove_temp_files_server_start (Boolean)
Default value False
Whether to run through the entire data directory and remove all temporary files. Can lead to slow start-up time if there is a large number (much greater than 100) of experiments.
remove_temp_files_aborted_experiments
¶
remove_temp_files_aborted_experiments (Boolean)
Default value True
Whether to delete temporary files after experiment is aborted/cancelled.
usage_stats_opt_in
¶
usage_stats_opt_in (Boolean)
Default value True
Whether to opt in to usage statistics and bug reporting
core_site_xml_path
¶
core_site_xml_path (String)
Default value ''
Configurations for an HDFS data source.
Path of the HDFS core-site.xml file.
core_site_xml_path is deprecated, please use hdfs_config_path.
hdfs_config_path
¶
hdfs_config_path (String)
Default value ''
(Required) HDFS config folder path. Can contain multiple config files.
key_tab_path
¶
key_tab_path (String)
Default value ''
Path of the principal key tab file. Required when hdfs_auth_type=’principal’. key_tab_path is deprecated, please use hdfs_keytab_path
hdfs_keytab_path
¶
hdfs_keytab_path (String)
Default value ''
Path of the principal key tab file. Required when hdfs_auth_type=’principal’.
preview_cache_upon_server_exit
¶
preview_cache_upon_server_exit (Boolean)
Default value True
Whether to delete preview cache on server exit
enable_health_api
¶
Enable Health API (Boolean)
Default value True
When enabled, server exposes Health API at /apis/health/v1, which provides system overview and utilization statistics
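A minimal sketch of building the Health API address from the ip and port settings and fetching it with Python's standard library follows. The helper names are hypothetical. The sketch also assumes that /apis/health/v1 can be requested directly and without authentication, which this page does not state, and it returns the response body as raw text because the response format is not described here.

```python
from urllib.request import urlopen


def health_url(ip="127.0.0.1", port=12345, https=False):
    """Build the Health API address from the documented ip/port defaults."""
    scheme = "https" if https else "http"
    return f"{scheme}://{ip}:{port}/apis/health/v1"


def fetch_health(url, timeout=5):
    """Return the raw response body as text (format not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(health_url())
# To query a running server:
# print(fetch_health(health_url()))
```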
listeners_inherit_env_variables
¶
listeners_inherit_env_variables (Boolean)
Default value False
When enabled, the notification scripts will inherit the environment variables of the parent process (DriverlessAI).
listeners_experiment_start
¶
listeners_experiment_start (String)
Default value ''
Notification scripts:
- the variable points to the location of a script which is executed at a given event in the experiment lifecycle
- the script should have the executable flag enabled
- use of an absolute path is suggested
The on experiment start notification script location
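Before pointing a listeners_* setting at a script, the requirements above (absolute path, executable flag) can be checked with a short helper. This is a sketch with hypothetical names. It does not show what arguments or environment the script receives, since that is not described here.

```python
import os
import stat
import tempfile


def check_listener_script(path):
    """Return a list of problems with a notification script path (empty if none)."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif not os.access(path, os.X_OK):
        problems.append("file is not executable")
    return problems


# Demonstration with a throwaway script in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "on_start.sh")
    with open(script, "w") as fh:
        fh.write("#!/bin/sh\necho experiment started\n")
    print(check_listener_script(script))  # checked before the executable flag is set
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    print(check_listener_script(script))  # checked after the executable flag is set
```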
listeners_experiment_done
¶
listeners_experiment_done (String)
Default value ''
The on experiment finished notification script location
listeners_experiment_import_done
¶
listeners_experiment_import_done (String)
Default value ''
The on experiment import notification script location
listeners_mojo_done
¶
listeners_mojo_done (String)
Default value ''
Notification script triggered when building of MOJO pipeline for experiment is finished. The value should be an absolute path to executable script.
listeners_autodoc_done
¶
listeners_autodoc_done (String)
Default value ''
Notification script triggered when rendering of AutoDoc for experiment is finished. The value should be an absolute path to executable script.
listeners_scoring_pipeline_done
¶
listeners_scoring_pipeline_done (String)
Default value ''
Notification script triggered when building of python scoring pipeline for experiment is finished. The value should be an absolute path to executable script.
listeners_experiment_artifacts_done
¶
listeners_experiment_artifacts_done (String)
Default value ''
Notification script triggered when experiment and all its artifacts selected at the beginning of experiment are finished building. The value should be an absolute path to executable script.
enable_quick_benchmark
¶
enable_quick_benchmark (Boolean)
Default value True
Whether to run quick performance benchmark at start of application
enable_extended_benchmark
¶
enable_extended_benchmark (Boolean)
Default value False
Whether to run extended performance benchmark at start of application
extended_benchmark_scale_num_rows
¶
extended_benchmark_scale_num_rows (Float)
Default value 0.1
Scaling factor for number of rows for extended performance benchmark. For rigorous performance benchmarking, values of 1 or larger are recommended.
extended_benchmark_num_cols
¶
extended_benchmark_num_cols (Number)
Default value 20
Number of columns for extended performance benchmark.
benchmark_memory_timeout
¶
benchmark_memory_timeout (Number)
Default value 2
Seconds to allow for testing memory bandwidth by generating numpy frames
benchmark_memory_vm_fraction
¶
benchmark_memory_vm_fraction (Float)
Default value 0.25
Maximum portion of vm total to use for numpy memory benchmark
benchmark_memory_max_cols
¶
benchmark_memory_max_cols (Number)
Default value 1500
Maximum number of columns to use for numpy memory benchmark
enable_startup_checks
¶
enable_startup_checks (Boolean)
Default value True
Whether to run quick startup checks at start of application
application_id
¶
application_id (String)
Default value ''
Application ID override, which should uniquely identify the instance
main_server_fork_timeout
¶
main_server_fork_timeout (Float)
Default value 10.0
After how many seconds to abort the MLI recipe execution plan or recipe compatibility checks. These block the main server from all activities, so a long timeout is not desired, especially in case of hanging processes, while a short timeout can too often lead to aborts on a busy system.
audit_log_retention_period
¶
audit_log_retention_period (Number)
Default value 5
After how many days the audit log records are removed. Set equal to 0 to disable removal of old records.
dataset_tmp_upload_file_retention_time_min
¶
dataset_tmp_upload_file_retention_time_min (Number)
Default value 5
Time to wait after performing a cleanup of temporary files for in-browser dataset upload.