Connectors configuration
enabled_file_systems
enabled_file_systems (List)
Default value ['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url']
File System Support:
- upload : standard upload feature
- file : local file system/server file system
- hdfs : Hadoop file system; remember to configure the HDFS config folder path and keytab below
- dtap : Blue Data Tap file system; remember to configure the DTap section below
- s3 : Amazon S3; optionally configure secret and access key below
- gcs : Google Cloud Storage; remember to configure gcs_path_to_service_account_json below
- gbq : Google BigQuery; remember to configure gcs_path_to_service_account_json below
- minio : Minio Cloud Storage; remember to configure secret and access key below
- snow : Snowflake Data Warehouse; remember to configure Snowflake credentials below (account name, username, password)
- kdb : KDB+ Time Series Database; remember to configure KDB credentials below (hostname and port; optionally username, password, classpath, and jvm_args)
- azrbs : Azure Blob Storage; remember to configure Azure credentials below (account name, account key)
- jdbc : JDBC connector; remember to configure JDBC below (jdbc_app_configs)
- hive : Hive connector; remember to configure Hive below (hive_app_configs)
- recipe_file : custom recipe file upload
- recipe_url : custom recipe upload via URL
- h2o_drive : H2O Drive; remember to configure h2o_drive_endpoint_url below
- feature_store : Feature Store; remember to configure feature_store_endpoint_url below
- databricks : Databricks Delta Table connector
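For illustration, a config.toml entry enabling only a subset of connectors might look like the following (a minimal sketch; the list values come from the table above, and the quoting convention mirrors the file_path_filter_include examples later in this section):

enabled_file_systems = "['upload', 'file', 's3', 'hdfs']"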
max_files_listed
max_files_listed (Number)
Default value 100
file_hide_data_directory
file_hide_data_directory (Boolean)
Default value True
This option disables access to the DAI data_directory from the file browser.
file_path_filtering_enabled
file_path_filtering_enabled (Boolean)
Default value False
Enable usage of path filters
file_path_filter_include
file_path_filter_include (List)
Default value []
List of absolute path prefixes to restrict access to in the file system browser. First add the following environment variable to your command line to enable this feature: file_path_filtering_enabled=true. This feature can be used in the following ways (using a specific path or the logged-in user's directory):
- file_path_filter_include="['/data/stage']"
- file_path_filter_include="['/data/stage','/data/prod']"
- file_path_filter_include=/home/{{DAI_USERNAME}}/
- file_path_filter_include="['/home/{{DAI_USERNAME}}/','/data/stage','/data/prod']"
hdfs_auth_type
hdfs_auth_type (String)
Default value 'noauth'
(Required) HDFS connector. Specify the HDFS auth type; allowed options are:
- noauth : (default) no authentication needed
- principal : authenticate with HDFS as a principal user (DEPRECATED - use the keytab auth type)
- keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
- keytabimpersonation : login with impersonation using a keytab
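A minimal sketch of keytab authentication using the options in this section (the principal value is illustrative; the keytab file path itself is configured separately, as noted under enabled_file_systems):

hdfs_auth_type = "keytab"
hdfs_app_principal_user = "dai/host.example.com@EXAMPLE.COM"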
hdfs_app_principal_user
hdfs_app_principal_user (String)
Default value ''
Kerberos app principal user. Required when hdfs_auth_type='keytab'; recommended otherwise.
hdfs_app_login_user
hdfs_app_login_user (String)
Default value ''
Deprecated - do not use; the login user is taken from the username used at login.
hdfs_app_jvm_args
hdfs_app_jvm_args (String)
Default value ''
JVM args for HDFS distributions; provide args separated by spaces. Example: -Djava.security.krb5.conf=<path>/krb5.conf -Dsun.security.krb5.debug=True -Dlog4j.configuration=file:///<path>log4j.properties
hdfs_app_classpath
hdfs_app_classpath (String)
Default value ''
HDFS class path
hdfs_app_supported_schemes
hdfs_app_supported_schemes (List)
Default value ['hdfs://', 'maprfs://', 'swift://']
List of supported DFS schemes, e.g. "['hdfs://', 'maprfs://', 'swift://']". The supported schemes list is used as an initial check to ensure valid input to the connector.
hdfs_max_files_listed
hdfs_max_files_listed (Number)
Default value 100
Maximum number of files viewable in the connector UI. Set to a larger number to view more files.
hdfs_init_path
hdfs_init_path (String)
Default value 'hdfs://'
Starting HDFS path displayed in UI HDFS browser
hdfs_upload_init_path
hdfs_upload_init_path (String)
Default value 'hdfs://'
Starting HDFS path for the artifacts upload operations
dtap_auth_type
dtap_auth_type (String)
Default value 'noauth'
Blue Data DTap connector settings are similar to HDFS connector settings. Specify the DTap auth type; allowed options are:
- noauth : no authentication needed
- principal : authenticate with DTap as a principal user
- keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
- keytabimpersonation : login with impersonation using a keytab
NOTE: "hdfs_app_classpath" and "core_site_xml_path" are both required to be set for the DTap connector.
dtap_config_path
dtap_config_path (String)
Default value ''
DTap (HDFS) config folder path; can contain multiple config files.
dtap_key_tab_path
dtap_key_tab_path (String)
Default value ''
Path of the principal keytab file. dtap_key_tab_path is deprecated; please use dtap_keytab_path instead.
dtap_keytab_path
dtap_keytab_path (String)
Default value ''
Path of the principal keytab file
dtap_app_principal_user
dtap_app_principal_user (String)
Default value ''
Kerberos app principal user (recommended)
dtap_app_login_user
dtap_app_login_user (String)
Default value ''
Specify the user id of the current user here as user@realm
dtap_app_jvm_args
dtap_app_jvm_args (String)
Default value ''
JVM args for DTap distributions; provide args separated by spaces.
dtap_app_classpath
dtap_app_classpath (String)
Default value ''
DTap (HDFS) class path. NOTE: set 'hdfs_app_classpath' also.
dtap_init_path
dtap_init_path (String)
Default value 'dtap://'
Starting DTAP path displayed in UI DTAP browser
aws_access_key_id
AWS Access Key ID (String)
Default value ''
S3 Connector credentials
aws_secret_access_key
AWS Secret Access Key (Any)
Default value ''
S3 Connector credentials
aws_role_arn
aws_role_arn (String)
Default value ''
S3 Connector credentials
aws_default_region
aws_default_region (String)
Default value ''
What region to use when none is specified in the s3 url. Ignored when aws_s3_endpoint_url is set.
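A minimal credential sketch for the S3 connector combining the options above (all values are placeholders, not real credentials):

aws_access_key_id = "<access key id>"
aws_secret_access_key = "<secret access key>"
aws_default_region = "us-east-1"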
aws_s3_endpoint_url
aws_s3_endpoint_url (String)
Default value ''
Sets endpoint URL that will be used to access S3.
aws_use_ec2_role_credentials
aws_use_ec2_role_credentials (Boolean)
Default value False
If set to true, the S3 Connector will try to obtain credentials associated with the role attached to the EC2 instance.
s3_init_path
s3_init_path (String)
Default value 's3://'
Starting S3 path displayed in UI S3 browser
s3_skip_cert_verification
s3_skip_cert_verification (Boolean)
Default value False
The S3 Connector will skip cert verification if this is set to true (mostly used for S3-like connectors, e.g. Ceph).
s3_connector_cert_location
s3_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the S3 connector
gcs_path_to_service_account_json
gcs_path_to_service_account_json (String)
Default value ''
GCS Connector credentials. Example (suggested): '/licenses/my_service_account_json.json'
gcs_service_account_json
GCS Connector service account JSON (Dict)
Default value {}
GCS Connector service account credentials in JSON; this configuration takes precedence over gcs_path_to_service_account_json.
gbq_access_impersonated_account
GCS Connector impersonated account (String)
Default value ''
GCS Connector impersonated account
gcs_init_path
gcs_init_path (String)
Default value 'gs://'
Starting GCS path displayed in UI GCS browser
gcs_access_token_scopes
gcs_access_token_scopes (String)
Default value ''
Space-separated list of OAuth2 scopes for the access token used to authenticate in Google Cloud Storage
gcs_default_project_id
gcs_default_project_id (String)
Default value ''
When google_cloud_use_oauth is enabled, the Google Cloud client cannot automatically infer the default project, so it must be explicitly specified.
gbq_access_token_scopes
gbq_access_token_scopes (String)
Default value ''
Space-separated list of OAuth2 scopes for the access token used to authenticate in Google BigQuery
google_cloud_use_oauth
google_cloud_use_oauth (Boolean)
Default value False
By default, the Driverless AI Google Cloud Storage and BigQuery connectors use a service account file to retrieve authentication credentials. When enabled, the Storage and BigQuery connectors will use OAuth2 user access tokens to authenticate in Google Cloud instead.
minio_endpoint_url
minio_endpoint_url (String)
Default value ''
Minio Connector credentials
minio_access_key_id
Minio Access Key ID (String)
Default value ''
Minio Connector credentials
minio_skip_cert_verification
minio_skip_cert_verification (Boolean)
Default value False
Minio Connector will skip cert verification if this is set to true
minio_connector_cert_location
minio_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the Minio connector
minio_init_path
minio_init_path (String)
Default value '/'
Starting Minio path displayed in UI Minio browser
h2o_drive_endpoint_url
h2o_drive_endpoint_url (String)
Default value ''
H2O Drive server endpoint URL
h2o_drive_access_token_scopes
h2o_drive_access_token_scopes (String)
Default value ''
Space-separated list of OpenID scopes for the access token used by the H2O Drive connector
h2o_drive_session_duration
h2o_drive_session_duration (Number)
Default value 10800
Maximum duration (in seconds) for a session with the H2O Drive
snowflake_url
snowflake_url (String)
Default value ''
Snowflake Connector credentials. Recommended: provide url, user, and password. Optionally: provide account, user, and password. Example URL: https://<snowflake_account>.<region>.snowflakecomputing.com
snowflake_user
snowflake_user (String)
Default value ''
Snowflake Connector credentials
snowflake_password
snowflake_password (String)
Default value ''
Snowflake Connector credentials
snowflake_account
snowflake_account (String)
Default value ''
Snowflake Connector credentials
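A minimal sketch of Snowflake credentials combining the options above (URL and values are placeholders):

snowflake_url = "https://<snowflake_account>.<region>.snowflakecomputing.com"
snowflake_user = "<user>"
snowflake_password = "<password>"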
snowflake_authenticator
snowflake_authenticator (String)
Default value ''
Snowflake Connector authenticator; can be used when Snowflake is using native SSO with Okta. E.g.: snowflake_authenticator = "https://<okta_account_name>.okta.com"
snowflake_keycloak_broker_token_endpoint
snowflake_keycloak_broker_token_endpoint (String)
Default value ''
Keycloak endpoint for retrieving external IdP tokens for Snowflake. (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
snowflake_keycloak_broker_token_type
snowflake_keycloak_broker_token_type (String)
Default value 'access_token'
Token type that should be used from the response from Keycloak endpoint for retrieving external IdP tokens for Snowflake. See snowflake_keycloak_broker_token_endpoint.
snowflake_h2o_secure_store_oauth_client_id
snowflake_h2o_secure_store_oauth_client_id (String)
Default value ''
ID of the OAuth client configured in H2O Secure Store for authentication with Snowflake.
snowflake_host
snowflake_host (String)
Default value ''
Snowflake hostname to connect to when running Driverless AI in Snowpark Container Services.
snowflake_port
snowflake_port (String)
Default value ''
Snowflake port to connect to when running Driverless AI in Snowpark Container Services.
snowflake_session_token_filepath
snowflake_session_token_filepath (String)
Default value ''
Snowflake filepath that stores the token of the session when running Driverless AI in Snowpark Container Services. E.g.: snowflake_session_token_filepath = "/snowflake/session/token"
snowflake_allow_stages
snowflake_allow_stages (Boolean)
Default value True
Setting to allow or disallow the Snowflake connector from using Snowflake stages during queries. True permits the connector to use stages and generally improves performance; however, if the Snowflake user does not have permission to create/use stages, queries will end in errors. False prevents the connector from using stages, so Snowflake users without permission to create/use stages will have successful queries, though query performance may be significantly reduced.
snowflake_batch_size
snowflake_batch_size (Number)
Default value 10000
Sets the number of rows to be fetched by the Snowflake cursor at one time. This is only used when snowflake_allow_stages is set to False; it may help with performance depending on the type and size of the data being queried.
kdb_user
kdb_user (String)
Default value ''
KDB Connector credentials
kdb_password
kdb_password (String)
Default value ''
KDB Connector credentials
kdb_hostname
kdb_hostname (String)
Default value ''
KDB Connector credentials
kdb_port
kdb_port (String)
Default value ''
KDB Connector credentials
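A minimal KDB connection sketch using the options in this section (hostname and port are placeholders; per the enabled_file_systems notes, username and password are optional):

kdb_hostname = "<kdb host>"
kdb_port = "<kdb port>"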
kdb_app_classpath
kdb_app_classpath (String)
Default value ''
KDB Connector credentials
kdb_app_jvm_args
kdb_app_jvm_args (String)
Default value ''
KDB Connector credentials
azure_blob_account_name
Azure Blob Store Account Name (String)
Default value ''
Account name for Azure Blob Store Connector
azure_blob_account_key
Azure Blob Store Account Key (Any)
Default value ''
Account key for Azure Blob Store Connector
azure_connection_string
Azure Blob Store Connection String (Any)
Default value ''
Connection string for Azure Blob Store Connector
azure_sas_token
Azure Blob Store SAS token (Any)
Default value ''
SAS token for Azure Blob Store Connector
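A minimal Azure Blob Store credential sketch (use either an account key or a SAS token; values are placeholders):

azure_blob_account_name = "<storage account name>"
azure_blob_account_key = "<account key>"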
azure_blob_init_path
azure_blob_init_path (String)
Default value 'https://'
Starting Azure blob store path displayed in UI Azure blob store browser
azure_blob_use_access_token
azure_blob_use_access_token (Boolean)
Default value False
When enabled, Azure Blob Store Connector will use access token derived from the credentials received on login with OpenID Connect.
azure_blob_use_access_token_scopes
azure_blob_use_access_token_scopes (String)
Default value 'https://storage.azure.com/.default'
Configures the scopes for the access token used by the Azure Blob Store Connector when azure_blob_use_access_token is enabled (space-separated list).
azure_blob_use_access_token_source
azure_blob_use_access_token_source (String)
Default value 'SESSION'
Sets the source of the access token for accessing the Azure blob store:
- KEYCLOAK: will exchange the session access token for the federated refresh token with Keycloak and use it to obtain the access token directly from Azure AD.
- SESSION: will use the access token derived from the credentials received on login with OpenID Connect.
azure_blob_keycloak_aad_client_id
azure_blob_keycloak_aad_client_id (String)
Default value ''
Application (client) ID registered on Azure AD when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_client_secret
azure_blob_keycloak_aad_client_secret (String)
Default value ''
Application (client) secret when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_auth_uri
azure_blob_keycloak_aad_auth_uri (String)
Default value ''
A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_blob_keycloak_broker_token_endpoint
azure_blob_keycloak_broker_token_endpoint (String)
Default value ''
Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
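A hedged sketch wiring together the KEYCLOAK token source using the options above (IDs, secret, and URLs are placeholders):

azure_blob_use_access_token = true
azure_blob_use_access_token_source = "KEYCLOAK"
azure_blob_keycloak_aad_client_id = "<application (client) id>"
azure_blob_keycloak_aad_client_secret = "<client secret>"
azure_blob_keycloak_aad_auth_uri = "https://login.microsoftonline.com/<your_tenant>"
azure_blob_keycloak_broker_token_endpoint = "<keycloak broker token endpoint>"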
azure_enable_token_auth_aad
azure_enable_token_auth_aad (Boolean)
Default value False
(DEPRECATED, use azure_blob_use_access_token and azure_blob_use_access_token_source="KEYCLOAK" instead. When enabled, only the DEPRECATED options azure_ad_client_id, azure_ad_client_secret, azure_ad_auth_uri, and azure_keycloak_idp_token_endpoint will be effective. This is equivalent to setting azure_blob_use_access_token_source = "KEYCLOAK" and setting the azure_blob_keycloak_aad_client_id, azure_blob_keycloak_aad_client_secret, azure_blob_keycloak_aad_auth_uri, and azure_blob_keycloak_broker_token_endpoint options.) If true, enables the Azure Blob Storage Connector to use Azure AD tokens obtained from Keycloak for auth.
azure_ad_client_id
azure_ad_client_id (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_id instead.) Application (client) ID registered on Azure AD
azure_ad_client_secret
azure_ad_client_secret (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_secret instead.) Application Client Secret
azure_ad_auth_uri
azure_ad_auth_uri (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_auth_uri instead.) A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_ad_scopes
azure_ad_scopes (List)
Default value []
(DEPRECATED, use azure_blob_use_access_token_scopes instead.) Scopes requested to access a protected API (a resource).
azure_keycloak_idp_token_endpoint
azure_keycloak_idp_token_endpoint (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_broker_token_endpoint instead.) Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
azure_workload_identity_tenant_id
azure_workload_identity_tenant_id (String)
Default value ''
ID of the application’s Microsoft Entra tenant, also called its ‘directory’ ID. This is used for Azure Workload Identity.
azure_workload_identity_client_id
azure_workload_identity_client_id (String)
Default value ''
The client ID of a Microsoft Entra app registration. This is used for Azure Workload Identity.
azure_workload_identity_token_file_path
azure_workload_identity_token_file_path (String)
Default value ''
The path to a file containing a Kubernetes service account token that authenticates the identity. This is used for Azure Workload Identity.
databricks_azure_workload_identity_scopes
databricks_azure_workload_identity_scopes (String)
Default value ''
Desired scopes for the access token when the Databricks connector is using Azure Workload Identity authentication. At least one scope should be specified. For more information about scopes, see https://learn.microsoft.com/entra/identity-platform/scopes-oidc.
databricks_workspace_instance_name
databricks_workspace_instance_name (String)
Default value ''
Name of the Databricks workspace instance. Please refer to https://learn.microsoft.com/en-us/azure/databricks/workspace/workspace-details for how to obtain the name of your Databricks workspace instance.
jdbc_app_configs
jdbc_app_configs (String)
Default value '{}'
Configuration for the JDBC Connector. JSON/Dictionary String with multiple keys. Format as a single line without using carriage returns (the following example is formatted for readability). Use triple quotations to ensure that the text is read as a single string. Example:
'{
  "postgres": {
    "url": "jdbc:postgresql://ip address:port/postgres",
    "jarpath": "/path/to/postgres_driver.jar",
    "classpath": "org.postgresql.Driver"
  },
  "mysql": {
    "url": "mysql connection string",
    "jarpath": "/path/to/mysql_driver.jar",
    "classpath": "my.sql.classpath.Driver"
  }
}'
jdbc_app_jvm_args
jdbc_app_jvm_args (String)
Default value '-Xmx4g'
Extra JVM args for the JDBC connector
jdbc_app_classpath
jdbc_app_classpath (String)
Default value ''
Alternative classpath for the JDBC connector
hive_app_configs
hive_app_configs (String)
Default value '{}'
Configuration for the Hive Connector. Note that inputs are similar to configuring HDFS connectivity. Important keys:
- hive_conf_path - path to the Hive configuration; may have multiple files (typically: hive-site.xml, hdfs-site.xml, etc.)
- auth_type - one of noauth, keytab, keytabimpersonation for Kerberos authentication
- keytab_path - path to the Kerberos keytab to use for authentication; can be "" if using the noauth auth_type
- principal_user - Kerberos app principal user; required when using auth_type keytab or keytabimpersonation
JSON/Dictionary String with multiple keys. Example:
'{
  "hive_connection_1": {
    "hive_conf_path": "/path/to/hive/conf",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename>.keytab",
    "principal_user": "hive/localhost@EXAMPLE.COM"
  },
  "hive_connection_2": {
    "hive_conf_path": "/path/to/hive/conf_2",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename_2>.keytab",
    "principal_user": "my_user/localhost@EXAMPLE.COM"
  }
}'
hive_app_jvm_args
hive_app_jvm_args (String)
Default value '-Xmx4g'
Extra JVM args for the Hive connector
hive_app_classpath
hive_app_classpath (String)
Default value ''
Alternative classpath for the Hive connector. Can be used to add additional jar files to the classpath.
enable_artifacts_upload
enable_artifacts_upload (Boolean)
Default value False
Replaces all the downloads on the experiment page with exports, and allows users to push to the artifact store configured with artifacts_store.
artifacts_store
Artifacts Store (String)
Default value 'file_system'
Artifacts store:
- file_system: stores artifacts in a file system directory denoted by artifacts_file_system_directory.
- s3: stores artifacts in an S3 bucket.
- bitbucket: stores data in a Bitbucket repository.
- azure: stores data in Azure Blob Store.
- hdfs: stores data in a Hadoop distributed file system location.
bitbucket_skip_cert_verification
bitbucket_skip_cert_verification (Boolean)
Default value False
Decide whether to skip cert verification for Bitbucket when using a repo with HTTPS
bitbucket_tmp_relative_dir
bitbucket_tmp_relative_dir (String)
Default value 'local_git_tmp'
Local temporary directory to clone artifacts to, relative to data_directory
artifacts_file_system_directory
artifacts_file_system_directory (String)
Default value 'tmp'
File system location where artifacts will be copied in case artifacts_store is set to file_system
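A minimal sketch enabling artifact export to a local directory, combining enable_artifacts_upload, artifacts_store, and the option above (directory value illustrative):

enable_artifacts_upload = true
artifacts_store = "file_system"
artifacts_file_system_directory = "/path/to/artifacts"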
artifacts_s3_bucket
AWS S3 Bucket Name (String)
Default value ''
AWS S3 bucket used for experiment artifact export.
artifacts_azure_blob_account_name
Azure Blob Store Account Name (String)
Default value ''
Azure Blob Store credentials used for experiment artifact export
artifacts_azure_blob_account_key
Azure Blob Store Account Key (Any)
Default value ''
Azure Blob Store credentials used for experiment artifact export
artifacts_azure_connection_string
Azure Blob Store Connection String (Any)
Default value ''
Azure Blob Store connection string used for experiment artifact export
artifacts_azure_sas_token
Azure Blob Store SAS token (Any)
Default value ''
Azure Blob Store SAS token used for experiment artifact export
artifacts_git_user
artifacts_git_user (String)
Default value 'git'
Git auth user
artifacts_git_password
artifacts_git_password (String)
Default value ''
Git auth password
artifacts_git_repo
artifacts_git_repo (String)
Default value ''
Git repo where artifacts will be pushed upon an upload
artifacts_git_branch
artifacts_git_branch (String)
Default value 'dev'
Git branch on the remote repo where artifacts are pushed
artifacts_git_ssh_private_key_file_location
artifacts_git_ssh_private_key_file_location (String)
Default value ''
File location for the ssh private key used for git authentication
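A hedged sketch of pushing artifacts to a Git repository over SSH using the options in this section (whether the bitbucket store is required for Git push depends on your setup; repo, branch, and key path are placeholders):

enable_artifacts_upload = true
artifacts_store = "bitbucket"
artifacts_git_repo = "git@bitbucket.org:<org>/<repo>.git"
artifacts_git_branch = "dev"
artifacts_git_ssh_private_key_file_location = "/path/to/ssh/private/key"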
feature_store_endpoint_url
feature_store_endpoint_url (String)
Default value ''
Feature Store server endpoint URL
feature_store_enable_tls
feature_store_enable_tls (Boolean)
Default value False
Enable TLS communication between DAI and the Feature Store server
feature_store_tls_cert_path
feature_store_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with the Feature Store server. This is only effective when feature_store_enable_tls=True.
feature_store_access_token_scopes
feature_store_access_token_scopes (String)
Default value ''
A list of access token scopes used by the Feature Store connector to authenticate (space-separated list).
feature_store_custom_recipe_location
feature_store_custom_recipe_location (String)
Default value ''
When defined, will be used as an alternative recipe implementation for the FeatureStore connector.
enable_gpt
enable_gpt (Boolean)
Default value False
If enabled, GPT functionalities such as summarization will be available. If the openai_api_secret_key config is provided, the OpenAI API will be used. Make sure this does not break your internal policy.
openai_api_secret_key
OpenAI API secret key (Any)
Default value ''
OpenAI API secret key. Beware that if this config is set and enable_gpt is true, we will send some metadata about datasets and experiments to OpenAI (during dataset and experiment summarization). Make sure that passing such data to OpenAI does not break your internal policy.
openai_api_model
OpenAI model to use (String)
Default value 'gpt-4'
OpenAI model to use.
h2ogpt_url
h2oGPT URL (String)
Default value ''
h2oGPT URL endpoint that will be used for GPT-related purposes (e.g. summarization). If both h2ogpt_url and openai_api_secret_key are provided, only the h2oGPT URL will be used.
h2ogpt_key
h2oGPT key (Any)
Default value ''
The h2oGPT Key required for specific h2oGPT URLs, enabling authorized access for GPT-related tasks like summarization.
h2ogpt_model_name
h2oGPT model name (String)
Default value ''
Name of the h2oGPT model that should be used. If not specified, the default model in h2oGPT will be used.
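A minimal sketch enabling GPT-backed summarization against an h2oGPT endpoint (URL, key, and model name are placeholders):

enable_gpt = true
h2ogpt_url = "https://<h2ogpt host>"
h2ogpt_key = "<h2ogpt key>"
h2ogpt_model_name = "<model name>"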