Connectors configuration¶
enabled_file_systems
¶
enabled_file_systems (List)
Default value ['upload', 'file', 'hdfs', 's3', 'recipe_file', 'recipe_url']
File System Support:
upload : standard upload feature
file : local file system/server file system
hdfs : Hadoop file system; remember to configure the HDFS config folder path and keytab below
dtap : Blue Data Tap file system; remember to configure the DTap section below
s3 : Amazon S3; optionally configure secret and access key below
gcs : Google Cloud Storage; remember to configure gcs_path_to_service_account_json below
gbq : Google Big Query; remember to configure gcs_path_to_service_account_json below
minio : Minio Cloud Storage; remember to configure secret and access key below
snow : Snowflake Data Warehouse; remember to configure Snowflake credentials below (account name, username, password)
kdb : KDB+ Time Series Database; remember to configure KDB credentials below (hostname and port; optionally username, password, classpath, and jvm_args)
azrbs : Azure Blob Storage; remember to configure Azure credentials below (account name, account key)
jdbc : JDBC Connector; remember to configure JDBC below (jdbc_app_configs)
hive : Hive Connector; remember to configure Hive below (hive_app_configs)
recipe_file : custom recipe file upload
recipe_url : custom recipe upload via URL
h2o_drive : H2O Drive; remember to configure h2o_drive_endpoint_url below
feature_store : Feature Store; remember to configure feature_store_endpoint_url below
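A minimal config.toml sketch enabling only a subset of connectors (the list syntax mirrors the file_path_filter_include examples below; the exact quoting may vary by deployment):
enabled_file_systems = "['upload', 'file', 'hdfs', 's3']"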
max_files_listed
¶
max_files_listed (Number)
Default value 100
Maximum number of files listed in the file system browser.
file_hide_data_directory
¶
file_hide_data_directory (Boolean)
Default value True
This option disables access to the DAI data_directory from the file browser.
file_path_filtering_enabled
¶
file_path_filtering_enabled (Boolean)
Default value False
Enable usage of path filters
file_path_filter_include
¶
file_path_filter_include (List)
Default value []
List of absolute path prefixes to restrict access to in the file system browser.
First add the following environment variable to your command line to enable this feature: file_path_filtering_enabled=true
This feature can be used in the following ways (using a specific path or using the logged-in user's directory):
file_path_filter_include="['/data/stage']"
file_path_filter_include="['/data/stage','/data/prod']"
file_path_filter_include=/home/{{DAI_USERNAME}}/
file_path_filter_include="['/home/{{DAI_USERNAME}}/','/data/stage','/data/prod']"
hdfs_auth_type
¶
hdfs_auth_type (String)
Default value 'noauth'
(Required) HDFS connector. Specify the HDFS auth type; allowed options are:
noauth : (default) no authentication needed
principal : authenticate with HDFS as a principal user (DEPRECATED - use the keytab auth type)
keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
keytabimpersonation : log in with impersonation using a keytab
hdfs_app_principal_user
¶
hdfs_app_principal_user (String)
Default value ''
Kerberos app principal user. Required when hdfs_auth_type='keytab'; recommended otherwise.
hdfs_app_login_user
¶
hdfs_app_login_user (String)
Default value ''
Deprecated - do not use. The login user is taken from the username provided at login.
hdfs_app_jvm_args
¶
hdfs_app_jvm_args (String)
Default value ''
JVM args for HDFS distributions; provide args separated by spaces. Example: -Djava.security.krb5.conf=<path>/krb5.conf -Dsun.security.krb5.debug=True -Dlog4j.configuration=file:///<path>log4j.properties
hdfs_app_classpath
¶
hdfs_app_classpath (String)
Default value ''
HDFS class path.
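A sketch of a keytab-based HDFS setup in config.toml, using only the options described in this section (principal, paths, and classpath are placeholders; the keytab itself is configured via the HDFS config folder path and keytab options referenced under enabled_file_systems):
hdfs_auth_type = "keytab"
hdfs_app_principal_user = "dai/node1.example.com@EXAMPLE.COM"
hdfs_app_jvm_args = "-Djava.security.krb5.conf=/etc/krb5.conf"
hdfs_app_classpath = "/opt/hadoop/conf:/opt/hadoop/jars/*"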
hdfs_app_supported_schemes
¶
hdfs_app_supported_schemes (List)
Default value ['hdfs://', 'maprfs://', 'swift://']
List of supported DFS schemes, e.g. "['hdfs://', 'maprfs://', 'swift://']". The supported schemes list is used as an initial check to ensure valid input to the connector.
hdfs_max_files_listed
¶
hdfs_max_files_listed (Number)
Default value 100
Maximum number of files viewable in the connector UI. Set to a larger number to view more files.
hdfs_init_path
¶
hdfs_init_path (String)
Default value 'hdfs://'
Starting HDFS path displayed in UI HDFS browser
hdfs_upload_init_path
¶
hdfs_upload_init_path (String)
Default value 'hdfs://'
Starting HDFS path for the artifacts upload operations
dtap_auth_type
¶
dtap_auth_type (String)
Default value 'noauth'
Blue Data DTap connector settings are similar to HDFS connector settings.
Specify the DTap auth type; allowed options are:
noauth : no authentication needed
principal : authenticate with DTap as a principal user
keytab : authenticate with a keytab (recommended). If running DAI as a service, the Kerberos keytab needs to be owned by the DAI user.
keytabimpersonation : log in with impersonation using a keytab
NOTE: "hdfs_app_classpath" and "core_site_xml_path" are both required to be set for the DTap connector
dtap_config_path
¶
dtap_config_path (String)
Default value ''
DTap (HDFS) config folder path; can contain multiple config files.
dtap_key_tab_path
¶
dtap_key_tab_path (String)
Default value ''
Path of the principal keytab file. dtap_key_tab_path is deprecated; please use dtap_keytab_path instead.
dtap_keytab_path
¶
dtap_keytab_path (String)
Default value ''
Path of the principal keytab file.
dtap_app_principal_user
¶
dtap_app_principal_user (String)
Default value ''
Kerberos app principal user (recommended)
dtap_app_login_user
¶
dtap_app_login_user (String)
Default value ''
Specify the user id of the current user here as user@realm
dtap_app_jvm_args
¶
dtap_app_jvm_args (String)
Default value ''
JVM args for DTap distributions; provide args separated by spaces.
dtap_app_classpath
¶
dtap_app_classpath (String)
Default value ''
DTap (HDFS) class path. NOTE: set 'hdfs_app_classpath' also
dtap_init_path
¶
dtap_init_path (String)
Default value 'dtap://'
Starting DTAP path displayed in UI DTAP browser
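A sketch of a keytab-based DTap setup in config.toml, using the options above (paths and principal are placeholders; as noted, hdfs_app_classpath and core_site_xml_path must also be set):
dtap_auth_type = "keytab"
dtap_keytab_path = "/etc/security/dai.keytab"
dtap_app_principal_user = "dai@EXAMPLE.COM"
dtap_app_classpath = "/opt/bluedata/classpath/*"
hdfs_app_classpath = "/opt/hadoop/conf:/opt/hadoop/jars/*"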
aws_access_key_id
¶
AWS Access Key ID (String)
Default value ''
S3 Connector credentials
aws_secret_access_key
¶
AWS Secret Access Key (Any)
Default value ''
S3 Connector credentials
aws_role_arn
¶
aws_role_arn (String)
Default value ''
S3 Connector credentials
aws_default_region
¶
aws_default_region (String)
Default value ''
The region to use when none is specified in the S3 URL. Ignored when aws_s3_endpoint_url is set.
aws_s3_endpoint_url
¶
aws_s3_endpoint_url (String)
Default value ''
Sets the endpoint URL that will be used to access S3.
aws_use_ec2_role_credentials
¶
aws_use_ec2_role_credentials (Boolean)
Default value False
If set to true, the S3 Connector will try to obtain credentials associated with the role attached to the EC2 instance.
s3_init_path
¶
s3_init_path (String)
Default value 's3://'
Starting S3 path displayed in UI S3 browser
s3_skip_cert_verification
¶
s3_skip_cert_verification (Boolean)
Default value False
The S3 Connector will skip cert verification if this is set to true (mostly used for S3-like connectors, e.g. Ceph).
s3_connector_cert_location
¶
s3_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the S3 connector
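A sketch of static-credential S3 access in config.toml (key values and bucket are placeholders; alternatively, set aws_use_ec2_role_credentials = true on EC2 and omit the keys):
aws_access_key_id = "AKIAXXXXXXXXXXXXXXXX"
aws_secret_access_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
aws_default_region = "us-east-1"
s3_init_path = "s3://my-bucket/datasets/"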
gcs_path_to_service_account_json
¶
gcs_path_to_service_account_json (String)
Default value ''
GCS Connector credentials. Example (suggested): '/licenses/my_service_account_json.json'
gcs_service_account_json
¶
GCS Connector service account JSON (Dict)
Default value {}
GCS Connector service account credentials in JSON format; this configuration takes precedence over gcs_path_to_service_account_json.
gbq_access_impersonated_account
¶
GCS Connector impersonated account (String)
Default value ''
GCS Connector impersonated account
gcs_init_path
¶
gcs_init_path (String)
Default value 'gs://'
Starting GCS path displayed in UI GCS browser
gcs_access_token_scopes
¶
gcs_access_token_scopes (String)
Default value ''
Space-separated list of OAuth2 scopes for the access token used to authenticate in Google Cloud Storage
gcs_default_project_id
¶
gcs_default_project_id (String)
Default value ''
When google_cloud_use_oauth is enabled, the Google Cloud client cannot automatically infer the default project, so it must be specified explicitly.
gbq_access_token_scopes
¶
gbq_access_token_scopes (String)
Default value ''
Space-separated list of OAuth2 scopes for the access token used to authenticate in Google BigQuery
google_cloud_use_oauth
¶
google_cloud_use_oauth (Boolean)
Default value False
By default, the Driverless AI Google Cloud Storage and BigQuery connectors use a service account file to retrieve authentication credentials. When enabled, the Storage and BigQuery connectors will instead use OAuth2 user access tokens to authenticate in Google Cloud.
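A sketch of service-account-based GCS access in config.toml (the JSON path follows the suggested example above; the bucket is a placeholder):
gcs_path_to_service_account_json = "/licenses/my_service_account_json.json"
gcs_init_path = "gs://my-bucket/"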
minio_endpoint_url
¶
minio_endpoint_url (String)
Default value ''
Minio Connector credentials
minio_access_key_id
¶
Minio Access Key ID (String)
Default value ''
Minio Connector credentials
minio_skip_cert_verification
¶
minio_skip_cert_verification (Boolean)
Default value False
Minio Connector will skip cert verification if this is set to true
minio_connector_cert_location
¶
minio_connector_cert_location (String)
Default value ''
path/to/cert/bundle.pem - A filename of the CA cert bundle to use for the Minio connector
minio_init_path
¶
minio_init_path (String)
Default value '/'
Starting Minio path displayed in UI Minio browser
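A sketch of a Minio setup in config.toml (endpoint and key are placeholders; per enabled_file_systems above, a secret key is configured alongside these options, though it is not listed in this section):
minio_endpoint_url = "https://minio.example.com:9000"
minio_access_key_id = "minio-access-key"
minio_skip_cert_verification = false
minio_init_path = "/"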
h2o_drive_endpoint_url
¶
h2o_drive_endpoint_url (String)
Default value ''
H2O Drive server endpoint URL
h2o_drive_access_token_scopes
¶
h2o_drive_access_token_scopes (String)
Default value ''
Space-separated list of OpenID scopes for the access token used by the H2O Drive connector
h2o_drive_session_duration
¶
h2o_drive_session_duration (Number)
Default value 10800
Maximum duration (in seconds) for a session with the H2O Drive
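A sketch of an H2O Drive setup in config.toml (the endpoint is a placeholder; 10800 seconds is the default 3-hour session):
h2o_drive_endpoint_url = "https://drive.example.com"
h2o_drive_session_duration = 10800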
snowflake_url
¶
snowflake_url (String)
Default value ''
Snowflake Connector credentials.
Recommended: provide url, user, password.
Optionally: provide account, user, password.
Example URL: https://<snowflake_account>.<region>.snowflakecomputing.com
snowflake_user
¶
snowflake_user (String)
Default value ''
Snowflake Connector credentials
snowflake_password
¶
snowflake_password (String)
Default value ''
Snowflake Connector credentials
snowflake_account
¶
snowflake_account (String)
Default value ''
Snowflake Connector credentials
snowflake_authenticator
¶
snowflake_authenticator (String)
Default value ''
Snowflake Connector authenticator; can be used when Snowflake is using native SSO with Okta. E.g.: snowflake_authenticator = "https://<okta_account_name>.okta.com"
snowflake_keycloak_broker_token_endpoint
¶
snowflake_keycloak_broker_token_endpoint (String)
Default value ''
Keycloak endpoint for retrieving external IdP tokens for Snowflake. (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
snowflake_keycloak_broker_token_type
¶
snowflake_keycloak_broker_token_type (String)
Default value 'access_token'
Token type that should be used from the response from Keycloak endpoint for retrieving external IdP tokens for Snowflake. See snowflake_keycloak_broker_token_endpoint.
snowflake_h2o_secure_store_oauth_client_id
¶
snowflake_h2o_secure_store_oauth_client_id (String)
Default value ''
ID of the OAuth client configured in H2O Secure Store for authentication with Snowflake.
snowflake_host
¶
snowflake_host (String)
Default value ''
Snowflake hostname to connect to when running Driverless AI in Snowpark Container Services.
snowflake_port
¶
snowflake_port (String)
Default value ''
Snowflake port to connect to when running Driverless AI in Snowpark Container Services.
snowflake_session_token_filepath
¶
snowflake_session_token_filepath (String)
Default value ''
Snowflake filepath that stores the token of the session, when running Driverless AI in Snowpark Container Services. E.g.: snowflake_session_token_filepath = "/snowflake/session/token"
snowflake_allow_stages
¶
snowflake_allow_stages (Boolean)
Default value True
Setting to allow or disallow the Snowflake connector from using Snowflake stages during queries.
True - permits the connector to use stages, which generally improves performance; however, if the Snowflake user does not have permission to create/use stages, queries will end in errors.
False - prevents the connector from using stages, so Snowflake users without permission to create/use stages will have successful queries; however, this may significantly degrade query performance.
snowflake_batch_size
¶
snowflake_batch_size (Number)
Default value 10000
Sets the number of rows to be fetched by the Snowflake cursor at one time. This is only used if snowflake_allow_stages is set to False; it may help with performance depending on the type and size of the data being queried.
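A sketch of the recommended URL-based Snowflake setup in config.toml (account, region, and credentials are placeholders):
snowflake_url = "https://myaccount.us-east-1.snowflakecomputing.com"
snowflake_user = "dai_user"
snowflake_password = "my-password"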
kdb_user
¶
kdb_user (String)
Default value ''
KDB Connector credentials
kdb_password
¶
kdb_password (String)
Default value ''
KDB Connector credentials
kdb_hostname
¶
kdb_hostname (String)
Default value ''
KDB Connector credentials
kdb_port
¶
kdb_port (String)
Default value ''
KDB Connector credentials
kdb_app_classpath
¶
kdb_app_classpath (String)
Default value ''
KDB Connector credentials
kdb_app_jvm_args
¶
kdb_app_jvm_args (String)
Default value ''
KDB Connector credentials
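A sketch of a KDB+ setup in config.toml (hostname and port are placeholders; per enabled_file_systems above, user, password, classpath, and jvm_args are optional):
kdb_hostname = "kdb.example.com"
kdb_port = "5001"
kdb_user = "dai"
kdb_password = "my-password"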
azure_blob_account_name
¶
Azure Blob Store Account Name (String)
Default value ''
Account name for Azure Blob Store Connector
azure_blob_account_key
¶
Azure Blob Store Account Key (Any)
Default value ''
Account key for Azure Blob Store Connector
azure_connection_string
¶
Azure Blob Store Connection String (Any)
Default value ''
Connection string for Azure Blob Store Connector
azure_sas_token
¶
Azure Blob Store SAS token (Any)
Default value ''
SAS token for Azure Blob Store Connector
azure_blob_init_path
¶
azure_blob_init_path (String)
Default value 'https://'
Starting Azure blob store path displayed in UI Azure blob store browser
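A sketch of key-based Azure Blob Store access in config.toml (account name, key, and path are placeholders; a connection string or SAS token could be used instead):
azure_blob_account_name = "mystorageaccount"
azure_blob_account_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
azure_blob_init_path = "https://mystorageaccount.blob.core.windows.net/"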
azure_blob_use_access_token
¶
azure_blob_use_access_token (Boolean)
Default value False
When enabled, Azure Blob Store Connector will use access token derived from the credentials received on login with OpenID Connect.
azure_blob_use_access_token_scopes
¶
azure_blob_use_access_token_scopes (String)
Default value 'https://storage.azure.com/.default'
Configures the scopes for the access token used by the Azure Blob Store Connector when azure_blob_use_access_token is enabled. (Space-separated list)
azure_blob_use_access_token_source
¶
azure_blob_use_access_token_source (String)
Default value 'SESSION'
Sets the source of the access token for accessing the Azure blob store:
KEYCLOAK: will exchange the session access token for the federated refresh token with Keycloak and use it to obtain the access token directly from Azure AD.
SESSION: will use the access token derived from the credentials received on login with OpenID Connect.
azure_blob_keycloak_aad_client_id
¶
azure_blob_keycloak_aad_client_id (String)
Default value ''
Application (client) ID registered on Azure AD when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_client_secret
¶
azure_blob_keycloak_aad_client_secret (String)
Default value ''
Application (client) secret when the KEYCLOAK source is enabled.
azure_blob_keycloak_aad_auth_uri
¶
azure_blob_keycloak_aad_auth_uri (String)
Default value ''
A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_blob_keycloak_broker_token_endpoint
¶
azure_blob_keycloak_broker_token_endpoint (String)
Default value ''
Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
azure_enable_token_auth_aad
¶
azure_enable_token_auth_aad (Boolean)
Default value False
(DEPRECATED, use azure_blob_use_access_token and azure_blob_use_access_token_source="KEYCLOAK" instead. When enabled, only the DEPRECATED options azure_ad_client_id, azure_ad_client_secret, azure_ad_auth_uri, and azure_keycloak_idp_token_endpoint are effective. This is equivalent to setting azure_blob_use_access_token_source = "KEYCLOAK" and setting the azure_blob_keycloak_aad_client_id, azure_blob_keycloak_aad_client_secret, azure_blob_keycloak_aad_auth_uri, and azure_blob_keycloak_broker_token_endpoint options.)
If true, enables the Azure Blob Storage Connector to use Azure AD tokens obtained from Keycloak for auth.
azure_ad_client_id
¶
azure_ad_client_id (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_id instead.) Application (client) ID registered on Azure AD
azure_ad_client_secret
¶
azure_ad_client_secret (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_client_secret instead.) Application Client Secret
azure_ad_auth_uri
¶
azure_ad_auth_uri (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_aad_auth_uri instead.) A URL that identifies a token authority. It should be of the format https://login.microsoftonline.com/your_tenant
azure_ad_scopes
¶
azure_ad_scopes (List)
Default value []
(DEPRECATED, use azure_blob_use_access_token_scopes instead.) Scopes requested to access a protected API (a resource).
azure_keycloak_idp_token_endpoint
¶
azure_keycloak_idp_token_endpoint (String)
Default value ''
(DEPRECATED, use azure_blob_keycloak_broker_token_endpoint instead.) Keycloak Endpoint for Retrieving External IDP Tokens (https://www.keycloak.org/docs/latest/server_admin/#retrieving-external-idp-tokens)
jdbc_app_configs
¶
jdbc_app_configs (String)
Default value '{}'
Configuration for the JDBC Connector. JSON/Dictionary String with multiple keys. Format as a single line without using carriage returns (the following example is formatted for readability). Use triple quotations to ensure that the text is read as a single string. Example:
'{
  "postgres": {
    "url": "jdbc:postgresql://ip address:port/postgres",
    "jarpath": "/path/to/postgres_driver.jar",
    "classpath": "org.postgresql.Driver"
  },
  "mysql": {
    "url": "mysql connection string",
    "jarpath": "/path/to/mysql_driver.jar",
    "classpath": "my.sql.classpath.Driver"
  }
}'
jdbc_app_jvm_args
¶
jdbc_app_jvm_args (String)
Default value '-Xmx4g'
Extra JVM args for the JDBC connector
jdbc_app_classpath
¶
jdbc_app_classpath (String)
Default value ''
Alternative classpath for the JDBC connector
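A sketch of how the single-line, triple-quoted form of jdbc_app_configs might look in config.toml (connection details and driver paths are placeholders):
jdbc_app_configs = """{"postgres": {"url": "jdbc:postgresql://localhost:5432/postgres", "jarpath": "/opt/drivers/postgresql.jar", "classpath": "org.postgresql.Driver"}}"""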
hive_app_configs
¶
hive_app_configs (String)
Default value '{}'
Configuration for the Hive Connector. Note that inputs are similar to configuring HDFS connectivity.
Important keys:
* hive_conf_path - path to the Hive configuration; may have multiple files (typically hive-site.xml, hdfs-site.xml, etc.)
* auth_type - one of noauth, keytab, keytabimpersonation for Kerberos authentication
* keytab_path - path to the Kerberos keytab to use for authentication; can be "" if using the noauth auth_type
* principal_user - Kerberos app principal user. Required when using auth_type keytab or keytabimpersonation
JSON/Dictionary String with multiple keys. Example:
'{
  "hive_connection_1": {
    "hive_conf_path": "/path/to/hive/conf",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename>.keytab",
    "principal_user": "hive/localhost@EXAMPLE.COM"
  },
  "hive_connection_2": {
    "hive_conf_path": "/path/to/hive/conf_2",
    "auth_type": "one of ['noauth', 'keytab', 'keytabimpersonation']",
    "keytab_path": "/path/to/<filename_2>.keytab",
    "principal_user": "my_user/localhost@EXAMPLE.COM"
  }
}'
hive_app_jvm_args
¶
hive_app_jvm_args (String)
Default value '-Xmx4g'
Extra jvm args for hive connector
hive_app_classpath
¶
hive_app_classpath (String)
Default value ''
Alternative classpath for hive connector. Can be used to add additional jar files to classpath.
enable_artifacts_upload
¶
enable_artifacts_upload (Boolean)
Default value False
Replace all downloads on the experiment page with exports, and allow users to push to the artifact store configured with artifacts_store
artifacts_store
¶
Artifacts Store (String)
Default value 'file_system'
Artifacts store:
file_system: stores artifacts in a file system directory denoted by artifacts_file_system_directory.
s3: stores artifacts in an S3 bucket.
bitbucket: stores data in a Bitbucket repository.
azure: stores data in Azure Blob Store.
hdfs: stores data in a Hadoop distributed file system location.
bitbucket_skip_cert_verification
¶
bitbucket_skip_cert_verification (Boolean)
Default value False
Decide whether to skip cert verification for Bitbucket when using a repo with HTTPS
bitbucket_tmp_relative_dir
¶
bitbucket_tmp_relative_dir (String)
Default value 'local_git_tmp'
Local temporary directory to clone artifacts to, relative to data_directory
artifacts_file_system_directory
¶
artifacts_file_system_directory (String)
Default value 'tmp'
File system location where artifacts will be copied in case artifacts_store is set to file_system
artifacts_s3_bucket
¶
AWS S3 Bucket Name (String)
Default value ''
AWS S3 bucket used for experiment artifact export.
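A sketch of exporting artifacts to S3 in config.toml (bucket name is a placeholder; this assumes the S3 connector credentials above are already configured):
enable_artifacts_upload = true
artifacts_store = "s3"
artifacts_s3_bucket = "my-dai-artifacts"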
artifacts_azure_blob_account_name
¶
Azure Blob Store Account Name (String)
Default value ''
Azure Blob Store credentials used for experiment artifact export
artifacts_azure_blob_account_key
¶
Azure Blob Store Account Key (Any)
Default value ''
Azure Blob Store credentials used for experiment artifact export
artifacts_azure_connection_string
¶
Azure Blob Store Connection String (Any)
Default value ''
Azure Blob Store connection string used for experiment artifact export
artifacts_azure_sas_token
¶
Azure Blob Store SAS token (Any)
Default value ''
Azure Blob Store SAS token used for experiment artifact export
artifacts_git_user
¶
artifacts_git_user (String)
Default value 'git'
Git auth user
artifacts_git_password
¶
artifacts_git_password (String)
Default value ''
Git auth password
artifacts_git_repo
¶
artifacts_git_repo (String)
Default value ''
Git repo where artifacts will be pushed upon upload
artifacts_git_branch
¶
artifacts_git_branch (String)
Default value 'dev'
Git branch on the remote repo where artifacts are pushed
artifacts_git_ssh_private_key_file_location
¶
artifacts_git_ssh_private_key_file_location (String)
Default value ''
File location for the ssh private key used for git authentication
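A sketch of a Bitbucket artifact store in config.toml (repo URL and key path are placeholders; SSH-based auth is assumed here):
artifacts_store = "bitbucket"
artifacts_git_user = "git"
artifacts_git_repo = "git@bitbucket.org:myorg/dai-artifacts.git"
artifacts_git_branch = "dev"
artifacts_git_ssh_private_key_file_location = "/home/dai/.ssh/id_rsa"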
feature_store_endpoint_url
¶
feature_store_endpoint_url (String)
Default value ''
Feature Store server endpoint URL
feature_store_enable_tls
¶
feature_store_enable_tls (Boolean)
Default value False
Enable TLS communication between DAI and the Feature Store server
feature_store_tls_cert_path
¶
feature_store_tls_cert_path (String)
Default value ''
Path to the client certificate to authenticate with the Feature Store server. This is only effective when feature_store_enable_tls=True.
feature_store_access_token_scopes
¶
feature_store_access_token_scopes (String)
Default value ''
A list of access token scopes used by the Feature Store connector to authenticate. (Space-separated list)
feature_store_custom_recipe_location
¶
feature_store_custom_recipe_location (String)
Default value ''
When defined, will be used as an alternative recipe implementation for the FeatureStore connector.
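A sketch of a TLS-enabled Feature Store setup in config.toml (endpoint and cert path are placeholders):
feature_store_endpoint_url = "https://feature-store.example.com:9090"
feature_store_enable_tls = true
feature_store_tls_cert_path = "/certs/feature_store_client.pem"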
enable_gpt
¶
enable_gpt (Boolean)
Default value False
If enabled, GPT functionalities such as summarization become available. If the openai_api_secret_key config is provided, the OpenAI API will be used. Make sure this does not break your internal policy.
openai_api_secret_key
¶
OpenAI API secret key (Any)
Default value ''
OpenAI API secret key. Beware that if this config is set and enable_gpt is true, we will send some metadata about datasets and experiments to OpenAI (during dataset and experiment summarization). Make sure that passing such data to OpenAI does not break your internal policy.
openai_api_model
¶
OpenAI model to use (String)
Default value 'gpt-4'
OpenAI model to use.
h2ogpt_url
¶
h2oGPT URL (String)
Default value ''
h2oGPT URL endpoint that will be used for GPT-related purposes (e.g. summarization). If both h2ogpt_url and openai_api_secret_key are provided, we will use only h2oGPT URL.
h2ogpt_key
¶
h2oGPT key (Any)
Default value ''
The h2oGPT Key required for specific h2oGPT URLs, enabling authorized access for GPT-related tasks like summarization.
h2ogpt_model_name
¶
h2oGPT model name (String)
Default value ''
Name of the h2oGPT model that should be used. If not specified, the default model in h2oGPT will be used.
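A sketch of enabling GPT features against an h2oGPT endpoint in config.toml (URL and key are placeholders; per h2ogpt_url above, the h2oGPT URL takes precedence if an OpenAI key is also set):
enable_gpt = true
h2ogpt_url = "https://h2ogpt.example.com"
h2ogpt_key = "my-h2ogpt-key"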