BlueData DataTap Setup
This section provides instructions for configuring Driverless AI to work with BlueData DataTap.
Note: Depending on your Docker install version, use either the docker run --runtime=nvidia (>= Docker 19.03) or nvidia-docker (< Docker 19.03) command when starting the Driverless AI Docker image. Use docker version to check which version of Docker you are using.
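If you script the launch, the version rule above can be applied automatically. The following Bash sketch assumes the version string has the usual MAJOR.MINOR[.PATCH] form; the function name dai_docker_run_cmd is hypothetical and is not part of Driverless AI or Docker.

```shell
#!/usr/bin/env bash
# Choose the GPU launcher for a given Docker version string, following the
# rule in the note: Docker >= 19.03 -> "docker run --runtime=nvidia",
# older Docker -> "nvidia-docker run". (Hypothetical helper.)
dai_docker_run_cmd() {
  local version="$1" major minor
  major="${version%%.*}"        # "19.03.5" -> "19"
  minor="${version#*.}"         # "19.03.5" -> "03.5"
  minor="${minor%%.*}"          # "03.5"    -> "03"
  major="${major%%[!0-9]*}"     # drop suffixes such as "-ce"
  minor="${minor%%[!0-9]*}"
  # Force base 10 so that values like "09" are not parsed as octal.
  major=$((10#$major))
  minor=$((10#$minor))
  if (( major > 19 || (major == 19 && minor >= 3) )); then
    echo "docker run --runtime=nvidia"
  else
    echo "nvidia-docker run"
  fi
}
```

For example, dai_docker_run_cmd "$(docker version --format '{{.Server.Version}}')" prints the command prefix to use with the Docker daemon on the current host.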
Description of Configuration Attributes
dtap_auth_type
: Selects DTAP authentication. Available values are:
  noauth: No authentication needed.
  principal: Authenticate with DataTap with a principal user.
  keytab: Authenticate with a keytab (recommended). If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user.
  keytabimpersonation: Log in with impersonation using a keytab.
dtap_config_path
: The path to the DTAP (HDFS) config folder. This folder can contain multiple config files. Note: The DTAP config file core-site.xml needs to contain the DTap FS configuration, for example:
<configuration>
  <property>
    <name>fs.dtap.impl</name>
    <value>com.bluedata.hadoop.bdfs.Bdfs</value>
    <description>The FileSystem for BlueData dtap: URIs.</description>
  </property>
</configuration>
dtap_key_tab_path
: The path of the principal keytab file. For use when dtap_auth_type=principal; the examples below also set it for dtap_auth_type=keytab and dtap_auth_type=keytabimpersonation.
dtap_app_principal_user
: The Kerberos app principal user (recommended).
dtap_app_login_user
: The user ID of the current user (for example, user@realm).
dtap_app_jvm_args
: JVM args for DTap distributions. Separate each argument with spaces.
dtap_app_classpath
: The DTap classpath.
dtap_init_path
: Specifies the starting DTAP path displayed in the UI of the DTAP browser.
enabled_file_systems
: The file systems you want to enable. This must be configured in order for data connectors to function properly.
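The core-site.xml requirement described under dtap_config_path can be met by generating the file into a dedicated folder. This is a Bash sketch; write_dtap_core_site is a hypothetical helper, and the folder you pass in is whatever you later set as dtap_config_path. The XML content is the example shown above, unchanged.

```shell
#!/usr/bin/env bash
# Write a minimal core-site.xml that registers the dtap:// FileSystem
# implementation into the given folder (created if needed).
write_dtap_core_site() {
  local conf_dir="$1"
  mkdir -p "$conf_dir"
  # Quoted 'EOF' prevents any shell expansion inside the XML.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.dtap.impl</name>
    <value>com.bluedata.hadoop.bdfs.Bdfs</value>
    <description>The FileSystem for BlueData dtap: URIs.</description>
  </property>
</configuration>
EOF
}
```

After running, for example, write_dtap_core_site /path/to/dtap-conf (a placeholder path), set dtap_config_path = "/path/to/dtap-conf" in config.toml. When running in Docker, the folder must also be mounted into the container, and dtap_config_path must use the container-side path.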
Example 1: Enable DataTap with No Authentication
This example enables the DataTap data connector and disables authentication. It does not pass any configuration file; however, it configures Docker DNS by passing the name and IP of the DTap name node with --add-host. This lets users reference data stored in DTap directly using the name node address, for example, dtap://name.node/datasets/iris.csv or dtap://name.node/datasets/. (Note: The trailing slash is currently required for directories.)
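Scripts that build dtap:// directory URIs can enforce the trailing-slash rule in one place. A minimal Bash sketch; dtap_dir_uri is a hypothetical helper that only normalizes the string and does not check that the path exists on the name node.

```shell
#!/usr/bin/env bash
# Ensure a dtap:// directory URI ends with "/", which DataTap currently
# requires for directories. Rejects non-dtap URIs.
dtap_dir_uri() {
  local uri="$1"
  case "$uri" in
    dtap://*/) printf '%s\n' "$uri" ;;      # already has a trailing slash
    dtap://*)  printf '%s/\n' "$uri" ;;     # append the missing slash
    *) echo "not a dtap:// URI: $uri" >&2; return 1 ;;
  esac
}
```

For example, dtap_dir_uri dtap://name.node/datasets yields dtap://name.node/datasets/. Apply it only to directory URIs; file URIs such as dtap://name.node/datasets/iris.csv must not get a trailing slash.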
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
-e DRIVERLESS_AI_DTAP_AUTH_TYPE='noauth' \
-p 12345:12345 \
-v /etc/passwd:/etc/passwd \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
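The command above bind-mounts four host directories under /tmp. With a rootful Docker daemon, a missing -v source directory is created by the daemon and is typically owned by root, which can leave it unwritable for the user passed with -u $(id -u):$(id -g). A sketch that creates the directories first as the current user; prepare_dai_dirs is a hypothetical name, and the directory names are the ones used in the example.

```shell
#!/usr/bin/env bash
# Pre-create the host directories that the docker run example mounts
# (/tmp/dtmp, /tmp/dlog, /tmp/dlicense, /tmp/ddata by default), so they
# are owned by the invoking user rather than created by the daemon.
prepare_dai_dirs() {
  local base="${1:-/tmp}" d
  for d in dtmp dlog dlicense ddata; do
    mkdir -p "$base/$d"
  done
}
```

Run prepare_dai_dirs before the docker command. If you pass a different base directory, update the -v source paths to match.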
This example shows how to configure DataTap options in the config.toml file, and then specify that file when starting Driverless AI in Docker. Note that this example enables DataTap with no authentication.
Configure the Driverless AI config.toml file. Set the following configuration options:
enabled_file_systems = "file, upload, dtap"
Mount the config.toml file into the Docker container.
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
-p 12345:12345 \
-v /local/path/to/config.toml:/path/in/docker/config.toml \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
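The config.toml should exist on the host before the container starts: with the -v syntax, a missing source path is created as a directory rather than a file. A Bash sketch that writes the settings from this example to a host path of your choice; write_dtap_noauth_config is a hypothetical helper, and it also writes dtap_auth_type = "noauth" explicitly (the step above sets only enabled_file_systems) so the no-authentication choice is visible in the file.

```shell
#!/usr/bin/env bash
# Write the Example 1 config.toml (DataTap enabled, no authentication)
# to the given host path, creating parent directories if needed.
write_dtap_noauth_config() {
  local path="$1"
  mkdir -p "$(dirname "$path")"
  cat > "$path" <<'EOF'
enabled_file_systems = "file, upload, dtap"
dtap_auth_type = "noauth"
EOF
}
```

For example, write_dtap_noauth_config /local/path/to/config.toml creates the file that the -v /local/path/to/config.toml:/path/in/docker/config.toml mount expects.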
For native installs (DEB, RPM, or TAR SH), this example enables the DataTap data connector and disables authentication in the config.toml file. This allows users to reference data stored in DataTap directly using the name node address, for example, dtap://name.node/datasets/iris.csv or dtap://name.node/datasets/. (Note: The trailing slash is currently required for directories.)
Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:
# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
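To make the export persistent without appending a duplicate line every time you run the setup, a small Bash sketch can add it to ~/.bashrc only once. add_dai_config_export is a hypothetical helper; the rc file defaults to ~/.bashrc but can be overridden.

```shell
#!/usr/bin/env bash
# Append "export DRIVERLESS_AI_CONFIG_FILE=..." to an rc file unless the
# exact same line is already present (idempotent).
add_dai_config_export() {
  local config_path="$1"
  local rc_file="${2:-$HOME/.bashrc}"
  local line="export DRIVERLESS_AI_CONFIG_FILE=\"$config_path\""
  touch "$rc_file"
  # -x: whole-line match, -F: fixed string, -q: quiet
  if ! grep -qxF "$line" "$rc_file"; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}
```

For a DEB or RPM install, add_dai_config_export /etc/dai/config.toml adds the first export shown above; open a new shell or source ~/.bashrc for it to take effect.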
Specify the following configuration options in the config.toml file.
# File System Support
# upload : standard upload feature
# dtap : Blue Data Tap file system, remember to configure the DTap section below
enabled_file_systems = "file, dtap"
Save the changes when you are done, then stop/restart Driverless AI.
Example 2: Enable DataTap with Keytab-Based Authentication
Notes:
If using Kerberos authentication, the time on the Driverless AI server must be in sync with the Kerberos server. If the time difference between clients and domain controllers (DCs) is 5 minutes or more, Kerberos authentication will fail.
If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user; otherwise, Driverless AI cannot read the keytab, falls back to simple authentication, and consequently fails.
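Both notes can be checked before starting Driverless AI. This Bash sketch rests on a few assumptions: times are passed as Unix epoch seconds, the 300-second limit mirrors the 5-minute threshold stated above, and stat -c '%U' is GNU coreutils syntax (Linux). kerberos_skew_ok and keytab_owned_by are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Succeed when two epoch timestamps differ by less than max_skew seconds
# (default 300 s = 5 minutes, the threshold from the note above).
kerberos_skew_ok() {
  local local_epoch="$1" remote_epoch="$2" max_skew="${3:-300}"
  local diff=$(( local_epoch - remote_epoch ))
  if (( diff < 0 )); then
    diff=$(( -diff ))   # absolute value
  fi
  (( diff < max_skew ))
}

# Succeed when the keytab file exists and is owned by the given user.
# Uses GNU stat; on BSD/macOS the equivalent is stat -f '%Su'.
keytab_owned_by() {
  local keytab="$1" user="$2"
  [ -f "$keytab" ] || return 1
  [ "$(stat -c '%U' "$keytab")" = "$user" ]
}
```

For example, kerberos_skew_ok "$(date +%s)" "$(ssh kdc.example.com date +%s)" compares the local clock with a KDC host reachable over SSH (kdc.example.com is a placeholder), and keytab_owned_by /path/to/dai.keytab dai checks ownership for a service user named dai (also a placeholder).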
For Docker image installs, this example:
Places keytabs in the /tmp/dtmp folder on your machine (mounted as /tmp in the container) and provides the file path as described below.
Configures the environment variable DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER to reference a user for whom the keytab was created (usually in the form of user@realm).
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
-e DRIVERLESS_AI_DTAP_AUTH_TYPE='keytab' \
-e DRIVERLESS_AI_DTAP_KEY_TAB_PATH='/tmp/<<keytabname>>' \
-e DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER='<<user@kerberosrealm>>' \
-p 12345:12345 \
-v /etc/passwd:/etc/passwd \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
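Because /tmp/dtmp on the host is mounted at /tmp inside the container, the keytab path passed to Driverless AI is the container-side path, not the host path. The mapping can be sketched in Bash as follows; container_keytab_path is hypothetical, and its defaults simply mirror the -v /tmp/dtmp/:/tmp mount in the command above.

```shell
#!/usr/bin/env bash
# Translate a host keytab path under the mounted host directory into the
# path the container sees. Fails if the keytab is outside the mount.
container_keytab_path() {
  local host_path="$1"
  local host_dir="${2:-/tmp/dtmp}"      # host side of the -v mount
  local container_dir="${3:-/tmp}"      # container side of the -v mount
  case "$host_path" in
    "$host_dir"/*) printf '%s/%s\n' "$container_dir" "${host_path#"$host_dir"/}" ;;
    *) return 1 ;;                      # not visible inside the container
  esac
}
```

For example, -e DRIVERLESS_AI_DTAP_KEY_TAB_PATH="$(container_keytab_path /tmp/dtmp/dai.keytab)" passes /tmp/dai.keytab to the container. A non-zero exit status means the keytab is not inside the mounted folder and must be moved there first.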
When using a config.toml file with the Docker image, this example:
Places keytabs in the /tmp/dtmp folder on your machine (mounted as /tmp in the container) and provides the file path as described below.
Configures the option dtap_app_principal_user to reference a user for whom the keytab was created (usually in the form of user@realm).
Configure the Driverless AI config.toml file. Set the following configuration options:
enabled_file_systems = "file, upload, dtap"
dtap_auth_type = "keytab"
dtap_key_tab_path = "/tmp/<keytabname>"
dtap_app_principal_user = "<user@kerberosrealm>"
Mount the config.toml file into the Docker container.
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
-p 12345:12345 \
-v /local/path/to/config.toml:/path/in/docker/config.toml \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
For native installs (DEB, RPM, or TAR SH), this example:
Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.
Configures the option dtap_app_principal_user to reference a user for whom the keytab was created (usually in the form of user@realm).
Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:
# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
Specify the following configuration options in the config.toml file.
# File System Support
# file : local file system/server file system
# dtap : Blue Data Tap file system, remember to configure the DTap section below
enabled_file_systems = "file, dtap"

# Blue Data DTap connector settings are similar to HDFS connector settings.
#
# Specify DTap Auth Type, allowed options are:
#   noauth : No authentication needed
#   principal : Authenticate with DTap with a principal user
#   keytab : Authenticate with a Key tab (recommended). If running
#            DAI as a service, then the Kerberos keytab needs to
#            be owned by the DAI user.
#   keytabimpersonation : Login with impersonation using a keytab
dtap_auth_type = "keytab"

# Path of the principal key tab file
dtap_key_tab_path = "/tmp/<keytabname>"

# Kerberos app principal user (recommended)
dtap_app_principal_user = "<user@kerberosrealm>"
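Before restarting, you can check that the keys used on this page are present for the chosen dtap_auth_type. This Bash sketch is hypothetical and deliberately simple: it only understands single-line key = "value" entries (it is not a TOML parser), and it checks only the keys this page associates with each auth type: dtap_key_tab_path for principal; dtap_key_tab_path and dtap_app_principal_user for keytab; those plus dtap_app_login_user for keytabimpersonation.

```shell
#!/usr/bin/env bash
# Print the value of the last uncommented `key = "value"` line in a file.
toml_value() {
  local file="$1" key="$2"
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"([^\"]*)\".*/\1/p" "$file" | tail -n 1
}

# Return 0 if every key required for the configured dtap_auth_type is set,
# 1 if any is missing, 2 for an auth type this sketch does not handle.
check_dtap_config() {
  local file="$1" auth key missing=0
  local required=()
  auth="$(toml_value "$file" dtap_auth_type)"
  case "$auth" in
    ""|noauth) required=() ;;
    principal) required=(dtap_key_tab_path) ;;
    keytab) required=(dtap_key_tab_path dtap_app_principal_user) ;;
    keytabimpersonation) required=(dtap_key_tab_path dtap_app_principal_user dtap_app_login_user) ;;
    *) echo "unhandled dtap_auth_type: $auth" >&2; return 2 ;;
  esac
  for key in "${required[@]}"; do
    if [ -z "$(toml_value "$file" "$key")" ]; then
      echo "missing $key for dtap_auth_type=$auth" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_dtap_config "$DRIVERLESS_AI_CONFIG_FILE" reports any missing key on stderr and returns non-zero, so it can gate a restart script.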
Save the changes when you are done, then stop/restart Driverless AI.
Example 3: Enable DataTap with Keytab-Based Impersonation
Notes:
If using Kerberos, make sure that the time on the Driverless AI server is synchronized with the Kerberos server.
If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user.
For Docker image installs, this example:
Places keytabs in the /tmp/dtmp folder on your machine (mounted as /tmp in the container) and provides the file path as described below.
Configures the DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER variable, which references a user for whom the keytab was created (usually in the form of user@realm).
Configures the DRIVERLESS_AI_DTAP_APP_LOGIN_USER variable, which references a user who is being impersonated (usually in the form of user@realm).
# Docker instructions
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
-e DRIVERLESS_AI_DTAP_AUTH_TYPE='keytabimpersonation' \
-e DRIVERLESS_AI_DTAP_KEY_TAB_PATH='/tmp/<<keytabname>>' \
-e DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER='<<appuser@kerberosrealm>>' \
-e DRIVERLESS_AI_DTAP_APP_LOGIN_USER='<<thisuser@kerberosrealm>>' \
-p 12345:12345 \
-v /etc/passwd:/etc/passwd \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
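Both the principal user and the impersonated login user are usually given as user@realm. A lightweight format check can catch an empty or malformed value before the container starts. This is a hypothetical Bash helper and only a heuristic: it does not contact the KDC or verify that the principal exists.

```shell
#!/usr/bin/env bash
# Succeed if the value has exactly one "@" with non-empty text on both
# sides, i.e. it looks like user@realm.
looks_like_user_at_realm() {
  local value="$1"
  case "$value" in
    *@*@*|@*|*@|"") return 1 ;;   # extra "@", empty user, empty realm, or empty
    *@*) return 0 ;;
    *) return 1 ;;                # no "@" at all
  esac
}
```

For example, running looks_like_user_at_realm on the values you pass as DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER and DRIVERLESS_AI_DTAP_APP_LOGIN_USER, and aborting on a non-zero status, flags typos such as a missing realm.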
When using a config.toml file with the Docker image, this example:
Places keytabs in the /tmp/dtmp folder on your machine (mounted as /tmp in the container) and provides the file path as described below.
Configures the dtap_app_principal_user option, which references a user for whom the keytab was created (usually in the form of user@realm).
Configures the dtap_app_login_user option, which references a user who is being impersonated (usually in the form of user@realm).
Configure the Driverless AI config.toml file. Set the following configuration options:
enabled_file_systems = "file, upload, dtap"
dtap_auth_type = "keytabimpersonation"
dtap_key_tab_path = "/tmp/<keytabname>"
dtap_app_principal_user = "<user@kerberosrealm>"
dtap_app_login_user = "<user@realm>"
Mount the config.toml file into the Docker container.
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
-p 12345:12345 \
-v /local/path/to/config.toml:/path/in/docker/config.toml \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
For native installs (DEB, RPM, or TAR SH), this example:
Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.
Configures the dtap_app_principal_user option, which references a user for whom the keytab was created (usually in the form of user@realm).
Configures the dtap_app_login_user option, which references a user who is being impersonated (usually in the form of user@realm).
Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:
# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
Specify the following configuration options in the config.toml file.
# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive: Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url: load custom recipe from URL
# recipe_file: load custom recipe from local file system
enabled_file_systems = "file, dtap"

# Blue Data DTap connector settings are similar to HDFS connector settings.
#
# Specify DTap Auth Type, allowed options are:
#   noauth : No authentication needed
#   principal : Authenticate with DTap with a principal user
#   keytab : Authenticate with a Key tab (recommended). If running
#            DAI as a service, then the Kerberos keytab needs to
#            be owned by the DAI user.
#   keytabimpersonation : Login with impersonation using a keytab
dtap_auth_type = "keytabimpersonation"

# Path of the principal key tab file
dtap_key_tab_path = "/tmp/<keytabname>"

# Kerberos app principal user (recommended)
dtap_app_principal_user = "<user@kerberosrealm>"

# Specify the user id of the current user here as user@realm
dtap_app_login_user = "<user@realm>"
Save the changes when you are done, then stop/restart Driverless AI.