BlueData DataTap Setup

This section provides instructions for configuring Driverless AI to work with BlueData DataTap.

Description of Configuration Attributes

  • dtap_auth_type: Selects DTAP authentication. Available values are:

    • noauth: No authentication needed

    • principal: Authenticate with DataTap with a principal user

    • keytab: Authenticate with a keytab (recommended). If running Driverless AI as a service, the Kerberos keytab must be owned by the Driverless AI user.

    • keytabimpersonation: Login with impersonation using a keytab

  • dtap_config_path: The path to the DTAP (HDFS) config folder. This folder can contain multiple config files. Note: The DTAP config file core-site.xml must contain the DTap FS configuration, for example:

    <configuration>
      <property>
        <name>fs.dtap.impl</name>
        <value>com.bluedata.hadoop.bdfs.Bdfs</value>
        <description>The FileSystem for BlueData dtap: URIs.</description>
      </property>
    </configuration>
    
  • dtap_key_tab_path: The path to the principal keytab file. For use when dtap_auth_type=principal.

  • dtap_app_principal_user: The Kerberos app principal user (recommended).

  • dtap_app_login_user: The user ID of the current user (for example, user@realm).

  • dtap_app_jvm_args: JVM args for DTap distributions. Separate each argument with spaces.

  • dtap_app_classpath: The DTap classpath.

  • dtap_init_path: Specifies the starting DTAP path displayed in the UI of the DTAP browser.

  • enabled_file_systems: The file systems you want to enable. This must be configured in order for data connectors to function properly.
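
The core-site.xml requirement described for dtap_config_path can be checked before starting Driverless AI. The following is a minimal sketch, not part of Driverless AI: the function name has_dtap_fs_config is hypothetical, and the check only looks for the fs.dtap.impl property name shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Driverless AI): succeeds only if the
# core-site.xml inside the given config folder declares fs.dtap.impl.
has_dtap_fs_config() {
  config_dir=$1
  [ -f "$config_dir/core-site.xml" ] &&
    grep -q '<name>fs.dtap.impl</name>' "$config_dir/core-site.xml"
}

# Example usage (the folder is a placeholder for your dtap_config_path):
#   has_dtap_fs_config /path/to/dtap/conf || echo "fs.dtap.impl not found" >&2
```

The check is deliberately simple: it does not validate the XML or the class name in the value element, only that the property is present.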

Start Driverless AI

This section describes how to enable the BlueData DataTap data connector when starting Driverless AI in Docker. This can be done either by specifying each environment variable in the nvidia-docker run command, or by editing the configuration options in the config.toml file and then specifying that file in the nvidia-docker run command.
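
The environment variables used in this section appear to follow a simple pattern: the config.toml option name in upper case with a DRIVERLESS_AI_ prefix (for example, dtap_auth_type becomes DRIVERLESS_AI_DTAP_AUTH_TYPE). The sketch below assumes that pattern holds for the other DTAP options as well; the function name to_env_name is hypothetical.

```shell
#!/bin/sh
# Derive the environment variable name for a config.toml option, following
# the pattern visible in this section's examples
# (dtap_auth_type -> DRIVERLESS_AI_DTAP_AUTH_TYPE).
to_env_name() {
  printf 'DRIVERLESS_AI_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Print the -e flags for a few options; <value> is a placeholder.
for option in enabled_file_systems dtap_auth_type dtap_init_path; do
  printf '%s\n' "-e $(to_env_name "$option")=<value>"
done
```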

Enable DataTap with No Authentication

This example enables the DataTap data connector and disables authentication. It does not pass any configuration file; however, it configures Docker DNS by passing the name and IP address of the DTap name node. This allows users to reference data stored in DTap directly using the name node address, for example: dtap://name.node/datasets/iris.csv or dtap://name.node/datasets/. (Note: The trailing slash is currently required for directories.) Replace TAG below with the image tag.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
  -e DRIVERLESS_AI_DTAP_AUTH_TYPE='noauth'  \
  -p 12345:12345 \
  -v /etc/passwd:/etc/passwd \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG
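
Because directory URIs currently need a trailing slash, a small helper can build them consistently. This is a sketch only; dtap_dir_uri is a hypothetical name, and name.node is the host alias passed with --add-host in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build a dtap:// URI for a directory, adding the
# trailing slash that directory paths currently require.
dtap_dir_uri() {
  host=$1
  path=$2
  case $path in
    */) ;;               # already ends with a slash
    *)  path="$path/" ;; # append the required trailing slash
  esac
  printf 'dtap://%s%s\n' "$host" "$path"
}

dtap_dir_uri name.node /datasets    # prints dtap://name.node/datasets/
dtap_dir_uri name.node /datasets/   # prints dtap://name.node/datasets/
```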

Enable DataTap with Keytab-Based Authentication

Notes:

  • If using Kerberos authentication, the time on the Driverless AI server must be in sync with the Kerberos server. If the time difference between clients and DCs is 5 minutes or more, Kerberos authentication fails.

  • If running Driverless AI as a service, the Kerberos keytab must be owned by the Driverless AI user. Otherwise, Driverless AI cannot read the keytab, falls back to simple authentication, and the connection fails.

This example:

  • Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.

  • Configures the environment variable DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER to reference a user for whom the keytab was created (usually in the form of user@realm).

Replace TAG below with the image tag.

# Docker instructions
nvidia-docker run \
    --pid=host \
    --init \
    --rm \
    --shm-size=256m \
    -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
    -e DRIVERLESS_AI_DTAP_AUTH_TYPE='keytab'  \
    -e DRIVERLESS_AI_DTAP_KEY_TAB_PATH='/tmp/<<keytabname>>' \
    -e DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER='<<user@kerberosrealm>>' \
    -p 12345:12345 \
    -v /etc/passwd:/etc/passwd \
    -v /tmp/dtmp/:/tmp \
    -v /tmp/dlog/:/log \
    -v /tmp/dlicense/:/license \
    -v /tmp/ddata/:/data \
    -u $(id -u):$(id -g) \
    h2oai/dai-centos7-x86_64:TAG
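
Before starting the container, it can help to confirm the keytab ownership requirement from the notes in this section. The following is a rough pre-flight sketch, meant to be run as the user that runs Driverless AI. The function name keytab_usable is hypothetical, and the -O file test is a shell extension available in bash rather than something Driverless AI provides.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: the keytab must exist, be owned by the
# user running this script, and be readable by that user.
keytab_usable() {
  keytab=$1
  [ -f "$keytab" ] && [ -O "$keytab" ] && [ -r "$keytab" ]
}

# Example usage (the file name is a placeholder, as in the command above):
#   keytab_usable "/tmp/dtmp/<<keytabname>>" || echo "keytab is not usable" >&2
```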

Enable DataTap with Keytab-Based Impersonation

Notes:

  • If using Kerberos, be sure that the time on the Driverless AI server is synchronized with the Kerberos server.

  • If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user.

This example:

  • Places keytabs in the /tmp/dtmp folder on your machine and provides the file path as described below.

  • Configures the DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER variable, which references a user for whom the keytab was created (usually in the form of user@realm).

  • Configures the DRIVERLESS_AI_DTAP_APP_LOGIN_USER variable, which references a user who is being impersonated (usually in the form of user@realm).

Replace TAG below with the image tag.

# Docker instructions
nvidia-docker run \
    --pid=host \
    --init \
    --rm \
    --shm-size=256m \
    -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,dtap" \
    -e DRIVERLESS_AI_DTAP_AUTH_TYPE='keytabimpersonation'  \
    -e DRIVERLESS_AI_DTAP_KEY_TAB_PATH='/tmp/<<keytabname>>' \
    -e DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER='<<appuser@kerberosrealm>>' \
    -e DRIVERLESS_AI_DTAP_APP_LOGIN_USER='<<thisuser@kerberosrealm>>' \
    -p 12345:12345 \
    -v /etc/passwd:/etc/passwd \
    -v /tmp/dtmp/:/tmp \
    -v /tmp/dlog/:/log \
    -v /tmp/dlicense/:/license \
    -v /tmp/ddata/:/data \
    -u $(id -u):$(id -g) \
    h2oai/dai-centos7-x86_64:TAG
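
The clock synchronization requirement can be expressed as a simple comparison of two Unix timestamps. This sketch uses the 5-minute (300-second) limit mentioned in the keytab authentication notes in this section; clock_skew_ok is a hypothetical name, and how the Kerberos server's time is obtained is left open because it depends on your environment.

```shell
#!/bin/sh
# Hypothetical helper: compare two Unix timestamps (in seconds) and succeed
# only if they differ by less than 5 minutes (300 seconds).
clock_skew_ok() {
  local_ts=$1
  remote_ts=$2
  diff=$((local_ts - remote_ts))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))   # take the absolute value
  fi
  [ "$diff" -lt 300 ]
}

# Example usage, assuming kdc_ts holds the Kerberos server time in seconds:
#   clock_skew_ok "$(date +%s)" "$kdc_ts" || echo "clock skew too large" >&2
```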

Start DAI by Updating the config.toml File

This example shows how to configure DataTap options in the config.toml file, and then specify that file when starting Driverless AI in Docker. Note that this example enables DataTap with no authentication.

  1. Configure the Driverless AI config.toml file. Set the following configuration options:

  • enabled_file_systems = "file, upload, dtap"

  2. Mount the config.toml file into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG
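
The same approach should extend to keytab authentication by adding the corresponding options to config.toml. The sketch below writes such a file. The option names come from the attribute list in this section, every value is a placeholder, and the function name write_dtap_keytab_config is hypothetical. The keytab path assumes the /tmp/dtmp/:/tmp mount used in the commands above.

```shell
#!/bin/sh
# Write a config.toml sketch for keytab authentication to the given path.
# Option names are taken from this section; values are placeholders that
# must be replaced before use.
write_dtap_keytab_config() {
  cat > "$1" <<'EOF'
enabled_file_systems = "file, upload, dtap"
dtap_auth_type = "keytab"
dtap_key_tab_path = "/tmp/<<keytabname>>"
dtap_app_principal_user = "<<user@kerberosrealm>>"
EOF
}

# Example usage: write the file, then mount it as shown in step 2.
#   write_dtap_keytab_config /local/path/to/config.toml
```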