S3 Setup

Driverless AI allows you to explore S3 data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with S3.

Description of Configuration Attributes

  • aws_access_key_id: The S3 access key ID
  • aws_secret_access_key: The S3 secret access key
  • aws_role_arn: The Amazon Resource Name (ARN) of the AWS role
  • aws_default_region: The region to use when aws_s3_endpoint_url is not set; ignored otherwise.
  • aws_s3_endpoint_url: The endpoint URL that will be used to access S3.
  • aws_use_ec2_role_credentials: If set to true, the S3 connector will try to obtain the credentials associated with the role attached to the EC2 instance.
  • s3_init_path: The starting S3 path that will be displayed in the S3 browser of the UI.
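
The Docker commands in this section pass enabled_file_systems, aws_access_key_id, and aws_secret_access_key as environment variables by upper-casing the option name and adding a DRIVERLESS_AI_ prefix. The sketch below applies that pattern to the options listed above. The helper name dai_env_name is hypothetical, and whether every option follows the same pattern is an assumption to verify against your Driverless AI version.

```shell
# Derive the environment variable name for a config.toml option, following the
# DRIVERLESS_AI_ + upper-case pattern seen in the docker commands in this section.
# (dai_env_name is a hypothetical helper, not part of Driverless AI.)
dai_env_name() {
  printf 'DRIVERLESS_AI_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Print the derived variable name for each S3 option in the list above.
for opt in aws_access_key_id aws_secret_access_key aws_role_arn \
           aws_default_region aws_s3_endpoint_url \
           aws_use_ec2_role_credentials s3_init_path; do
  dai_env_name "$opt"
done
```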

Start Driverless AI

The following sections describe how to enable the S3 data connector when starting Driverless AI in Docker. This can be done either by specifying each environment variable in the nvidia-docker run command, or by editing the configuration options in the config.toml file and then specifying that file in the nvidia-docker run command.

Enable S3 with No Authentication

This example enables the S3 data connector and disables authentication. It does not pass an S3 access key ID or secret access key; however, it configures Docker DNS by passing the name and IP of the S3 name node. This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv. Replace TAG below with the image tag.

nvidia-docker run \
            --shm-size=256m \
            --add-host name.node:172.16.2.186 \
            -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,s3" \
            -p 12345:12345 \
            --init -it --rm \
            -v /tmp/dtmp/:/tmp \
            -v /tmp/dlog/:/log \
            -v /tmp/dlicense/:/license \
            -v /tmp/ddata/:/data \
            -u $(id -u):$(id -g) \
            h2oai/dai-centos7-x86_64:TAG

Enable S3 with Authentication

This example enables the S3 data connector with authentication by passing an S3 access key ID and a secret access key. It also configures Docker DNS by passing the name and IP of the S3 name node. This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv. Replace TAG below with the image tag.

nvidia-docker run \
        --shm-size=256m \
        --add-host name.node:172.16.2.186 \
        -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,s3" \
        -e DRIVERLESS_AI_AWS_ACCESS_KEY_ID="<access_key_id>" \
        -e DRIVERLESS_AI_AWS_SECRET_ACCESS_KEY="<access_key>" \
        -p 12345:12345 \
        --init -it --rm \
        -v /tmp/dtmp/:/tmp \
        -v /tmp/dlog/:/log \
        -v /tmp/dlicense/:/license \
        -v /tmp/ddata/:/data \
        -u $(id -u):$(id -g) \
        h2oai/dai-centos7-x86_64:TAG
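
Typing credentials inline leaves them in the shell history. As one possible alternative, the sketch below reads them from the host environment and prints them in KEY=VALUE form, which could be redirected to a file and passed with Docker's --env-file option in place of the two -e flags above. The function name s3_env_lines and the host variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumptions for illustration, not something the Driverless AI documentation prescribes.

```shell
# Print the two S3 credential settings as KEY=VALUE lines suitable for an env file.
# Assumes the host exports AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
# (hypothetical host-side names; adjust to your environment).
s3_env_lines() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  printf '%s\n' \
    "DRIVERLESS_AI_AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" \
    "DRIVERLESS_AI_AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY"
}

# Example use (not run here):
#   s3_env_lines > /tmp/dai_s3.env && chmod 600 /tmp/dai_s3.env
#   nvidia-docker run --env-file /tmp/dai_s3.env ... h2oai/dai-centos7-x86_64:TAG
```

The function returns a non-zero status when either variable is missing, so a wrapper script can stop before starting the container without credentials.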

Start DAI by Updating the config.toml File

This example shows how to configure S3 options in the config.toml file, and then specify that file when starting Driverless AI in Docker. Note that this example enables S3 with no authentication.

  1. Configure the Driverless AI config.toml file. Set the following configuration options.
  • enabled_file_systems = "file, upload, s3"
  2. Mount the config.toml file into the Docker container.
nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG
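
For an authenticated setup, the same config.toml could presumably also carry the credential options described in the attribute list at the top of this section. The following is a minimal sketch that writes such a file from the shell. The option names come from that list, the values are placeholders to replace, and the temporary directory is only an illustrative location.

```shell
# Write a minimal config.toml that enables S3 and supplies credentials.
# Values in angle brackets are placeholders; replace them before use.
config_dir="$(mktemp -d)"
config_file="$config_dir/config.toml"

cat > "$config_file" <<'EOF'
enabled_file_systems = "file, upload, s3"
aws_access_key_id = "<access_key_id>"
aws_secret_access_key = "<access_key>"
EOF

# Restrict permissions because the file holds a secret.
chmod 600 "$config_file"
echo "Wrote $config_file"
```

The resulting path would take the place of /local/path/to/config.toml in the -v mount shown in the command above.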