S3 Setup
Driverless AI lets you explore S3 data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with S3.
Note: Depending on your Docker install version, use either the docker run --runtime=nvidia (>= Docker 19.03) or nvidia-docker (< Docker 19.03) command when starting the Driverless AI Docker image. Use docker version to check which version of Docker you are using.
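For example, to print only the Docker server version string (the --format flag is part of the standard Docker CLI):

# Print only the Docker server version string
docker version --format '{{.Server.Version}}'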
Description of Configuration Attributes
aws_access_key_id
: The S3 access key ID.

aws_secret_access_key
: The S3 secret access key.

aws_role_arn
: The Amazon Resource Name (ARN) of the role to assume.

aws_default_region
: The region to use when the aws_s3_endpoint_url option is not set. This is ignored when aws_s3_endpoint_url is set.

aws_s3_endpoint_url
: The endpoint URL that will be used to access S3.

aws_use_ec2_role_credentials
: If set to true, the S3 connector will try to obtain credentials associated with the role attached to the EC2 instance.

s3_init_path
: The starting S3 path that will be displayed in the UI's S3 browser.

enabled_file_systems
: The file systems you want to enable. This must be configured in order for data connectors to function properly.
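Putting these together, a minimal config.toml sketch that enables the S3 connector with static credentials might look like the following (all values are placeholders):

# Minimal S3 connector sketch for config.toml; all values are placeholders
enabled_file_systems = "file, upload, s3"
aws_access_key_id = "<access_key_id>"
aws_secret_access_key = "<access_key>"
aws_default_region = "us-east-1"      # ignored when aws_s3_endpoint_url is set
s3_init_path = "s3://<your-bucket>/"  # starting path shown in the UI S3 browser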
Example 1: Enable S3 with No Authentication
This example enables the S3 data connector and disables authentication. It does not pass any S3 access key or secret; however, it configures Docker DNS by passing the name and IP of the S3 name node. This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv.
nvidia-docker run \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,s3" \
-p 12345:12345 \
--init -it --rm \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
This example shows how to configure S3 options in the config.toml file, and then specify that file when starting Driverless AI in Docker. Note that this example enables S3 with no authentication.
Configure the Driverless AI config.toml file. Set the following configuration options.
enabled_file_systems = "file, upload, s3"
Mount the config.toml file into the Docker container.
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
-p 12345:12345 \
-v /local/path/to/config.toml:/path/in/docker/config.toml \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
This example enables the S3 data connector and disables authentication. It does not pass any S3 access key or secret.
Export the location of the Driverless AI config.toml file, or add the export to ~/.bashrc. For example:
# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
Specify the following configuration options in the config.toml file.
# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc : JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive : Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url : load custom recipe from URL
# recipe_file : load custom recipe from local file system
enabled_file_systems = "file, s3"
Save the changes when you are done, then stop/restart Driverless AI.
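On a DEB or RPM install, for example, the restart can be done with systemctl (a sketch that assumes the default service name dai):

# Restart Driverless AI so the new config.toml settings take effect
# (assumes a DEB/RPM install with the default systemd service name "dai")
sudo systemctl stop dai
sudo systemctl start dai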
Example 2: Enable S3 with Authentication
This example enables the S3 data connector with authentication by passing an S3 access key ID and a secret access key. It also configures Docker DNS by passing the name and IP of the S3 name node. This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv.
nvidia-docker run \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,s3" \
-e DRIVERLESS_AI_AWS_ACCESS_KEY_ID="<access_key_id>" \
-e DRIVERLESS_AI_AWS_SECRET_ACCESS_KEY="<access_key>" \
-p 12345:12345 \
--init -it --rm \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
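To sanity-check the key pair outside of Driverless AI, you can try listing a bucket it should be able to read. This is a hypothetical check that assumes the AWS CLI is installed; <your-bucket> is a placeholder:

# List a bucket with the same key pair (assumes the AWS CLI is installed)
AWS_ACCESS_KEY_ID="<access_key_id>" \
AWS_SECRET_ACCESS_KEY="<access_key>" \
aws s3 ls s3://<your-bucket>/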
This example shows how to configure S3 options with authentication in the config.toml file, and then specify that file when starting Driverless AI in Docker.
Configure the Driverless AI config.toml file. Set the following configuration options.
enabled_file_systems = "file, upload, s3"
aws_access_key_id = "<access_key_id>"
aws_secret_access_key = "<access_key>"
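These config.toml options are the file-based equivalents of the environment variables passed in the docker run example above: each option name is upper-cased and prefixed with DRIVERLESS_AI_, so aws_access_key_id becomes DRIVERLESS_AI_AWS_ACCESS_KEY_ID.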
Mount the config.toml file into the Docker container.
nvidia-docker run \
--pid=host \
--init \
--rm \
--shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
--add-host name.node:172.16.2.186 \
-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
-p 12345:12345 \
-v /local/path/to/config.toml:/path/in/docker/config.toml \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-v /tmp/dtmp/:/tmp \
-v /tmp/dlog/:/log \
-v /tmp/dlicense/:/license \
-v /tmp/ddata/:/data \
-u $(id -u):$(id -g) \
h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx
This example enables the S3 data connector with authentication by passing an S3 access key ID and a secret access key.
Export the location of the Driverless AI config.toml file, or add the export to ~/.bashrc. For example:
# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
Specify the following configuration options in the config.toml file.
# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc : JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive : Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url : load custom recipe from URL
# recipe_file : load custom recipe from local file system
enabled_file_systems = "file, s3"

# S3 Connector credentials
aws_access_key_id = "<access_key_id>"
aws_secret_access_key = "<access_key>"
Save the changes when you are done, then stop/restart Driverless AI.