Google Cloud Storage Setup

Driverless AI allows you to explore Google Cloud Storage (GCS) data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with Google Cloud Storage. This setup requires you to enable authentication: if you enable the GCS or GBQ (Google BigQuery) connectors without it, those file systems appear in the UI, but you cannot use them.

To enable the GCS data connector with authentication, you must:

  1. Obtain a JSON authentication file from GCP.

  2. Mount the JSON file to the Docker instance.

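  3. Specify the path to the mounted JSON file (for example, /json_auth_file.json) in the DRIVERLESS_AI_GCS_PATH_TO_SERVICE_ACCOUNT_JSON environment variable, or in the gcs_path_to_service_account_json option in config.toml.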

Note: The service account JSON file includes the authorizations granted by the system administrator. You may be given a JSON file that authorizes both Google Cloud Storage and Google BigQuery, only one of them, or neither.
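
Before mounting the file, it can help to confirm that it looks like a service account key. The following is a minimal sketch, not part of Driverless AI: the function name is_service_account_json is hypothetical, and the check assumes the key file contains a "type": "service_account" entry, as key files downloaded from GCP normally do. It does not verify whether the account is authorized for GCS or BigQuery.

```shell
# Hypothetical helper (not a Driverless AI command).
# Returns 0 if the given file looks like a GCP service account key.
is_service_account_json() {
    # The path must be a readable regular file.
    [ -f "$1" ] && [ -r "$1" ] || return 1
    # Assumption: the key contains a "type": "service_account" entry.
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}
```

For example, `is_service_account_json ./service_account_json.json && echo "looks like a service account key"`.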

Description of Configuration Attributes

  • gcs_path_to_service_account_json: Specifies the path to the /json_auth_file.json file.

  • gcs_init_path: Specifies the starting GCS path displayed in the UI of the GCS browser.
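
The Docker examples in this section set options through environment variables whose names are the config.toml option name in upper case with a DRIVERLESS_AI_ prefix (for example, gcs_path_to_service_account_json becomes DRIVERLESS_AI_GCS_PATH_TO_SERVICE_ACCOUNT_JSON). The sketch below applies that pattern. The helper name dai_env_name is hypothetical, and it is an assumption that the same pattern holds for gcs_init_path.

```shell
# Hypothetical helper: derive the environment variable name for a
# config.toml option, following the pattern used in this section's examples.
dai_env_name() {
    # Upper-case the option name and add the DRIVERLESS_AI_ prefix.
    printf 'DRIVERLESS_AI_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}
```

Under that assumption, `dai_env_name gcs_init_path` gives the name to pass with -e in the nvidia-docker run command.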

Start Driverless AI

This section describes how to enable the Google Cloud Storage data connector when starting Driverless AI in Docker. This can be done by specifying each environment variable in the nvidia-docker run command or by editing the configuration options in the config.toml file and then specifying that file in the nvidia-docker run command.

Start GCS with Authentication

This example enables the GCS data connector with authentication by passing the JSON authentication file. This assumes that the JSON file contains Google Cloud Storage authentications. Replace TAG below with the image tag.

nvidia-docker run \
    --pid=host \
    --init \
    --rm \
    --shm-size=256m \
    -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,gcs" \
    -e DRIVERLESS_AI_GCS_PATH_TO_SERVICE_ACCOUNT_JSON="/service_account_json.json" \
    -u `id -u`:`id -g` \
    -p 12345:12345 \
    -v `pwd`/data:/data \
    -v `pwd`/log:/log \
    -v `pwd`/license:/license \
    -v `pwd`/tmp:/tmp \
    -v `pwd`/service_account_json.json:/service_account_json.json \
    h2oai/dai-centos7-x86_64:TAG
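
When a host path given to -v does not exist, Docker creates it as a directory, so a mistyped JSON file name can leave the container with an empty directory at /service_account_json.json. A small pre-flight check, sketched below, can catch this before the container starts. The function name check_mount_source is hypothetical and is not part of Docker or Driverless AI.

```shell
# Hypothetical pre-flight check for a file that will be bind-mounted with -v.
# Returns 0 if the path is an existing regular file, 1 otherwise.
check_mount_source() {
    if [ ! -f "$1" ]; then
        echo "Missing or not a regular file: $1" >&2
        return 1
    fi
}

# Example (run before nvidia-docker run):
#   check_mount_source "$(pwd)/service_account_json.json" || exit 1
```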

Start DAI Using the config.toml File

This example shows how to configure the GCS data connector options in the config.toml file, and then specify that file when starting Driverless AI in Docker.

  1. Configure the Driverless AI config.toml file. Set the following configuration options:

  • enabled_file_systems = "file, upload, gcs"

  • gcs_path_to_service_account_json = "/service_account_json.json"

  2. Mount the config.toml file into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /local/path/to/service_account_json.json:/service_account_json.json \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG
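
The two config.toml options used in this example can also be written from a script. The following is a sketch only: the function name write_gcs_config is hypothetical, and it is an assumption that a file containing just these two options is acceptable for your installation. In practice you would typically set these values in a copy of the full config.toml.

```shell
# Hypothetical helper: write the GCS-related options from this example
# to the config.toml path given as the first argument (overwrites the file).
write_gcs_config() {
    cat > "$1" <<'EOF'
enabled_file_systems = "file, upload, gcs"
gcs_path_to_service_account_json = "/service_account_json.json"
EOF
}
```

For example, `write_gcs_config /local/path/to/config.toml` produces the file that the -v option above mounts into the container.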