Snowflake Setup

Driverless AI lets you explore Snowflake data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with Snowflake. This setup requires you to enable authentication. If you enable Snowflake connectors, those file systems are available in the UI, but you can’t use those connectors without authentication.

备注

For Docker 19.03 and later, use the --gpus all flag with docker run to enable GPU support. The older nvidia-docker wrapper is deprecated and no longer recommended. Ensure that the NVIDIA Container Toolkit is installed. To check your Docker version, run docker version.

Description of Configuration Attributes

snowflake_account: The Snowflake account ID.
snowflake_user: The username for accessing the Snowflake account.
snowflake_password: The password for accessing the Snowflake account.
snowflake_host: The Snowflake hostname to connect to inside Snowpark Container Services.
snowflake_port: The Snowflake port to connect to inside Snowpark Container Services.
enabled_file_systems: The file systems you want to enable. You must configure this attribute for data connectors to function properly.

Enable Snowflake with Authentication

This example demonstrates how to enable the Snowflake data connector with authentication by passing the account, user, and password variables.

 docker run --gpus all \
 --rm \
 --shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
 -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,snow" \
 -e DRIVERLESS_AI_SNOWFLAKE_ACCOUNT = "<account_id>" \
 -e DRIVERLESS_AI_SNOWFLAKE_USER = "<username>" \
 -e DRIVERLESS_AI_SNOWFLAKE_PASSWORD = "<password>"\
 -u `id -u`:`id -g` \
 -p 12345:12345 \
 -v `pwd`/data:/data \
 -v `pwd`/log:/log \
 -v `pwd`/license:/license \
 -v `pwd`/tmp:/tmp \
 -v `pwd`/service_account_json.json:/service_account_json.json \
 h2oai/dai-ubi8-x86_64:2.3.0-cuda11.8.0.xx

This example demonstrates how to configure Snowflake options in the config.toml file, and then specify that file when starting Driverless AI in Docker.

Configure the Driverless AI config.toml file. Set the following configuration options.

enabled_file_systems = "file, snow"

snowflake_account = "<account_id>"

snowflake_user = "<username>"

snowflake_password = "<password>"

Mount the config.toml file into the Docker container.

docker run --gpus all \
  --pid=host \
  --init \
  --rm \
  --shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-ubi8-x86_64:2.3.0-cuda11.8.0.xx

This example demonstrates how to enable the Snowflake data connector with authentication by passing the account, user, and password variables.

Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:

# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"

Specify the following configuration options in the config.toml file.

# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive: Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url: load custom recipe from URL
# recipe_file: load custom recipe from local file system
enabled_file_systems = "file, snow"

# Snowflake Connector credentials
snowflake_account = "<account_id>"
snowflake_user = "<username>"
snowflake_password = "<password>"

Save your changes, then stop and restart Driverless AI.

Snowflake Single Sign-On (SSO)

The Snowflake connector features support for single sign-on (SSO). This means that with the proper setup, you can log into Driverless AI (DAI), connect to Snowflake, and add datasets from Snowflake without having to enter any credentials in the UI or config.toml. To ensure that SSO works correctly, DAI and Snowflake must be configured with the same OpenID Connect (OIDC) authentication provider. For information on setting up OIDC authentication, see Setting up OIDC authentication. For Snowflake, refer to the External OAuth Overview in the official Snowflake documentation.

With H2O Secure Store

You can configure an OAuth Client in H2O Secure Store and use it for OIDC authentication with Snowflake. Use the following Driverless AI configuration options to specify the Snowflake account URL, the OAuth client ID, and the H2O Secure Store endpoint URL.

snowflake_url="https://<SNOWFLAKE_ACCOUNT>.snowflakecomputing.com"
snowflake_h2o_secure_store_oauth_client_id="<OAUTH_CLIENT_ID>"
h2o_secure_store_endpoint_url="<H2O_SECURE_STORE_URL>"

Setup Guide for Azure Entra ID

This guide explains how to configure Driverless AI to use Snowflake through H2O Secure Store SSO, with Azure Entra ID (formerly Azure Active Directory) as the identity provider.

Prerequisites

Before you begin, ensure that you have the following:

An Azure Entra ID tenant
A Snowflake account with ACCOUNTADMIN privileges
Access to H2O Secure Store and Driverless AI configuration
Administrative access to Azure Entra ID

备注

Throughout this guide, replace placeholders (for example, <TENANT_ID>, <OAUTH_CLIENT_ID>) with your actual values.

Step 1: Configure Azure Portal

In the Azure Portal, navigate to Entra ID > App Registrations > New registration.
Register a new application with the following settings:
- Name: Snowflake OAuth Client
- Supported account types: Accounts in this organizational directory only (single tenant)
- Redirect URI: Leave this empty for now; you will set it later after creating the H2O Secure Store OAuth Client.
Click Register.
Record the following client information from the Overview page:
- Application (client) ID: This is your OAUTH_CLIENT_ID
- Directory (tenant) ID: This is your TENANT_ID
Configure a client secret:
1. Navigate to Certificates & Secrets.
2. Click New client secret.
3. Enter the following information:
  - Description: Snowflake SSO Integration
  - Expires: 24 months (or shorter for production security)
4. Click Add.
5. Immediately copy the Value field. This is your <OAUTH_CLIENT_SECRET>. It will not be shown again.
Configure authentication settings:
1. Navigate to Authentication.
2. Click Add a platform > Web.
3. You will set the redirect URI after you create the H2O Secure Store OAuth Client. The format is: https://<your_secure_store_endpoint>/oauth2/<your_h2o_oauth_client_id>/callback. You can leave it empty for now or enter a placeholder.
4. Under Implicit grant and hybrid flows, select the following checkboxes:
  - Access tokens
  - ID tokens
5. Click Save.
Expose an API and add scopes:
1. Navigate to Expose an API.
2. Click Set next to Application ID URI.
3. Use the default value: api://OAUTH_CLIENT_ID
4. Click Save.
5. Click Add a scope and enter the following information:
  - Scope name: session:role-any (or the role you want to use)
  - Who can consent: Admins and users
  - Admin consent display name: Snowflake Role Access
  - Admin consent description: Allows access to Snowflake with any assigned role
  - User consent display name: Snowflake Access (or the name you want to use)
  - User consent description: Access your Snowflake data with appropriate roles
  - State: Enabled
6. Click Add scope.
Add API permissions:
1. Navigate to API Permissions.
2. Click Add a permission > My APIs, and then select your app.
3. Under Delegated permissions, select session:role-any (or the role you want to use).
4. Click Add permissions.
5. Click Grant admin consent for <tenant> and confirm. The status should show Granted.
Add optional claims (required for user mapping in Snowflake):
1. Navigate to Token configuration.
2. Click Add optional claim.
3. Select the following options:
  - Token type: Access
  - Claim: Select preferred_username (or the claim you want to use; this must match your Snowflake mapping claim)
4. Click Add and save.

备注

The optional claim ensures that the claim is included in access tokens for user mapping.

Step 2: Configure Snowflake

Connect to Snowflake using the web interface with the ACCOUNTADMIN role.

Create the external OAuth security integration by running the following SQL commands:

USE ROLE accountadmin;
CREATE SECURITY INTEGRATION external_oauth_azure_integration
TYPE = EXTERNAL_OAUTH
ENABLED = TRUE
EXTERNAL_OAUTH_TYPE = AZURE
EXTERNAL_OAUTH_ISSUER = 'https://sts.windows.net/<TENANT_ID>/'
-- Use the claim you want to use below; this must match your Entra ID optional claim
EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'preferred_username'
EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'login_name'
EXTERNAL_OAUTH_JWS_KEYS_URL = 'https://login.microsoftonline.com/<TENANT_ID>/discovery/v2.0/keys'
EXTERNAL_OAUTH_AUDIENCE_LIST = ('api://<OAUTH_CLIENT_ID>')
EXTERNAL_OAUTH_ANY_ROLE_MODE = ENABLE
COMMENT = 'Entra ID integration for Driverless AI Snowflake access';

Verify the security integration:

DESC SECURITY INTEGRATION external_oauth_azure_integration;

(Optional) For testing privileged roles, run the following command:

ALTER ACCOUNT SET EXTERNAL_OAUTH_ADD_PRIVILEGED_ROLES_TO_BLOCKED_LIST = FALSE;

警告

The EXTERNAL_OAUTH_ADD_PRIVILEGED_ROLES_TO_BLOCKED_LIST setting should only be set to FALSE in testing environments. In production, review Snowflake security best practices.

Step 3: Configure H2O Secure Store OAuth Client

In H2O Secure Store, create a new OAuth Client with the following settings:
- Display name: Snowflake OAuth (or any name you prefer)
- Issuer: https://login.microsoftonline.com/<TENANT_ID>/v2.0
- Client ID: <OAUTH_CLIENT_ID> (the Client ID from Entra ID)
- Client Secret: <OAUTH_CLIENT_SECRET> (the Client Secret from Entra ID that you noted earlier)
- Login Principal Claim: preferred_username (or the claim you want to use; this must match your Snowflake mapping claim)
- Extra Scopes: Enter the following scopes:
  - email
  - profile
  - api://<OAUTH_CLIENT_ID>/session:role-any (or the role you want to use)
Click Save.
After saving, note the OAuth Client ID that was generated by H2O Secure Store. This is the <H2O_SECURE_STORE_OAUTH_CLIENT_ID> that you will use in the Driverless AI configuration.
Update the Entra ID callback URL:
1. Return to Azure Portal > App Registrations > Your App > Authentication > Web.
2. Click Add redirectURI.
3. Enter https://<your_secure_store_endpoint>/oauth2/<H2O_SECURE_STORE_OAUTH_CLIENT_ID>/callback
4. Click Save.

Step 4: Configure Driverless AI

Update the Driverless AI configuration with the following settings:

h2o_secure_store_endpoint_url = "https://<your_secure_store_endpoint>"
snowflake_h2o_secure_store_oauth_client_id = "<H2O_SECURE_STORE_OAUTH_CLIENT_ID>"
snowflake_url = "https://<your_snowflake_account_url>"

备注

The h2o_secure_store_endpoint_url can be retrieved from the callback URL. The snowflake_h2o_secure_store_oauth_client_id is the ID of the OAuth Client you created in H2O Secure Store (for example, snowflake-oauth).

Step 5: Import Snowflake Dataset into Driverless AI (SSO Authentication)

When using SSO authentication with H2O Secure Store, you do not need to enter a username and password when querying Snowflake.

Once your Driverless AI instance is up and running, click + ADD DATASET > SNOWFLAKE.
Enter your SQL query. Leave the username and password fields blank.
Click MAKE QUERY.

The first time you run the query, an error message will appear with a link that you need to follow to authenticate against Snowflake.
Click the link in the error message to authenticate.
After you successfully authenticate, return to Driverless AI.

备注

You only need to authenticate once for the lifetime of a token from your identity provider.

Repeat steps 1 through 3. This time, the authentication will succeed and your dataset will load into Driverless AI.

Adding Datasets Using Snowflake (Username/Password Authentication)

If you are using traditional username/password authentication (not SSO), you can add datasets by selecting Snowflake from the + Add Dataset (or Drag and Drop) drop-down menu and providing credentials.

Specify the following information to add your dataset.

Enter Database: Specify the name of the Snowflake database that you are querying.
Enter Warehouse: Specify the name of the Snowflake warehouse that you are querying.
Enter Schema: Specify the schema of the dataset that you are querying.
Enter Name for Dataset to Be Saved As: Specify a name for the dataset to be saved as. Note that this can only be a CSV file (for example, myfile.csv).
Enter Username: (Optional) Specify the username associated with this Snowflake account. This can be left blank if snowflake_user was specified in the config.toml when starting Driverless AI; otherwise, this field is required.
Enter Password: (Optional) Specify the password associated with this Snowflake account. This can be left blank if snowflake_password was specified in the config.toml when starting Driverless AI; otherwise, this field is required.
Enter Role: (Optional) Specify your role as designated within Snowflake. See https://docs.snowflake.net/manuals/user-guide/security-access-control-overview.html for more information.
Enter Region: (Optional) Specify the region of the warehouse that you are querying. This can be found in the Snowflake-provided URL to access your database (as in <optional-deployment-name>.<region>.<cloud-provider>.snowflakecomputing.com). This is optional and can also be left blank if snowflake_url was specified with a <region> in the config.toml when starting Driverless AI.
Enter File Formatting Parameters: (Optional) Specify any additional parameters for formatting your datasets. Available parameters are listed in https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv. (Note: Use only parameters for TYPE = CSV.) For example, if your dataset includes a text column that contains commas, you can specify a different delimiter using FIELD_DELIMITER='character'. Multiple parameters must be separated with spaces:

FIELD_DELIMITER=',' FIELD_OPTIONALLY_ENCLOSED_BY="" SKIP_BLANK_LINES=TRUE
Note: Be sure that the specified delimiter is not also used as a character within a cell; otherwise an error will occur. For example, you might specify the following to load the “AMAZON_REVIEWS” dataset:

Database: UTIL_DB

Warehouse: DAI_SNOWFLAKE_TEST

Schema: AMAZON_REVIEWS_SCHEMA

Query: SELECT * FROM AMAZON_REVIEWS

Enter File Formatting Parameters (Optional): FIELD_OPTIONALLY_ENCLOSED_BY = ‘”’

In the above example, if the FIELD_OPTIONALLY_ENCLOSED_BY option is not set, the following row will result in a failure to import the dataset (as the dataset’s delimiter is , by default):
positive, 2012-05-03,Wonderful\, tasty taffy,0,0,3,5,2012,Thu,0
Note: Numeric columns from Snowflake that have NULL values are sometimes converted to strings (for example, \ \N). To prevent this from occuring, add NULL_IF=() to the input of FILE FORMATTING PARAMETERS.

Enter Snowflake Query: Specify the Snowflake query that you want to execute.
When you are finished, select the Click to Make Query button to add the dataset.

Troubleshooting Snowflake Connection Errors

When you encounter connection errors with Snowflake, the error messages in the Driverless AI logs often include a UUID that can be used to retrieve more detailed information about the error.

Troubleshooting Authentication Failures

For example, when you encounter an OAuth access token error in the Driverless AI logs similar to the following:

250001 (08001): Failed to connect to DB: YDYTYTC-XB02642.snowflakecomputing.com:443.
Invalid OAuth access token. [da0650da-9e1e-4783-81ec-ed744d046cf2]

The UUID at the end of the error message (in this example, da0650da-9e1e-4783-81ec-ed744d046cf2) can be used to identify the cause of the token failure by executing a query in Snowflake.

备注

Only users with a role that has the MONITOR privilege can execute the following query. For more information, see the SYSTEM$GET_LOGIN_FAILURE_DETAILS function documentation.

To investigate the error:

Connect to Snowflake with an account that has the MONITOR privilege.
Execute the following SQL query in Snowflake, replacing <UUID> with the actual value from the log message:
```
SELECT JSON_EXTRACT_PATH_TEXT(
   SYSTEM$GET_LOGIN_FAILURE_DETAILS('<UUID>'),
   'errorCode'
);
```
Use the returned errorCode to identify the root cause of the authentication failure, such as an expired token, invalid audience, missing claim, or other OAuth/SAML/key-pair issue.

For more information about Snowflake error codes and troubleshooting SAML/OAuth issues, see the Snowflake SAML Errors documentation.