Snowflake Setup

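Driverless AI lets you explore Snowflake data sources from within the Driverless AI application. This section describes how to configure Driverless AI to work with Snowflake. This setup requires authentication to be enabled. If you enable the Snowflake connector, the corresponding file system becomes available in the UI, but you cannot use the connector without authentication.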

Note: Depending on your Docker install version, use either the docker run --runtime=nvidia command (Docker 19.03 and later) or the nvidia-docker command (earlier than Docker 19.03) when starting the Driverless AI Docker image. Use docker version to check which version of Docker you are using.
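The version rule in the note above can be sketched as a small shell helper. This is only an illustration: the function name docker_launcher is hypothetical, and it assumes that Docker 19.03 itself falls on the docker run --runtime=nvidia side of the cutoff.

```shell
#!/bin/sh
# docker_launcher (hypothetical helper): given a Docker version string such as
# "20.10.7", print the launch command suggested by the note above.
# Assumption: 19.03 and anything newer uses "docker run --runtime=nvidia".
docker_launcher() {
  major=${1%%.*}            # text before the first dot, e.g. "19"
  rest=${1#*.}              # text after the first dot, e.g. "03.0"
  minor=${rest%%.*}         # second component, e.g. "03"
  # Strip leading zeros so "03" compares as the integer 3.
  minor=$(printf '%s' "$minor" | sed 's/^0*//')
  minor=${minor:-0}
  if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
    echo "docker run --runtime=nvidia"
  else
    echo "nvidia-docker run"
  fi
}

# Example usage (requires a running Docker daemon, so it is left commented out):
# docker_launcher "$(docker version --format '{{.Server.Version}}')"
```

The helper only prints the command prefix; the remaining arguments are the same as in the examples in this section.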

Description of Configuration Attributes

snowflake_account: The Snowflake account ID
snowflake_user: The username for accessing the Snowflake account
snowflake_password: The password for accessing the Snowflake account
enabled_file_systems: The file systems to enable. This must be configured for data connectors to work properly.
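The Docker examples in this section pass these same options as environment variables, and the names appear to follow a simple pattern: the config.toml key in upper case with a DRIVERLESS_AI_ prefix. The sketch below illustrates that pattern only; the helper name to_env_name is hypothetical, and the assumption that every option follows the pattern is inferred from the examples here rather than stated by the documentation.

```shell
#!/bin/sh
# to_env_name (hypothetical helper): map a config.toml key to the environment
# variable name used in the docker run examples in this section.
# Assumption: the name is the upper-cased key prefixed with DRIVERLESS_AI_.
to_env_name() {
  printf 'DRIVERLESS_AI_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Print the variable name for each option described above.
for key in snowflake_account snowflake_user snowflake_password enabled_file_systems; do
  to_env_name "$key"
done
```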

Enable Snowflake with Authentication

This example enables the Snowflake data connector with authentication by passing the account, user, and password variables.

 nvidia-docker run \
 --rm \
 --shm-size=256m \
 -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,snow" \
 -e DRIVERLESS_AI_SNOWFLAKE_ACCOUNT="<account_id>" \
 -e DRIVERLESS_AI_SNOWFLAKE_USER="<username>" \
 -e DRIVERLESS_AI_SNOWFLAKE_PASSWORD="<password>" \
 -u `id -u`:`id -g` \
 -p 12345:12345 \
 -v `pwd`/data:/data \
 -v `pwd`/log:/log \
 -v `pwd`/license:/license \
 -v `pwd`/tmp:/tmp \
 -v `pwd`/service_account_json.json:/service_account_json.json \
 h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx

This example shows how to configure the Snowflake options in the config.toml file and then specify that file when starting Driverless AI in Docker.

Configure the Driverless AI config.toml file. Set the following configuration options.

enabled_file_systems = "file, snow"

snowflake_account = "<account_id>"

snowflake_user = "<username>"

snowflake_password = "<password>"

Mount the config.toml file into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx
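Because the mounted config.toml holds the Snowflake password in plain text, it can be convenient to generate the file from shell variables with restrictive permissions before mounting it. The following is a minimal sketch under stated assumptions: the function name write_snowflake_config and the variable names SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, and SNOWFLAKE_PASSWORD are hypothetical, and values containing double quotes are not escaped.

```shell
#!/bin/sh
# write_snowflake_config (hypothetical helper): write the four options shown
# above to the given path, reading the credentials from shell variables.
# SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD are assumed names.
write_snowflake_config() {
  out=$1
  # Abort with a message if any credential variable is unset or empty.
  : "${SNOWFLAKE_ACCOUNT:?set SNOWFLAKE_ACCOUNT}"
  : "${SNOWFLAKE_USER:?set SNOWFLAKE_USER}"
  : "${SNOWFLAKE_PASSWORD:?set SNOWFLAKE_PASSWORD}"
  (
    # A newly created file is readable by its owner only; the permissions of
    # a file that already exists are left unchanged.
    umask 077
    cat > "$out" <<EOF
enabled_file_systems = "file, snow"
snowflake_account = "${SNOWFLAKE_ACCOUNT}"
snowflake_user = "${SNOWFLAKE_USER}"
snowflake_password = "${SNOWFLAKE_PASSWORD}"
EOF
  )
}

# Example usage (left commented out so that nothing is written by default):
# write_snowflake_config /local/path/to/config.toml
```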

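This example enables the Snowflake data connector with authentication on native installs (DEB, RPM, or TAR SH) by setting the account, user, and password options in config.toml.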

Export the path to the Driverless AI config.toml file as an environment variable, or add the export line to ~/.bashrc. See the examples below.

# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"
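If you add the export to ~/.bashrc, repeated setup runs can leave duplicate lines behind. The sketch below shows one way to append the line only when it is not already present; the helper name append_once is hypothetical and is not part of Driverless AI.

```shell
#!/bin/sh
# append_once (hypothetical helper): append a line to a file only if an
# identical line is not already there.
append_once() {
  line=$1
  file=$2
  touch "$file"   # make sure the file exists before searching it
  # -x matches whole lines, -F treats the text literally, -q suppresses output.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage for a DEB or RPM install (commented out so that running this
# sketch does not modify your shell profile):
# append_once 'export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"' "$HOME/.bashrc"
```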

Specify the following configuration options in the config.toml file.

# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive: Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url: load custom recipe from URL
# recipe_file: load custom recipe from local file system
enabled_file_systems = "file, snow"

# Snowflake Connector credentials
snowflake_account = "<account_id>"
snowflake_user = "<username>"
snowflake_password = "<password>"

When you are done, save the changes and stop/restart Driverless AI.
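Before restarting, a quick check that the file contains the options above can catch a missing entry early. This is a rough sketch rather than a Driverless AI tool: the function name check_snowflake_config is hypothetical, and it only verifies that each key is present with a non-empty quoted value, not that the credentials are valid.

```shell
#!/bin/sh
# check_snowflake_config (hypothetical helper): return 0 if the given
# config.toml enables the snow file system and sets the three Snowflake
# credential options to non-empty quoted values; return 1 otherwise.
check_snowflake_config() {
  cfg=$1
  status=0
  # enabled_file_systems must list "snow" as one of its entries.
  if ! grep -Eq '^[[:space:]]*enabled_file_systems[[:space:]]*=.*[", ]snow[", ]' "$cfg"; then
    echo "enabled_file_systems does not include snow" >&2
    status=1
  fi
  for key in snowflake_account snowflake_user snowflake_password; do
    # Each credential key must be assigned a non-empty double-quoted string.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"[^\"]+\"" "$cfg"; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage for a DEB or RPM install:
# check_snowflake_config /etc/dai/config.toml && echo "Snowflake options look complete"
```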

Adding Datasets Using Snowflake

After the Snowflake connector is enabled, you can add datasets by selecting Snowflake from the Add Dataset (or Drag and Drop) drop-down menu.

Specify the following information to add your dataset.

Enter Database: Specify the name of the Snowflake database that you are querying.
Enter Warehouse: Specify the name of the Snowflake warehouse that you are querying.
Enter Schema: Specify the schema of the dataset that you are querying.
Enter Name for Dataset to Be Saved As: Specify a name for the dataset to be saved as. This can only be a CSV file (for example, myfile.csv).
Enter Username: (Optional) Specify the username associated with this Snowflake account. This can be left blank if snowflake_user was specified in the config.toml when starting Driverless AI; otherwise, this field is required.
Enter Password: (Optional) Specify the password associated with this Snowflake account. This can be left blank if snowflake_password was specified in the config.toml when starting Driverless AI; otherwise, this field is required.
Enter Role: (Optional) Specify your role as designated within Snowflake. See https://docs.snowflake.net/manuals/user-guide/security-access-control-overview.html for more information.
Enter Region: (Optional) Specify the region of the warehouse that you are querying. This can be found in the Snowflake-provided URL for accessing your database (as in <optional-deployment-name>.<region>.<cloud-provider>.snowflakecomputing.com). This field is optional and can be left blank if snowflake_url was specified with a <region> in the config.toml when starting Driverless AI.
Enter File Formatting Parameters: (Optional) Specify any additional parameters for formatting your dataset. The available parameters are listed at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#type-csv (Note: Use only the parameters for TYPE = CSV). For example, if your dataset includes a text column that contains commas, you can specify a different delimiter using FIELD_DELIMITER='character'. Multiple parameters must be separated by spaces:

FIELD_DELIMITER=',' FIELD_OPTIONALLY_ENCLOSED_BY="" SKIP_BLANK_LINES=TRUE
Note: Make sure that the specified delimiter is not also used as a character within a cell; otherwise, an error occurs. For example, you might specify the following to load the "AMAZON_REVIEWS" dataset:

Database: UTIL_DB

Warehouse: DAI_SNOWFLAKE_TEST

Schema: AMAZON_REVIEWS_SCHEMA

Query: SELECT * FROM AMAZON_REVIEWS

Enter File Formatting Parameters (Optional): FIELD_OPTIONALLY_ENCLOSED_BY = '"'

In the above example, if the FIELD_OPTIONALLY_ENCLOSED_BY option is not set, the following row causes the dataset import to fail (because the dataset's delimiter is , by default):
positive, 2012-05-03,Wonderful\, tasty taffy,0,0,3,5,2012,Thu,0
Note: Numeric columns from Snowflake that contain NULL values are sometimes converted to string columns (for example, \\N). To prevent this, add NULL_IF=() to the File Formatting Parameters input.

Enter Snowflake Query: Specify the Snowflake query that you want to run.
When you are finished, select the Click to Make Query button to add the dataset.
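To see why the sample row from the FIELD_OPTIONALLY_ENCLOSED_BY example is a problem, the sketch below counts the fields produced when that row is split on every comma. The helper name count_fields is hypothetical, and the idea that the row is meant to hold ten columns is an assumption based on reading "Wonderful\, tasty taffy" as a single text value.

```shell
#!/bin/sh
# count_fields (hypothetical helper): split a line on every comma, the
# default delimiter mentioned above, and print how many fields result.
count_fields() {
  printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# The sample row from this section. Splitting on every comma also splits the
# review text at its embedded comma, so the count comes out one higher than
# the assumed ten columns.
row='positive, 2012-05-03,Wonderful\, tasty taffy,0,0,3,5,2012,Thu,0'
count_fields "$row"
```

Running the sketch prints 11, which is why a delimiter that also appears inside a cell needs either a different FIELD_DELIMITER or an enclosing character.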