JDBC Setup

Driverless AI lets you explore Java Database Connectivity (JDBC) data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with JDBC.

Note: Depending on your Docker install version, use either the docker run --runtime=nvidia (Docker 19.03 or later) or nvidia-docker (earlier than Docker 19.03) command when starting the Driverless AI Docker image. Use docker version to check which version of Docker you are using.

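Tested Databases

The following databases have been tested for minimal functionality. JDBC drivers that are not included in this list should still work with Driverless AI, and we recommend testing your JDBC driver even if its database is not listed here. For information on how to use an untested JDBC driver, see the Adding an Untested JDBC Driver section at the end of this chapter.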

Oracle DB
PostgreSQL
Amazon Redshift
Teradata

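Description of Configuration Attributes

jdbc_app_configs: Configuration for the JDBC connector. This is a JSON/Dictionary string with multiple keys. Note: A JSON key (typically the name of the database being configured) is required, and it must map to a nested JSON object that contains the url, jarpath, and classpath fields. The value must take the following form: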

"""{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
 "jarpath": "/path/to/my/jdbc/database.jar", "classpath": "com.my.jdbc.Driver"}}"""

For example:

"""{
   "postgres": {
   "url": "jdbc:postgresql://ip address:port/postgres",
   "jarpath": "/path/to/postgres_driver.jar",
   "classpath": "org.postgresql.Driver"
   },
   "mysql": {
   "url":"mysql connection string",
   "jarpath": "/path/to/mysql_driver.jar",
   "classpath": "my.sql.classpath.Driver"
   }
}"""

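Note: The expected input of jdbc_app_configs is a JSON string. Double quotation marks ("...") must be used to denote keys and values within the JSON dictionary, and the outer quotations must be formatted as """, ''', or '. Depending on how the configuration value is applied, a different form of outer quotation may be required. The following examples show two ways of applying outer quotations.

Configuration value applied with the config.toml file: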

jdbc_app_configs = """{"my_json_string": "value", "json_key_2": "value2"}"""

Configuration value applied with an environment variable:

DRIVERLESS_AI_JDBC_APP_CONFIGS='{"my_json_string": "value", "json_key_2": "value2"}'

For example:

DRIVERLESS_AI_JDBC_APP_CONFIGS='{
"postgres": {"url": "jdbc:postgresql://192.xxx.x.xxx:aaaa:/name_of_database;user=name_of_user;password=your_password","jarpath": "/config/postgresql-xx.x.x.jar","classpath": "org.postgresql.Driver"},
"postgres-local": {"url": "jdbc:postgresql://123.xxx.xxx.xxx:aaaa/name_of_database","jarpath": "/config/postgresql-xx.x.x.jar","classpath": "org.postgresql.Driver"},
"ms-sql": {"url": "jdbc:sqlserver://192.xxx.x.xxx:aaaa;databaseName=name_of_database;user=name_of_user;password=your_password","Username":"your_username","passsword":"your_password","jarpath": "/config/sqljdbc42.jar","classpath": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
"oracle": {"url": "jdbc:oracle:thin:@192.xxx.x.xxx:aaaa/orclpdb1","jarpath": "ojdbc7.jar","classpath": "oracle.jdbc.OracleDriver"},
"db2": {"url": "jdbc:db2://127.x.x.x:aaaaa/name_of_database","jarpath": "db2jcc4.jar","classpath": "com.ibm.db2.jcc.DB2Driver"},
"mysql": {"url": "jdbc:mysql://192.xxx.x.xxx:aaaa;","jarpath": "mysql-connector.jar","classpath": "com.mysql.jdbc.Driver"},
"Snowflake": {"url": "jdbc:snowflake://<account_name>.snowflakecomputing.com/?<connection_params>","jarpath": "/config/snowflake-jdbc-x.x.x.jar","classpath": "net.snowflake.client.jdbc.SnowflakeDriver"},
"Derby": {"url": "jdbc:derby://127.x.x.x:aaaa/name_of_database","jarpath": "/config/derbyclient.jar","classpath": "org.apache.derby.jdbc.ClientDriver"}
}'
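
Both quoting styles shown above can be produced from a single dictionary, which avoids hand-editing nested quotation marks. The following is a minimal sketch under stated assumptions: the configs dictionary is illustrative, and the values are assumed to contain no backslashes or quotation marks (which would need extra escaping inside a TOML triple-quoted string). It uses only the Python standard library and does not call any Driverless AI API.

```python
import json
import shlex

# Illustrative connection entry; replace the values with your own.
configs = {
    "postgres": {
        "url": "jdbc:postgresql://localhost:5432/my_database",
        "jarpath": "/path/to/postgresql/jdbc/driver.jar",
        "classpath": "org.postgresql.Driver",
    }
}

# json.dumps emits a single line and always uses double quotes for
# keys and string values, as required for the inner JSON dictionary.
payload = json.dumps(configs)

# config.toml form: the JSON is wrapped in triple double quotes.
toml_line = 'jdbc_app_configs = """' + payload + '"""'

# Environment-variable form: shlex.quote wraps the JSON in single quotes
# so that a POSIX shell passes the inner double quotes through unchanged.
env_line = "DRIVERLESS_AI_JDBC_APP_CONFIGS=" + shlex.quote(payload)

print(toml_line)
print(env_line)
```

The first printed line can be pasted into config.toml, and the second can be used after export in a shell or as the argument of a docker run -e option.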

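jdbc_app_jvm_args: Extra JVM args for the JDBC connector. For example, "-Xmx4g".
jdbc_app_classpath: Optionally specify an alternative classpath for the JDBC connector.
enabled_file_systems: The file systems you want to enable. This must be configured in order for data connectors to function properly.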

Retrieve the JDBC Driver

Download the JDBC driver JAR files:

Oracle DB

PostgreSQL

Amazon Redshift

Teradata

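Note: Take note of the driver classpath (for example, org.postgresql.Driver), as it is needed for the configuration steps.

Copy the driver JAR to a location that can be mounted into the Docker container.

Note: The folder storing the JDBC jar file must be visible/readable by the dai process user.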
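
The readability requirement in the note above can be checked ahead of time. The following is a minimal sketch, assuming it is run as the same user that runs the dai process, because os.access reports permissions for the calling user only. The function name check_driver_jar and the example path are hypothetical.

```python
import os
import zipfile


def check_driver_jar(path: str) -> list:
    """Return a list of problems found with a JDBC driver JAR (empty if none).

    Hypothetical helper for illustration; run it as the dai process user.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append("not a regular file")
        return problems
    if not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    elif not zipfile.is_zipfile(path):
        # A JAR is a ZIP archive, so a readable JAR should pass this test.
        problems.append("not a valid JAR (ZIP) archive")
    parent = os.path.dirname(os.path.abspath(path))
    # The folder needs read and execute permission to be listed and traversed.
    if not os.access(parent, os.R_OK | os.X_OK):
        problems.append("parent directory not listable by the current user")
    return problems


# Example path only; point this at the JAR you copied.
print(check_driver_jar("/path/to/postgresql/jdbc/driver.jar"))
```

An empty list means the file exists, is readable, and looks like a JAR. When Driverless AI runs in Docker, the same check applies to the path inside the container after the volume is mounted.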

Enable the JDBC Connector

This example enables the JDBC connector for PostgreSQL. Note that JDBC connection strings vary depending on the database that is used.

 nvidia-docker run \
   --pid=host \
   --init \
   --rm \
   --shm-size=256m \
   --add-host name.node:172.16.2.186 \
   -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs,jdbc" \
   -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{"postgres":
                                       {"url": "jdbc:postgres://localhost:5432/my_database",
                                       "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                                       "classpath": "org.postgresql.Driver"}}'  \
   -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS="-Xmx2g" \
   -p 12345:12345 \
   -v /path/to/local/postgresql/jdbc/driver.jar:/path/to/postgresql/jdbc/driver.jar \
   -v /etc/passwd:/etc/passwd:ro \
   -v /etc/group:/etc/group:ro \
   -v /tmp/dtmp/:/tmp \
   -v /tmp/dlog/:/log \
   -v /tmp/dlicense/:/license \
   -v /tmp/ddata/:/data \
   -u $(id -u):$(id -g) \
   h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx

This example shows how to configure JDBC options in the config.toml file, and then specify that file when starting Driverless AI in Docker.

Configure the Driverless AI config.toml file. Set the following configuration options:

enabled_file_systems = "file, upload, jdbc"
jdbc_app_configs = """{"postgres": {"url": "jdbc:postgresql://localhost:5432/my_database",
                     "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                     "classpath": "org.postgresql.Driver"}}"""

Mount the config.toml file and the required JAR file into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/jdbc/driver.jar:/path/in/docker/jdbc/driver.jar \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx

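This example enables the JDBC connector for PostgreSQL.

Notes:

JDBC connection strings vary depending on the database that is used.

The configuration requires a JSON key (typically the name of the database being configured) that maps to a nested JSON object containing the url, jarpath, and classpath fields. It must take the following form: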
"""{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
   "jarpath": "/path/to/my/jdbc/database.jar", "classpath": "com.my.jdbc.Driver"}}"""

Export the location of the Driverless AI config.toml file, or add the export to ~/.bashrc. For example:

# DEB and RPM
export DRIVERLESS_AI_CONFIG_FILE="/etc/dai/config.toml"

# TAR SH
export DRIVERLESS_AI_CONFIG_FILE="/path/to/your/unpacked/dai/directory/config.toml"

Edit the following values in the config.toml file.

# File System Support
# upload : standard upload feature
# file : local file system/server file system
# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below
# dtap : Blue Data Tap file system, remember to configure the DTap section below
# s3 : Amazon S3, optionally configure secret and access key below
# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below
# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below
# minio : Minio Cloud Storage, remember to configure secret and access key below
# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)
# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)
# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)
# jdbc: JDBC Connector, remember to configure JDBC below. (jdbc_app_configs)
# hive: Hive Connector, remember to configure Hive below. (hive_app_configs)
# recipe_url: load custom recipe from URL
# recipe_file: load custom recipe from local file system
enabled_file_systems = "upload, file, hdfs, jdbc"

# Configuration for JDBC Connector.
# JSON/Dictionary String with multiple keys.
# Format as a single line without using carriage returns (the following example is formatted for readability).
# Use triple quotations to ensure that the text is read as a single string.
# Example:
# """{
# "postgres": {
# "url": "jdbc:postgresql://ip address:port/postgres",
# "jarpath": "/path/to/postgres_driver.jar",
# "classpath": "org.postgresql.Driver"
# },
# "mysql": {
# "url":"mysql connection string",
# "jarpath": "/path/to/mysql_driver.jar",
# "classpath": "my.sql.classpath.Driver"
# }
# }"""
jdbc_app_configs = """{"postgres": {"url": "jdbc:postgresql://localhost:5432/my_database",
                     "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                     "classpath": "org.postgresql.Driver"}}"""

# extra jvm args for jdbc connector
jdbc_app_jvm_args = ""

# alternative classpath for jdbc connector
jdbc_app_classpath = ""

When finished, save the changes and then stop/restart Driverless AI.

Adding Datasets Using JDBC

After the JDBC connector is enabled, you can add datasets by selecting JDBC from the Add Dataset (or Drag and Drop) drop-down menu.

Click the Add Dataset button on the Datasets page.
Select JDBC from the list that appears.
Click the Select JDBC Connection button to select a JDBC configuration.
The form is populated with the JDBC database, URL, driver, and jar information. Complete the following remaining fields:

JDBC Username: Enter your JDBC username.

JDBC Password: Enter your JDBC password. (See the Notes section.)

Destination Name: Enter a name for the new dataset.

(Optional) ID Column Name: Enter the name of the ID column. Specify this field when making large data queries.

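Notes:

Do not include the password as part of the JDBC URL. Instead, enter the password in the JDBC Password field. The password is entered separately for security purposes.

Due to resource sharing within Driverless AI, the JDBC connector is allocated only a relatively small amount of memory.

When making large queries, the ID column is used to partition the data into manageable portions. This ensures that the maximum memory allocation is not exceeded.

If a query that is larger than the maximum memory allocation is made without specifying an ID column, the query will not complete successfully.

Write a SQL query in the format of the database that you want to query. (See the Query Examples section below.) The format varies depending on the database that is used.
Click the Click to Make Query button to execute the query. The time it takes to complete depends on the size of the data being queried and the network speed to the database.

On a successful query, you are returned to the Datasets page, and the queried data is available as a new dataset.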
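
The notes above say that an ID column lets a large query be split into manageable portions. How Driverless AI does this internally is not described on this page, so the following is only an illustration of the general idea: it reads a table in contiguous ID ranges so that no single fetch holds the whole result. It uses Python's built-in sqlite3 module in place of a JDBC connection, and the function name fetch_in_chunks, the table contents, and the chunk size are all hypothetical.

```python
import sqlite3


def fetch_in_chunks(conn, table, id_column, chunk_size):
    """Yield lists of rows, each covering a contiguous range of id_column values.

    Illustration only. The table and column names are interpolated into the
    SQL text, so they must come from a trusted source.
    """
    low, high = conn.execute(
        f"SELECT MIN({id_column}), MAX({id_column}) FROM {table}"
    ).fetchone()
    if low is None:  # empty table
        return
    start = low
    while start <= high:
        end = start + chunk_size
        rows = conn.execute(
            f"SELECT * FROM {table} "
            f"WHERE {id_column} >= ? AND {id_column} < ? ORDER BY {id_column}",
            (start, end),
        ).fetchall()
        if rows:
            yield rows
        start = end


# Small in-memory table standing in for a large remote one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_level (id INTEGER PRIMARY KEY, loan_type INTEGER)")
conn.executemany("INSERT INTO loan_level VALUES (?, ?)", [(i, 5) for i in range(1, 11)])

chunks = list(fetch_in_chunks(conn, "loan_level", "id", chunk_size=4))
print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

Each fetch is bounded by the chunk size when the ID values are dense, which is why a numeric, roughly evenly distributed column is a natural choice for ID Column Name.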

Query Examples

The following are sample configurations and queries for Oracle DB and PostgreSQL.

Configuration:

jdbc_app_configs = """{"oracledb": {"url": "jdbc:oracle:thin:@localhost:1521/oracledatabase", "jarpath": "/home/ubuntu/jdbc-jars/ojdbc8.jar", "classpath": "oracle.jdbc.OracleDriver"}}"""

Sample query:

Select oracledb from the Select JDBC Connection drop-down menu.

JDBC Username: oracleuser

JDBC Password: oracleuserpassword

ID Column Name:

Query:
SELECT MIN(ID) AS NEW_ID, EDUCATION, COUNT(EDUCATION) FROM my_oracle_schema.creditcardtrain GROUP BY EDUCATION
Note: Because this query does not specify an ID Column Name, it only works for small data. However, the NEW_ID column can be used as the ID column if the query is for larger data.

Click the Click to Make Query button to execute the query.

Configuration:

jdbc_app_configs = """{"postgres": {"url": "jdbc:postgresql://localhost:5432/postgresdatabase", "jarpath": "/home/ubuntu/postgres-artifacts/postgres/Driver.jar", "classpath": "org.postgresql.Driver"}}"""

Sample query:

Select postgres from the Select JDBC Connection drop-down menu.

JDBC Username: postgres_user

JDBC Password: pguserpassword

ID Column Name: id

Query:
SELECT * FROM loan_level WHERE LOAN_TYPE = 5
(This query selects all columns from the table loan_level where the column LOAN_TYPE contains the value 5.)

Click the Click to Make Query button to execute the query.

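Adding an Untested JDBC Driver

We encourage you to try out JDBC drivers that have not been tested in house.

Download the JDBC jar for your database.
Move your JDBC jar file to a location that DAI can access.

Start the Driverless AI Docker image using the JDBC-specific environment variables.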

 nvidia-docker run \
   --pid=host \
   --init \
   --rm \
   --shm-size=256m \
   --add-host name.node:172.16.2.186 \
   -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="upload,file,hdfs,s3,recipe_file,jdbc" \
   -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
                                         "jarpath": "/path/to/my/jdbc/database.jar",
                                         "classpath": "com.my.jdbc.Driver"}}' \
   -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS="-Xmx2g" \
   -p 12345:12345 \
   -v /path/to/local/my/jdbc/database.jar:/path/to/my/jdbc/database.jar \
   -v /etc/passwd:/etc/passwd:ro \
   -v /etc/group:/etc/group:ro \
   -v /tmp/dtmp/:/tmp \
   -v /tmp/dlog/:/log \
   -v /tmp/dlicense/:/license \
   -v /tmp/ddata/:/data \
   -u $(id -u):$(id -g) \
   h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx

Download the JDBC jar for your database.
Move your JDBC jar file to a location that DAI can access.
Configure the Driverless AI config.toml file. Set the following configuration options:

enabled_file_systems = "upload, file, hdfs, s3, recipe_file, jdbc"
jdbc_app_configs = """{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
                       "jarpath": "/path/to/my/jdbc/database.jar",
                       "classpath": "com.my.jdbc.Driver"}}"""
#Optional arguments
jdbc_app_jvm_args = ""
jdbc_app_classpath = ""

Mount the config.toml file and the required JAR file into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/jdbc/driver.jar:/path/in/docker/jdbc/driver.jar \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:1.10.1-cuda11.2.2.xx

Download the JDBC jar for your database.
Move your JDBC jar file to a location that DAI can access.
Modify the following config.toml settings. Note that these can also be specified as environment variables when starting Driverless AI in Docker.

# enable the JDBC file system
enabled_file_systems = "upload, file, hdfs, s3, recipe_file, jdbc"

# Configure the JDBC Connector.
# JSON/Dictionary String with multiple keys.
# Format as a single line without using carriage returns (the following example is formatted for readability).
# Use triple quotations to ensure that the text is read as a single string.
# Example:
jdbc_app_configs = """{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
                       "jarpath": "/path/to/my/jdbc/database.jar",
                       "classpath": "com.my.jdbc.Driver"}}"""

# optional extra jvm args for jdbc connector
jdbc_app_jvm_args = ""

# optional alternative classpath for jdbc connector
jdbc_app_classpath = ""

When finished, save the changes and then stop/restart Driverless AI.