JDBC Setup

Driverless AI allows you to explore Java Database Connectivity (JDBC) data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with JDBC.

Tested Databases

The following databases have been tested for minimal functionality. Note that JDBC drivers that are not included in this list should work with Driverless AI. We recommend that you test out your JDBC driver even if you do not see it on list of tested databases. See the Adding an Untested JDBC Driver section at the end of this chapter for information on how to try out an untested JDBC driver.

  • Oracle DB

  • PostgreSQL

  • Amazon Redshift

  • Teradata

Description of Configuration Attributes

  • enabled_file_systems: The file systems you want to enable. This must be configured in order for data connectors to function properly.

  • jdbc_app_configs: Configuration for the JDBC connector. This is a JSON/Dictionary String with multiple keys. This requires a JSON key (typically the name of the database being configured) to be associated with a nested JSON that contains the url, jarpath, and classpath fields. In addition, this should take the format:

"""{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
 "jarpath": "/path/to/my/jdbc/database.jar", "classpath": "com.my.jdbc.Driver"}}"""

For example:

"""{
 "postgres": {
  "url": "jdbc:postgresql://ip address:port/postgres",
  "jarpath": "/path/to/postgres_driver.jar",
  "classpath": "org.postgresql.Driver"
 },
 "mysql": {
  "url":"mysql connection string",
  "jarpath": "/path/to/mysql_driver.jar",
  "classpath": "my.sql.classpath.Driver"
  }
}"""

Note: The expected input of jdbc_app_configs is a JSON string. Double quotation marks ("...") must be used to denote keys and values within the JSON dictionary, and outer quotations must be formatted as either """, ''', or '. Depending on how the configuration value is applied, different forms of outer quotations may be required. The following examples show two unique methods for applying outer quotations.

  • Configuration value applied with the config.toml file:

    jdbc_app_configs = """{"my_json_string": "value", "json_key_2": "value2"}"""
    
  • Configuration value applied with an environment variable:

    DRIVERLESS_AI_JDBC_APP_CONFIGS='{"my_json_string": "value", "json_key_2": "value2"}'
    
  • jdbc_app_jvm_args: Extra jvm args for JDBC connector. For example, “-Xmx4g”.

  • jdbc_app_classpath: Optionally specify an alternative classpath for the JDBC connector.

Retrieve the JDBC Driver

  1. Download JDBC Driver JAR files:

Note: Remember to take note of the driver classpath, as it is needed for the configuration steps (for example, org.postgresql.Driver).

  1. Copy the driver JAR to a location that can be mounted into the Docker container.

Note: The folder storing the JDBC jar file must be visible/readable by the dai process user.

Start Driverless AI

This section describes how to enable JDBC when starting Driverless AI in Docker. This can done by specifying each environment variable in the nvidia-docker run command or by editing the configuration options in the config.toml file and then specifying that file in the nvidia-docker run command.

Start DAI Using Environment Variables

This example enables the JDBC connector for PostgresQL. Note that the JDBC connection strings will vary depending on the database that is used. Replace TAG below with the image tag.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs,jdbc" \
  -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{"postgres":
                                      {"url": "jdbc:postgres://localhost:5432/my_database",
                                      "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                                      "classpath": "org.postgresql.Driver"}}'  \
  -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS="-Xmx2g" \
  -p 12345:12345 \
  -v /path/to/local/postgresql/jdbc/driver.jar:/path/to/postgresql/jdbc/driver.jar \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG

Start DAI by Updating the config.toml File

This example shows how to configure JDBC options in the config.toml file, and then specify that file when starting Driverless AI in Docker.

  1. Configure the Driverless AI config.toml file. Set the following configuration options:

enabled_file_systems = "file, upload, jdbc"
jdbc_app_configs = """{"postgres": {"url": "jdbc:postgres://localhost:5432/my_database",
                     "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                     "classpath": "org.postgresql.Driver"}}"""
  1. Mount the config.toml file and requisite JAR files into the Docker container.

nvidia-docker run \
  --pid=host \
  --init \
  --rm \
  --shm-size=256m \
  --add-host name.node:172.16.2.186 \
  -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \
  -p 12345:12345 \
  -v /local/path/to/jdbc/driver.jar:/path/in/docker/jdbc/driver.jar \
  -v /local/path/to/config.toml:/path/in/docker/config.toml \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  -v /tmp/dtmp/:/tmp \
  -v /tmp/dlog/:/log \
  -v /tmp/dlicense/:/license \
  -v /tmp/ddata/:/data \
  -u $(id -u):$(id -g) \
  h2oai/dai-centos7-x86_64:TAG

Adding Datasets Using JDBC

After the JDBC connector is enabled, you can add datasets by selecting JDBC from the Add Dataset (or Drag and Drop) drop-down menu.

Make JDBC Query
  1. Click on the Add Dataset button on the Datasets page.

  2. Select JDBC from the list that appears.

  3. Click on the Select JDBC Connection button to select a JDBC configuration.

  4. The form will populate with the JDBC Database, URL, Driver, and Jar information. Complete the following remaining fields:

  • JDBC Username: Enter your JDBC username.

  • JDBC Password: Enter your JDBC password.

  • Destination Name: Enter a name for the new dataset.

  • (Optional) ID Column Name: Enter a name for the ID column. Specify this field when making large data queries.

Notes:

  • Due to resource sharing within Driverless AI, the JDBC Connector is only allocated a relatively small amount of memory.

  • When making large queries, the ID column is used to partition the data into manageable portions. This ensures that the maximum memory allocation is not exceeded.

  • If a query that is larger than the maximum memory allocation is made without specifying an ID column, the query will not complete successfully.

  1. Write a SQL Query in the format of the database that you want to query. (See the Query Examples section below.) The format will vary depending on the database that is used.

  2. Click the Click to Make Query button to execute the query. The time it takes to complete depends on the size of the data being queried and the network speeds to the database.

On a successful query, you will be returned to the datasets page, and the queried data will be available as a new dataset.

Query Examples

The following are sample configurations and queries for Oracle DB and PostgreSQL:

Oracle DB

  1. Configuration:

jdbc_app_configs = '{"oracledb": {"url": "jdbc:oracle:thin:@localhost:1521/oracledatabase", "jarpath": "/home/ubuntu/jdbc-jars/ojdbc8.jar", "classpath": "oracle.jdbc.OracleDriver"}}'
  1. Sample Query:

  • Select oracledb from the Select JDBC Connection dropdown menu.

  • JDBC Username: oracleuser

  • JDBC Password: oracleuserpassword

  • ID Column Name:

  • Query:

SELECT MIN(ID) AS NEW_ID, EDUCATION, COUNT(EDUCATION) FROM my_oracle_schema.creditcardtrain GROUP BY EDUCATION

Note: Because this query does not specify an ID Column Name, it will only work for small data. However, the NEW_ID column can be used as the ID Column if the query is for larger data.

  1. Click the Click to Make Query button to execute the query.

PostgreSQL

  1. Configuration:

jdbc_app_configs = '{"postgres": {"url": "jdbc:postgresql://localhost:5432/postgresdatabase", "jarpath": "/home/ubuntu/postgres-artifacts/postgres/Driver.jar", "classpath": "org.postgresql.Driver"}}'
  1. Sample Query:

  • Select postgres from the Select JDBC Connection dropdown menu.

  • JDBC Username: postgres_user

  • JDBC Password: pguserpassword

  • ID Column Name: id

  • Query:

SELECT * FROM loan_level WHERE LOAN_TYPE = 5 (selects all columns from table loan_level with column LOAN_TYPE containing value 5)
  1. Click the Click to Make Query button to execute the query.

Adding an Untested JDBC Driver

We encourage you to try out JDBC drivers that are not tested in house.

  1. Download the JDBC jar for your database.

  2. Move your JDBC jar file to a location that DAI can access.

  3. Modify the following config.toml settings. Note that these can also be specified as environment variables when starting Driverless AI in Docker:

# enable the JDBC file system
enabled_file_systems = "upload, file, hdfs, s3, recipe_file, jdbc"

# Configure the JDBC Connector.
# JSON/Dictionary String with multiple keys.
# Format as a single line without using carriage returns (the following example is formatted for readability).
# Use triple quotations to ensure that the text is read as a single string.
# Example:
jdbc_app_configs = """{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
                       "jarpath": "/path/to/my/jdbc/database.jar",
                       "classpath": "com.my.jdbc.Driver"}}"""

# optional extra jvm args for jdbc connector
jdbc_app_jvm_args = ""

# optional alternative classpath for jdbc connector
jdbc_app_classpath = ""
  1. Save the changes when you are done, then stop/restart Driverless AI.