JDBC Setup

Driverless AI lets you explore Java Database Connectivity (JDBC) data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with JDBC.

Note: Depending on your Docker install version, use either the docker run --runtime=nvidia (>= Docker 19.03) or nvidia-docker (< Docker 19.03) command when starting the Driverless AI Docker image. Use docker version to check which version of Docker you are using.

Tested Databases

The following databases have been tested for minimal functionality. Note that JDBC drivers that are not included in this list should work with Driverless AI. We recommend that you test out your JDBC driver even if you do not see it on list of tested databases. See the Adding an Untested JDBC Driver section at the end of this chapter for information on how to try out an untested JDBC driver.

  • Oracle DB

  • PostgreSQL

  • Amazon Redshift

  • Teradata

Description of Configuration Attributes

  • jdbc_app_configs: Configuration for the JDBC connector. This is a JSON/Dictionary String with multiple keys. Note: This requires a JSON key (typically the name of the database being configured) to be associated with a nested JSON that contains the url, jarpath, and classpath fields. In addition, this should take the format:

"""{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
 "jarpath": "/path/to/my/jdbc/database.jar", "classpath": "com.my.jdbc.Driver"}}"""

For example:

"""{
   "postgres": {
   "url": "jdbc:postgresql://ip address:port/postgres",
   "jarpath": "/path/to/postgres_driver.jar",
   "classpath": "org.postgresql.Driver"
   },
   "mysql": {
   "url":"mysql connection string",
   "jarpath": "/path/to/mysql_driver.jar",
   "classpath": "my.sql.classpath.Driver"
   }
}"""

Note: The expected input of jdbc_app_configs is a JSON string. Double quotation marks ("...") must be used to denote keys and values within the JSON dictionary, and outer quotations must be formatted as either """, ''', or '. Depending on how the configuration value is applied, different forms of outer quotations may be required. The following examples show two unique methods for applying outer quotations.

  • Configuration value applied with the config.toml file:

    jdbc_app_configs = """{"my_json_string": "value", "json_key_2": "value2"}"""
    
  • Configuration value applied with an environment variable:

    DRIVERLESS_AI_JDBC_APP_CONFIGS='{"my_json_string": "value", "json_key_2": "value2"}'
    

    For example:

    DRIVERLESS_AI_JDBC_APP_CONFIGS='{
    "postgres": {"url": "jdbc:postgresql://192.xxx.x.xxx:aaaa:/name_of_database;user=name_of_user;password=your_password","jarpath": "/config/postgresql-xx.x.x.jar","classpath": "org.postgresql.Driver"},
    "postgres-local": {"url": "jdbc:postgresql://123.xxx.xxx.xxx:aaaa/name_of_database","jarpath": "/config/postgresql-xx.x.x.jar","classpath": "org.postgresql.Driver"},
    "ms-sql": {"url": "jdbc:sqlserver://192.xxx.x.xxx:aaaa;databaseName=name_of_database;user=name_of_user;password=your_password","Username":"your_username","passsword":"your_password","jarpath": "/config/sqljdbc42.jar","classpath": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
    "oracle": {"url": "jdbc:oracle:thin:@192.xxx.x.xxx:aaaa/orclpdb1","jarpath": "ojdbc7.jar","classpath": "oracle.jdbc.OracleDriver"},
    "db2": {"url": "jdbc:db2://127.x.x.x:aaaaa/name_of_database","jarpath": "db2jcc4.jar","classpath": "com.ibm.db2.jcc.DB2Driver"},
    "mysql": {"url": "jdbc:mysql://192.xxx.x.xxx:aaaa;","jarpath": "mysql-connector.jar","classpath": "com.mysql.jdbc.Driver"},
    "Snowflake": {"url": "jdbc:snowflake://<account_name>.snowflakecomputing.com/?<connection_params>","jarpath": "/config/snowflake-jdbc-x.x.x.jar","classpath": "net.snowflake.client.jdbc.SnowflakeDriver"},
    "Derby": {"url": "jdbc:derby://127.x.x.x:aaaa/name_of_database","jarpath": "/config/derbyclient.jar","classpath": "org.apache.derby.jdbc.ClientDriver"}
    }'\
    
  • jdbc_app_jvm_args: Extra jvm args for JDBC connector. For example, “-Xmx4g”.

  • jdbc_app_classpath: Optionally specify an alternative classpath for the JDBC connector.

  • enabled_file_systems: The file systems you want to enable. This must be configured in order for data connectors to function properly.

Troubleshooting

The following section contains important troubleshooting information.

Supported Java system properties

You can use the following supported Java system properties in the DAI JDBC connector for troubleshooting purposes.

dai.connectors.jdbc.system_user

Some JDBC drivers, such as those that use Kerberos for authentication, require a system user to be present. In some cases, a system user may not be present for the JDBC connector process, such as when running Driverless AI in a Docker container. In such setups, you may encounter the following error when using the JDBC connector with Kerberos authentication:

KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name

You can use the dai.connectors.jdbc.system_user Java system property to set a user for the JDBC connector.

jdbc_app_jvm_args = "-Xmx4g -Duser.home=/tmp -Ddai.connectors.jdbc.system_user=dai"
dai.connectors.jdbc.save_as

You can use this system property to change the data format that is used for saving queried data. The possible values are parquet (default), csv, json, and orc. The following is an example of how you can set this system property in the config.toml file: jdbc_app_jvm_args="-Xmx4g -Ddai.connectors.jdbc.save_as=orc"

dai.connectors.jdbc.driver.isolate_classpath

Do not set this system property unless an error occurs.

By default, the JDBC connector loads the given JDBC driver JAR in an isolated manner to eliminate Java class conflicts. Some JDBC drivers (especially Apache Spark-based JDBC drivers) don’t work well with this approach and give java.lang.ClassNotFoundException errors for classes that are present in the JDBC driver JAR. In such cases, you can turn off the isolation behavior by setting the dai.connectors.jdbc.driver.isolate_classpath Java system property to false:

jdbc_app_jvm_args="-Xmx4g -Ddai.connectors.jdbc.driver.isolate_classpath=false"

Retrieve the JDBC Driver

  1. Download JDBC Driver JAR files:

Note: Remember to take note of the driver classpath, as it is needed for the configuration steps (for example, org.postgresql.Driver).

  1. Copy the driver JAR to a location that can be mounted into the Docker container.

Note: The folder storing the JDBC jar file must be visible/readable by the dai process user.

Enable the JDBC Connector

This example enables the JDBC connector for PostgresQL. Note that the JDBC connection strings will vary depending on the database that is used.

 nvidia-docker run \
   --pid=host \
   --init \
   --rm \
   --shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
   --add-host name.node:172.16.2.186 \
   -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="file,hdfs,jdbc" \
   -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{"postgres":
                                       {"url": "jdbc:postgres://localhost:5432/my_database",
                                       "jarpath": "/path/to/postgresql/jdbc/driver.jar",
                                       "classpath": "org.postgresql.Driver"}}'  \
   -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS="-Xmx2g" \
   -p 12345:12345 \
   -v /path/to/local/postgresql/jdbc/driver.jar:/path/to/postgresql/jdbc/driver.jar \
   -v /etc/passwd:/etc/passwd:ro \
   -v /etc/group:/etc/group:ro \
   -v /tmp/dtmp/:/tmp \
   -v /tmp/dlog/:/log \
   -v /tmp/dlicense/:/license \
   -v /tmp/ddata/:/data \
   -u $(id -u):$(id -g) \
   h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx

Adding Datasets Using JDBC

After the JDBC connector is enabled, you can add datasets by selecting JDBC from the Add Dataset (or Drag and Drop) drop-down menu.

Make JDBC Query
  1. Click on the Add Dataset button on the Datasets page.

  2. Select JDBC from the list that appears.

  3. Click on the Select JDBC Connection button to select a JDBC configuration.

  4. The form will populate with the JDBC Database, URL, Driver, and Jar information. Complete the following remaining fields:

  • JDBC Username: Enter your JDBC username.

  • JDBC Password: Enter your JDBC password. (See the Notes section)

  • Destination Name: Enter a name for the new dataset.

  • (Optional) ID Column Name: Enter a name for the ID column. Specify this field when making large data queries.

Notes:

  • Do not include the password as part of the JDBC URL. Instead, enter the password in the JDBC Password field. The password is entered separately for security purposes.

  • Due to resource sharing within Driverless AI, the JDBC Connector is only allocated a relatively small amount of memory.

  • When making large queries, the ID column is used to partition the data into manageable portions. This ensures that the maximum memory allocation is not exceeded.

  • If a query that is larger than the maximum memory allocation is made without specifying an ID column, the query will not complete successfully.

  1. Write a SQL Query in the format of the database that you want to query. (See the Query Examples section below.) The format will vary depending on the database that is used.

  2. Click the Click to Make Query button to execute the query. The time it takes to complete depends on the size of the data being queried and the network speeds to the database.

On a successful query, you will be returned to the datasets page, and the queried data will be available as a new dataset.

Query Examples

The following are sample configurations and queries for Oracle DB and PostgreSQL:

  1. Configuration:

jdbc_app_configs = """{"oracledb": {"url": "jdbc:oracle:thin:@localhost:1521/oracledatabase", "jarpath": "/home/ubuntu/jdbc-jars/ojdbc8.jar", "classpath": "oracle.jdbc.OracleDriver"}}"""
  1. Sample Query:

  • Select oracledb from the Select JDBC Connection dropdown menu.

  • JDBC Username: oracleuser

  • JDBC Password: oracleuserpassword

  • ID Column Name:

  • Query:

SELECT MIN(ID) AS NEW_ID, EDUCATION, COUNT(EDUCATION) FROM my_oracle_schema.creditcardtrain GROUP BY EDUCATION

Note: Because this query does not specify an ID Column Name, it will only work for small data. However, the NEW_ID column can be used as the ID Column if the query is for larger data.

  1. Click the Click to Make Query button to execute the query.

Adding an Untested JDBC Driver

We encourage you to try out JDBC drivers that are not tested in house.

  1. Download the JDBC jar for your database.

  2. Move your JDBC jar file to a location that DAI can access.

  3. Start the Driverless AI Docker image using the JDBC-specific environment variables.

     nvidia-docker run \
       --pid=host \
       --init \
       --rm \
       --shm-size=2g --cap-add=SYS_NICE --ulimit nofile=131071:131071 --ulimit nproc=16384:16384 \
       --add-host name.node:172.16.2.186 \
       -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="upload,file,hdfs,s3,recipe_file,jdbc" \
       -e DRIVERLESS_AI_JDBC_APP_CONFIGS="""{"my_jdbc_database": {"url": "jdbc:my_jdbc_database://hostname:port/database",
                                             "jarpath": "/path/to/my/jdbc/database.jar",
                                             "classpath": "com.my.jdbc.Driver"}}"""\
       -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS="-Xmx2g" \
       -p 12345:12345 \
       -v /path/to/local/postgresql/jdbc/driver.jar:/path/to/postgresql/jdbc/driver.jar \
       -v /etc/passwd:/etc/passwd:ro \
       -v /etc/group:/etc/group:ro \
       -v /tmp/dtmp/:/tmp \
       -v /tmp/dlog/:/log \
       -v /tmp/dlicense/:/license \
       -v /tmp/ddata/:/data \
       -u $(id -u):$(id -g) \
       h2oai/dai-ubi8-x86_64:1.11.0-cuda11.8.0.xx