Enabling Data Connectors

Driverless AI provides a number of data connectors for accessing external data sources. The following data connection types are enabled by default:

  • upload: standard upload feature

  • file: local file system/server file system

  • hdfs: Hadoop file system, remember to configure the HDFS config folder path and keytab

  • s3: Amazon S3, optionally configure secret and access key

  • recipe_file: Custom recipe file upload

  • recipe_url: Custom recipe upload via url

Additionally, the following connection types can be enabled by modifying the enabled_file_systems configuration option (Native installs) or environment variable (Docker image installs):

  • dtap: BlueData DataTap file system, remember to configure the DTap section

  • gcs: Google Cloud Storage, remember to configure gcs_path_to_service_account_json below

  • gbq: Google BigQuery, remember to configure gcs_path_to_service_account_json below

  • minio: Minio Cloud Storage, remember to configure secret and access key below

  • snow: Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)

  • kdb: KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)

  • azrbs: Azure Blob Storage, remember to configure Azure credentials below (account name, account key)

  • jdbc: JDBC Connector, remember to configure JDBC below (jdbc_app_configs)

  • hive: Hive Connector, remember to configure Hive below (hive_app_configs)
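As a minimal sketch of how the enabled_file_systems value might be assembled, the Python below combines the default connectors with any optional ones you want to turn on. The function name build_enabled_file_systems is hypothetical, and the printed line format (a quoted, comma-separated list) is an assumption about the configuration syntax; check your own installation's configuration file before copying it.

```python
# Connector names are taken from the two lists above.
DEFAULT_CONNECTORS = ["upload", "file", "hdfs", "s3", "recipe_file", "recipe_url"]
OPTIONAL_CONNECTORS = ["dtap", "gcs", "gbq", "minio", "snow", "kdb", "azrbs", "jdbc", "hive"]


def build_enabled_file_systems(extra):
    """Return a comma-separated list: the defaults plus the requested extras.

    Hypothetical helper for illustration only; not part of Driverless AI.
    """
    known = set(DEFAULT_CONNECTORS) | set(OPTIONAL_CONNECTORS)
    unknown = [name for name in extra if name not in known]
    if unknown:
        raise ValueError(f"unknown connector(s): {', '.join(unknown)}")
    # dict.fromkeys keeps first-seen order and drops duplicates.
    ordered = dict.fromkeys(DEFAULT_CONNECTORS + list(extra))
    return ", ".join(ordered)


if __name__ == "__main__":
    value = build_enabled_file_systems(["gcs", "minio"])
    # Assumed rendering of the setting; verify against your config file.
    print(f'enabled_file_systems = "{value}"')
```

Validating names against the documented lists catches typos such as "gsc" before they reach the configuration.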

These data sources are exposed in the form of file systems, and each file system is identified by a unique prefix. For example:

  • To reference data on S3, use s3://.

  • To reference data on HDFS, use the prefix hdfs://.

  • To reference data on Azure Blob Storage, use https://<storage_name>.blob.core.windows.net.

  • To reference data on BlueData DataTap, use dtap://.

  • To reference data on Google BigQuery, make sure you know the Google BigQuery dataset and the table that you want to query. Use a standard SQL query to ingest data.

  • To reference data on Google Cloud Storage, use gs://.

  • To reference data on kdb+, use the hostname and port: http://<kdb_server>:<port>.

  • To reference data on Minio, use http://<endpoint_url>.

  • To reference data on Snowflake, use a standard SQL query to ingest data.

  • To access a SQL database via JDBC, use a SQL query with the syntax associated with your database.
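The prefix convention above can be illustrated with a small lookup. This is only a sketch: connector_for is a hypothetical helper, not a Driverless AI function, and it covers just the connectors whose prefix is unambiguous. kdb+ and Minio both use plain http:// endpoints, and the SQL-based connectors (BigQuery, Snowflake, JDBC) take queries rather than paths, so those are left out.

```python
from urllib.parse import urlparse

# Prefixes as listed above, mapped to the connector names used in
# enabled_file_systems.
PREFIXES = {
    "s3://": "s3",
    "hdfs://": "hdfs",
    "dtap://": "dtap",
    "gs://": "gcs",
}


def connector_for(location):
    """Guess which connector a location string refers to, or return None."""
    for prefix, name in PREFIXES.items():
        if location.startswith(prefix):
            return name
    # Azure Blob Storage: https://<storage_name>.blob.core.windows.net
    parsed = urlparse(location)
    host = parsed.hostname or ""
    if parsed.scheme == "https" and host.endswith(".blob.core.windows.net"):
        return "azrbs"
    # http:// endpoints (kdb+, Minio) cannot be distinguished by URL alone.
    return None


if __name__ == "__main__":
    for loc in ("s3://bucket/train.csv", "gs://bucket/train.csv"):
        print(loc, "->", connector_for(loc))
```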

Refer to the following sections for more information: