Data Connectors

Driverless AI provides a number of data connectors for accessing external data sources. The following data connection types are enabled by default (a sketch of the corresponding configuration follows this list):

  • upload: Standard upload feature in Driverless AI.

  • file: Local or server file system.

  • hdfs: Hadoop Distributed File System (HDFS). Ensure that the HDFS config folder path and keytab are configured.

  • s3: Amazon S3. Optionally configure the access key and secret key.

  • recipe_file: Custom recipe file upload.

  • recipe_url: Custom recipe upload via URL.
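
On native installs, this default set corresponds to the enabled_file_systems option in config.toml. A minimal sketch of the equivalent setting (the quoting and exact default value may vary by version):

    # config.toml -- illustrative sketch of the default connector set
    enabled_file_systems = "upload, file, hdfs, s3, recipe_file, recipe_url"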

Additionally, the following connection types can be enabled by modifying the enabled_file_systems configuration option (native installs) or the corresponding environment variable (Docker image installs), as shown in the sketch after this list:

  • dtap: BlueData DataTap file system. Ensure that the DTap section is configured.

  • gcs: Google Cloud Storage. Ensure that gcs_path_to_service_account_json is configured.

  • gbq: Google BigQuery. Ensure that gcs_path_to_service_account_json is configured.

  • hive: Hive Connector. Ensure that Hive is configured.

  • minio: MinIO Cloud Storage. Ensure that the access key and secret key are configured.

  • snow: Snowflake Data Warehouse. Ensure that Snowflake credentials are configured.

  • kdb: KDB+ Time Series Database. Ensure that KDB credentials are configured.

  • azrbs: Azure Blob Storage. Ensure that Azure credentials are configured.

  • jdbc: JDBC Connector. Ensure that JDBC is configured.

  • h2o_drive: H2O Drive. Ensure that h2o_drive_endpoint_url is configured.

  • feature_store: Feature Store. Ensure that feature_store_endpoint_url is configured.

  • databricks: Databricks. Ensure that the Databricks cluster URL and authentication token are configured.
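
For example, enabling the Google Cloud Storage and Google BigQuery connectors on top of the defaults might look like the sketch below. Note that the value you set replaces the default list rather than extending it, so any defaults you still want (such as upload and file) must be listed explicitly. The environment variable name follows the usual DRIVERLESS_AI_ prefix convention for config.toml options, and the service account path is a hypothetical placeholder:

    # config.toml (native installs) -- sketch: defaults plus GCS and BigQuery
    enabled_file_systems = "upload, file, hdfs, s3, recipe_file, recipe_url, gcs, gbq"
    gcs_path_to_service_account_json = "/path/to/service_account.json"  # hypothetical path

    # Docker image installs -- the same option passed as an environment variable
    docker run ... \
      -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="upload,file,hdfs,s3,recipe_file,recipe_url,gcs,gbq" \
      ...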

These data sources are exposed as file systems, each prefixed by a unique identifier; a client-side usage sketch follows this list. For example:

  • To reference data on S3, use the prefix s3://.

  • To reference data on HDFS, use the prefix hdfs://.

  • To reference data on Azure Blob Storage, use https://<storage_name>.blob.core.windows.net.

  • To reference data on BlueData DataTap, use the prefix dtap://.

  • To reference data on Google BigQuery, ensure you know the Google BigQuery dataset and table you want to query. Use a standard SQL query to ingest data.

  • To reference data on Google Cloud Storage, use the prefix gs://.

  • To reference data on kdb+, use the hostname and port in the form http://<kdb_server>:<port>.

  • To reference data on MinIO, use http://<endpoint_url>.

  • To reference data on Snowflake, use a standard SQL query to ingest data.

  • To access a SQL database via JDBC, use a SQL query with the syntax appropriate for your database.

  • To reference data on Databricks, use a standard SQL query to ingest data.
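
Once a connector is enabled, its prefix can be used anywhere Driverless AI accepts a data path, including programmatic ingestion. A minimal sketch assuming the driverlessai Python client; the server address, credentials, and S3 bucket/key are hypothetical placeholders:

    import driverlessai

    # Connect to a running Driverless AI server (address and credentials are placeholders)
    dai = driverlessai.Client(
        address="http://localhost:12345",
        username="user",
        password="password",
    )

    # Ingest a file by its connector-prefixed path; data_source names the connector
    dataset = dai.datasets.create(
        data="s3://my-bucket/path/to/train.csv",  # hypothetical bucket and key
        data_source="s3",
    )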

Refer to the following sections for more information: