Data Connectors
Driverless AI provides a number of data connectors for accessing external data sources. The following data connection types are enabled by default:
upload
: Standard upload feature in Driverless AI.

file
: Local or server file system.

hdfs
: Hadoop file system. Ensure that the HDFS config folder path and keytab are configured.

s3
: Amazon S3. Optionally configure the secret and access key.

recipe_file
: Custom recipe file upload.

recipe_url
: Custom recipe upload via URL.
Additionally, the following connection types can be enabled by modifying the enabled_file_systems
configuration option (native installs) or the corresponding environment variable (Docker image installs):
dtap
: Blue Data Tap file system. Ensure that the DTap section is configured.

gcs
: Google Cloud Storage. Ensure that gcs_path_to_service_account_json is configured.

gbq
: Google Big Query. Ensure that gcs_path_to_service_account_json is configured.

hive
: Hive Connector. Ensure that Hive is configured.

minio
: Minio Cloud Storage. Ensure that the secret and access key are configured.

snow
: Snowflake Data Warehouse. Ensure that Snowflake credentials are configured.

kdb
: KDB+ Time Series Database. Ensure that KDB credentials are configured.

azrbs
: Azure Blob Storage. Ensure that Azure credentials are configured.

jdbc
: JDBC Connector. Ensure that JDBC is configured.

h2o_drive
: H2O Drive. Ensure that h2o_drive_endpoint_url is configured.

feature_store
: Feature Store. Ensure that feature_store_endpoint_url is configured.

databricks
: Databricks. Ensure that the Databricks cluster URL and authentication token are configured.
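As a sketch of how a subset of these connectors might be enabled, assuming the config.toml used by native installs and the DRIVERLESS_AI_-prefixed environment variable used by Docker image installs (the chosen connectors and image name are illustrative):

```shell
# Native installs: edit config.toml (illustrative subset of connectors)
#   enabled_file_systems = "upload, file, hdfs, s3, gcs, snow"

# Docker image installs: pass the equivalent environment variable,
# assuming the DRIVERLESS_AI_ prefix convention (image name illustrative)
docker run \
  -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="upload,file,hdfs,s3,gcs,snow" \
  h2oai/dai-centos7-x86_64
```

Connectors omitted from the list are hidden from the UI even if their credentials are configured.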
These data sources are exposed as file systems, each prefixed by a unique identifier. For example:
To reference data on S3, use s3://.
To reference data on HDFS, use the prefix hdfs://.
To reference data on Azure Blob Storage, use https://<storage_name>.blob.core.windows.net.
To reference data on BlueData Datatap, use dtap://.
To reference data on Google BigQuery, ensure you know the Google BigQuery dataset and table you want to query. Use a standard SQL query to ingest data.
To reference data on Google Cloud Storage, use gs://.
To reference data on kdb+, use the hostname and port http://<kdb_server>:<port>.
To reference data on MinIO, use http://<endpoint_url>.
To reference data on Snowflake, use a standard SQL query to ingest data.
To access a SQL database via JDBC, use a SQL query with the syntax appropriate for your database.
To reference data on Databricks, use a standard SQL query to ingest data.
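The prefix convention above can be illustrated with a small helper that joins a connector scheme and a path into the prefixed URI form; the function name and example paths are invented for this sketch:

```python
# Hypothetical helper: builds the prefixed URI form that the
# file-system connectors above expect (e.g. s3://bucket/key).
def connector_uri(scheme: str, path: str) -> str:
    # Strip any leading slash so the result is always scheme://path
    return f"{scheme}://{path.lstrip('/')}"

print(connector_uri("s3", "bucket/train.csv"))      # s3://bucket/train.csv
print(connector_uri("hdfs", "/user/jon/data.csv"))  # hdfs://user/jon/data.csv
print(connector_uri("dtap", "datasets/iris.csv"))   # dtap://datasets/iris.csv
```

The SQL-based connectors (BigQuery, Snowflake, JDBC, Databricks) do not use a URI prefix; they ingest the result of a query instead.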
Refer to the following sections for more information: