Data Connectors
Driverless AI provides a number of data connectors for accessing external data sources. The following data connection types are enabled by default:
- upload: The standard upload feature of Driverless AI.
- file: Local file system or server file system.
- hdfs: Hadoop file system. Remember to configure the HDFS config folder path and keytab.
- s3: Amazon S3. Optionally, configure the secret and access keys.
- recipe_file: Custom recipe file upload.
- recipe_url: Custom recipe upload via URL.
Additionally, the following connection types can be enabled by modifying the enabled_file_systems configuration option (native installs) or environment variable (Docker image installs):
- dtap: BlueData DataTap file system. Remember to configure the DTap section.
- gcs: Google Cloud Storage. Remember to configure gcs_path_to_service_account_json.
- gbq: Google BigQuery. Remember to configure gcs_path_to_service_account_json.
- hive: Hive connector. Remember to configure Hive.
- minio: MinIO Cloud Storage. Remember to configure the secret and access keys.
- snow: Snowflake Data Warehouse. Remember to configure the Snowflake credentials.
- kdb: KDB+ Time Series Database. Remember to configure the KDB credentials.
- azrbs: Azure Blob Storage. Remember to configure the Azure credentials.
- jdbc: JDBC connector. Remember to configure JDBC.
- h2o_drive: H2O Drive. Remember to configure h2o_drive_endpoint_url.
- feature_store: Feature Store. Remember to configure feature_store_endpoint_url.
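The helper below is a minimal sketch of composing a value for enabled_file_systems from the connector names listed above. It assumes that setting the option replaces the default list rather than extending it, which is why it re-lists the defaults before appending the extra connectors. It also assumes a plain comma-separated format; the exact accepted format can differ between Driverless AI versions, so check the config.toml shipped with your install. The function name is hypothetical, and the Docker variable name DRIVERLESS_AI_ENABLED_FILE_SYSTEMS is an assumption (the option name upper-cased with a DRIVERLESS_AI_ prefix) that you should verify against your version's documentation.

```python
# Hypothetical helper, not part of Driverless AI.
# Connector names are copied from the default and optional lists above.
DEFAULT_CONNECTORS = ["upload", "file", "hdfs", "s3", "recipe_file", "recipe_url"]
OPTIONAL_CONNECTORS = {
    "dtap", "gcs", "gbq", "hive", "minio", "snow",
    "kdb", "azrbs", "jdbc", "h2o_drive", "feature_store",
}


def build_enabled_file_systems(extra: list[str]) -> str:
    """Return a comma-separated connector list: defaults first, then extras.

    Assumes the option replaces the default list, so the defaults are
    repeated explicitly to keep them enabled.
    """
    unknown = [name for name in extra if name not in OPTIONAL_CONNECTORS]
    if unknown:
        raise ValueError(f"unknown connector(s): {unknown}")
    names = list(DEFAULT_CONNECTORS)
    for name in extra:
        if name not in names:  # skip duplicates, preserve order
            names.append(name)
    return ",".join(names)


if __name__ == "__main__":
    value = build_enabled_file_systems(["gcs", "gbq"])
    # Assumed Docker form; verify the variable name for your version.
    print(f'DRIVERLESS_AI_ENABLED_FILE_SYSTEMS="{value}"')
```

Each connector you enable this way still needs its own settings (for example, gcs_path_to_service_account_json for gcs and gbq), as noted in the list.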
These data sources are exposed as file systems, and each file system is identified by a unique prefix. For example:
- To reference data on S3, use the prefix s3://.
- To reference data on HDFS, use the prefix hdfs://.
- To reference data on Azure Blob Storage, use https://<storage_name>.blob.core.windows.net.
- To reference data on BlueData DataTap, use the prefix dtap://.
- To reference data on Google BigQuery, make sure you know the Google BigQuery dataset and the table that you want to query. Use a standard SQL query to ingest data.
- To reference data on Google Cloud Storage, use the prefix gs://.
- To reference data on kdb+, use the hostname and port: http://<kdb_server>:<port>.
- To reference data on MinIO, use http://<endpoint_url>.
- To reference data on Snowflake, use a standard SQL query to ingest data.
- To access a SQL database via JDBC, use a SQL query with the syntax of your database.
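To make the prefix convention concrete, the sketch below maps a data reference to the connector name(s) whose prefix it matches, using only the prefixes listed above. It is a hypothetical illustration, not a Driverless AI API. Because kdb+ and MinIO both use plain http:// references, the prefix alone cannot distinguish them, so both are returned. BigQuery, Snowflake, and JDBC take SQL queries rather than prefixed paths, so a query string matches nothing.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to connector name, based on the
# prefixes listed above (not an official Driverless AI table).
SCHEME_TO_CONNECTOR = {
    "s3": "s3",
    "hdfs": "hdfs",
    "dtap": "dtap",
    "gs": "gcs",
}


def candidate_connectors(reference: str) -> list[str]:
    """Return the connector names whose prefix matches `reference`."""
    parsed = urlparse(reference)
    scheme = parsed.scheme.lower()
    if scheme in SCHEME_TO_CONNECTOR:
        return [SCHEME_TO_CONNECTOR[scheme]]
    host = (parsed.hostname or "").lower()
    if scheme == "https" and host.endswith(".blob.core.windows.net"):
        return ["azrbs"]
    if scheme == "http":
        # kdb+ and MinIO both use http://host:port, so both are candidates.
        return ["kdb", "minio"]
    # SQL queries (BigQuery, Snowflake, JDBC) and unknown schemes match nothing.
    return []


if __name__ == "__main__":
    for ref in ["s3://bucket/train.csv", "gs://bucket/train.csv",
                "https://mystore.blob.core.windows.net/container/train.csv",
                "http://kdb-host:5001", "SELECT * FROM sales"]:
        print(ref, "->", candidate_connectors(ref))
```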
Refer to the following sections for more information: