Adding datasets

You can add datasets using one of the following methods:

Drag and drop files from your local machine directly onto this page. Note that this method currently works only for files smaller than 10 GB.

or

Click the Add Dataset (or Drag & Drop) button to upload or add a dataset.

Notes:

  • Upload File, File System, HDFS, S3, Data Recipe URL, and Upload Data Recipe are enabled by default. These can be disabled by removing them from the enabled_file_systems setting in the config.toml file. (Refer to the Using the config.toml file section for more information.)

  • If File System is disabled, Driverless AI will open a local file browser by default.

  • If Driverless AI was started with data connectors enabled for Azure Blob Store, BlueData Datatap, Google Big Query, Google Cloud Storage, KDB+, Minio, Snowflake, or JDBC, then these options will appear in the Add Dataset (or Drag & Drop) dropdown menu. Refer to the Enabling Data Connectors section for more information.

  • When adding a dataset using Data Recipe URL, the URL must point to either an HTML or raw version of the file, a GitHub repository or tree, or a local file. When datasets are added or uploaded via recipes, the dataset is saved as a .jay file.

  • Datasets must be in delimited text format.

  • Driverless AI can detect the following separators: comma (,), pipe (|), semicolon (;), and tab (\t).

  • When importing a folder, the entire folder and all of its contents are read into Driverless AI as a single file.

  • When importing a folder, all of the files in the folder must have the same columns.

  • If you try to import a folder via a data connector on Windows, the import will fail if the folder contains files that do not have file extensions (the resulting error is usually related to the above note).
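
The folder-import notes above can be checked before importing. The following is a minimal sketch, not part of Driverless AI: the helper names detect_separator and check_folder are hypothetical, the separator guess is a simple count over the header line rather than Driverless AI's own detection logic, and "same columns" is assumed to mean the same names in the same order. It only inspects files at the top level of the folder and assumes they are UTF-8 text.

```python
import csv
from pathlib import Path

# Separators that the notes above say Driverless AI can detect.
CANDIDATE_SEPARATORS = [",", "|", ";", "\t"]


def detect_separator(header_line):
    """Guess the separator as the candidate that occurs most often in the header.

    This is an illustrative heuristic, not Driverless AI's detection logic.
    Returns None when no candidate separator appears at all.
    """
    counts = {sep: header_line.count(sep) for sep in CANDIDATE_SEPARATORS}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def check_folder(folder):
    """Return a list of human-readable problems found in the folder."""
    problems = []
    reference = None  # (file name, column list) of the first readable file
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # this sketch does not descend into subfolders
        if path.suffix == "":
            problems.append(f"{path.name}: no file extension")
        # Only the first line is needed to compare column names.
        with open(path, newline="", encoding="utf-8") as handle:
            header_line = handle.readline().rstrip("\r\n")
        separator = detect_separator(header_line)
        if separator is None:
            problems.append(f"{path.name}: no known separator in header")
            continue
        columns = next(csv.reader([header_line], delimiter=separator))
        if reference is None:
            reference = (path.name, columns)
        elif columns != reference[1]:
            problems.append(f"{path.name}: columns differ from {reference[0]}")
    return problems
```

Calling check_folder on a folder path returns an empty list when every file has an extension and the same header, and otherwise one message per problem found.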

Upon completion, the datasets will appear on the Datasets Overview page. Click on a dataset to open a submenu. From this menu, you can Rename, view Details of, Visualize, Split, Download, or Delete a dataset. Note: You cannot delete a dataset that was used in an active experiment. You must delete the experiment first.

Adding Dataset example