Creates a dataset from a file located on the Driverless AI host machine. See supported file formats.

dai.create_dataset(
  path,
  progress = getOption("dai.progress", TRUE),
  type = "auto"
)

Arguments

path

Path to the file on the Driverless AI host machine, or a URI for a remote data source (see Details).

progress

Whether to display a progress bar. Defaults to the value of the dai.progress option, or TRUE if that option is not set.
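
The default for progress is read with getOption("dai.progress", TRUE), so it can be changed for a whole session through base R options. The following is a minimal sketch of how that default resolves; it uses only base R and does not call the dai package.

```r
# Remember the current value so it can be restored afterwards.
old <- options(dai.progress = FALSE)

# This is the expression used as the default for `progress`.
# With the option set to FALSE it evaluates to FALSE.
progress_default <- getOption("dai.progress", TRUE)

# Restore the previous setting.
options(old)
```

Passing progress explicitly in a call still overrides whatever the option is set to.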

type

The data source type; see Details. Defaults to "auto".

Value

DAIFrame representing the dataset.

Details

The data source type parameter can be set to any of the following values:

  • auto: Determine the type from the path. Fails if ambiguous.

  • file: the server's file system

  • s3: Amazon S3

  • hdfs: Hadoop Distributed File System (HDFS)

  • dtap: BlueData DataTap

  • gs: Google Cloud Storage

  • azr: Azure Blob Storage

  • minio: MinIO

When type is set to "auto", the data source type is inferred from the path. This inference is not possible for MinIO, so for that source you must set type = "minio".
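
To illustrate why a MinIO path is ambiguous, here is a rough sketch of scheme-based inference. The helper guess_source_type is hypothetical and is not part of the dai package; it assumes that each remote source uses a URI scheme equal to its type name and that a path without a scheme refers to the server's file system. The actual inference in Driverless AI may differ.

```r
# Hypothetical helper, NOT part of the dai package.
# Guesses a value for `type` from the URI scheme of `path`.
guess_source_type <- function(path) {
  # Assumption: the scheme of each remote source equals its type name.
  known <- c("s3", "hdfs", "dtap", "gs", "azr")

  # Extract a leading "<scheme>://" if there is one.
  m <- regmatches(path, regexec("^([A-Za-z][A-Za-z0-9+.-]*)://", path))[[1]]

  # Assumption: no scheme means a path on the server's file system.
  if (length(m) == 0) {
    return("file")
  }

  scheme <- tolower(m[2])
  if (scheme %in% known) {
    return(scheme)
  }

  # An http:// URL, as used for MinIO, carries no hint about the source.
  stop("Cannot infer the data source type from '", path,
       "'; set `type` explicitly.")
}
```

Under these assumptions, "s3://h2o-training/events/ibm_index/CreditCard_Cat-test.csv" resolves to "s3", while "http://minio-server:9000/h2oaidev/cc_train.csv" raises an error because a plain HTTP URL does not identify MinIO.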

Examples

# Connect to a running Driverless AI instance
dai.connect(uri = 'http://127.0.0.1:12345', username = 'h2oai', password = 'h2oai')

# Locate a sample file on the server (dai:::find_file is an internal helper)
abs_path_to_data_on_server <- dai:::find_file('smalldata/kaggle/CreditCard/creditcard_train_cat.csv')

# Create a dataset from a file on the server, without a progress bar
data <- dai.create_dataset(abs_path_to_data_on_server, progress = FALSE)

# Remove the dataset created above
dai.rm(data)

# Create a dataset from S3
ccard <- dai.create_dataset("s3://h2o-training/events/ibm_index/CreditCard_Cat-test.csv")

# Create a dataset from Minio
# The type parameter is required because the data source type cannot be inferred from the path
ccard <- dai.create_dataset("http://minio-server:9000/h2oaidev/cc_train.csv", type = "minio")