Splits a dataset into two parts, optionally taking into account a time column, a target column, or a fold column. At most one of the three can be specified.

dai.split_dataset(
  dataset,
  output_name1,
  output_name2,
  ratio,
  seed = 1234,
  target = NULL,
  fold_col = NULL,
  time_col = NULL,
  split_datetime = NULL,
  progress = getOption("dai.progress", TRUE)
)

Arguments

dataset

DAIFrame representing the dataset to be split.

output_name1

The name of the first part.

output_name2

The name of the second part.

ratio

The split ratio between the two new parts; a number within the open interval (0, 1).

seed

Seed for the random number generator.

target

Target column for stratified sampling (optional).

fold_col

Fold column to keep rows belonging to the same group together (optional).

time_col

Time column for a time-based split (optional).

split_datetime

Datetime string (taken from the original raw time column) that defines the start of the test set (optional; used instead of ratio).

progress

Whether to display a progress bar.

Value

A list containing the two DAIFrames, stored under the names given by output_name1 and output_name2.

Examples

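# Connect to a running Driverless AI instance (placeholder URI and credentials).
dai.connect(uri = 'http://127.0.0.1:12345', username = 'h2oai', password = 'h2oai')
# Upload the built-in iris data, then split it into 'train' and 'test' with ratio 0.8.
iris_dai <- as.DAIFrame(iris, progress = FALSE)
iris_splits <- dai.split_dataset(iris_dai, 'train', 'test', 0.8, progress = FALSE)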
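
The optional target and fold_col arguments can be exercised in the same way. The following is a minimal sketch, not a verified recipe: it assumes the client package is loaded as dai, that a Driverless AI server is reachable at the placeholder address used above, that target and fold_col accept a column name as a string, and that the returned list can be indexed by the output names as described under Value. The group column and the output names are made up for illustration.

```r
library(dai)  # assumption: the R client package is named 'dai'

# Placeholder URI and credentials, as in the example above.
dai.connect(uri = 'http://127.0.0.1:12345', username = 'h2oai', password = 'h2oai')

# Stratified split: sample with respect to the 'Species' target column.
iris_dai <- as.DAIFrame(iris, progress = FALSE)
strat_splits <- dai.split_dataset(
  iris_dai, 'train_strat', 'test_strat', 0.8,
  target = 'Species',
  progress = FALSE
)

# Per the Value section, the parts are stored under the output names
# (assumption: ordinary list indexing by name works here).
train_strat <- strat_splits[['train_strat']]
test_strat <- strat_splits[['test_strat']]

# Fold-based split: rows sharing a value in 'group' should stay together.
# 'group' is a hypothetical column added only for this illustration.
iris_grouped <- iris
iris_grouped$group <- rep(seq_len(10), length.out = nrow(iris_grouped))
grouped_dai <- as.DAIFrame(iris_grouped, progress = FALSE)
fold_splits <- dai.split_dataset(
  grouped_dai, 'train_grouped', 'test_grouped', 0.8,
  fold_col = 'group',
  progress = FALSE
)
```

Because at most one of target, fold_col, and time_col can be specified, the stratified and fold-based splits are issued as two separate calls rather than combined in one.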