dai.split_dataset.Rd
Splits a dataset into two parts, optionally taking into account a time column, the target column, or a fold column. At most one of these three can be specified.
dai.split_dataset( dataset, output_name1, output_name2, ratio, seed = 1234, target = NULL, fold_col = NULL, time_col = NULL, split_datetime = NULL, progress = getOption("dai.progress", TRUE) )
Argument | Description |
---|---|
dataset | DAIFrame representing the dataset to be split. |
output_name1 | The name of the first part. |
output_name2 | The name of the second part. |
ratio | Proportion of rows assigned to the first part (output_name1); must lie in the open interval (0, 1). |
seed | Random number generator's seed. |
target | Target column for stratified sampling (optional). |
fold_col | Fold column to keep rows belonging to the same group together (optional). |
time_col | Time column for a time-based split (optional). |
split_datetime | Datetime string, given as a value from the original raw time column, that marks the start of the test set (optional; used instead of ratio). |
progress | Whether to display a progress bar. |
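To make the `ratio`, `seed`, `target`, and `fold_col` semantics concrete, here is a minimal base-R sketch of the two non-temporal split modes on a local data frame. It is an illustration under assumptions, not the Driverless AI implementation: the helpers `stratified_split()` and `group_split()` are hypothetical, both round the first part's share with `round()`, and neither calls the dai package or needs a server. Note that in the grouped version `ratio` applies to groups, so the row proportion is only approximate when groups differ in size.

```r
# Hypothetical helper (not part of the dai package): stratified split.
# Within each level of `target`, round(ratio * n) rows go to the first part.
stratified_split <- function(df, output_name1, output_name2, ratio,
                             seed = 1234, target) {
  stopifnot(ratio > 0, ratio < 1)
  set.seed(seed)  # note: this resets the global RNG state
  # Row indices of df, grouped by the class label in `target`
  idx_by_class <- split(seq_len(nrow(df)), df[[target]])
  # From each class, draw round(ratio * class size) rows for the first part
  first_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), round(ratio * length(idx)))]
  }), use.names = FALSE)
  in_first <- seq_len(nrow(df)) %in% first_idx
  setNames(list(df[in_first, , drop = FALSE], df[!in_first, , drop = FALSE]),
           c(output_name1, output_name2))
}

# Hypothetical helper (not part of the dai package): group-preserving split.
# Whole groups of `fold_col` are sampled, so no group spans both parts.
group_split <- function(df, output_name1, output_name2, ratio,
                        seed = 1234, fold_col) {
  stopifnot(ratio > 0, ratio < 1)
  set.seed(seed)
  groups <- unique(df[[fold_col]])
  first_groups <- groups[sample.int(length(groups), round(ratio * length(groups)))]
  in_first <- df[[fold_col]] %in% first_groups
  setNames(list(df[in_first, , drop = FALSE], df[!in_first, , drop = FALSE]),
           c(output_name1, output_name2))
}

iris_parts <- stratified_split(iris, "train", "test", 0.8, target = "Species")
table(iris_parts$train$Species)  # 40 rows of each species

# Hypothetical grouping: 30 "plants" of 5 consecutive rows each
iris_grouped <- transform(iris, plant = rep(1:30, each = 5))
grouped_parts <- group_split(iris_grouped, "train", "test", 0.8, fold_col = "plant")
nrow(grouped_parts$train)  # 120 (24 whole plants)
```

Stratifying keeps each class's share the same in both parts, whereas grouping trades exact proportions for the guarantee that related rows never leak across the split.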
A list containing the two DAIFrames, named output_name1 and output_name2.
dai.connect(uri = 'http://127.0.0.1:12345', username = 'h2oai', password = 'h2oai')
iris_dai <- as.DAIFrame(iris, progress = FALSE)
iris_splits <- dai.split_dataset(iris_dai, 'train', 'test', 0.8, progress = FALSE)
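The iris example is a plain random split by `ratio`. When `time_col` and `split_datetime` are given instead, the split is no longer random. The following base-R sketch shows one plausible reading of that mode, assuming rows strictly earlier than the cutoff form the first part and rows at or after it form the test set (matching "marks the start of the test set"). The helper `time_split()` and the `sales` data are hypothetical, and the sketch does not call the dai package.

```r
# Hypothetical helper (not part of the dai package): time-based split.
# Rows earlier than split_datetime go to the first part; rows at or after
# it go to the second. Rows with missing times go to the second part here.
time_split <- function(df, output_name1, output_name2, time_col, split_datetime) {
  times <- as.POSIXct(df[[time_col]], tz = "UTC")
  cutoff <- as.POSIXct(split_datetime, tz = "UTC")
  in_first <- !is.na(times) & times < cutoff
  setNames(list(df[in_first, , drop = FALSE], df[!in_first, , drop = FALSE]),
           c(output_name1, output_name2))
}

# Hypothetical daily data: 1 to 10 January 2024, dates stored as strings
sales <- data.frame(
  date = as.character(seq(as.Date("2024-01-01"), by = "day", length.out = 10)),
  amount = 1:10,
  stringsAsFactors = FALSE
)
sales_parts <- time_split(sales, "train", "test", "date", "2024-01-08")
nrow(sales_parts$train)  # 7 (1 to 7 January)
nrow(sales_parts$test)   # 3 (8 to 10 January)
```

Because the cutoff is compared against parsed times rather than row order, the sketch works even if the input rows are not sorted by date.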