Builds a Decision Tree model on an H2OFrame.

h2o.decision_tree(
  x,
  y,
  training_frame,
  model_id = NULL,
  ignore_const_cols = TRUE,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
    "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  seed = -1,
  max_depth = 20,
  min_rows = 10
)

Arguments

x

(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.

y

The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model.

training_frame

Id of the training data frame.

model_id

Destination id for this model; auto-generated if not specified.

ignore_const_cols

Logical. Ignore constant columns. Defaults to TRUE.

categorical_encoding

Encoding scheme for categorical features Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.

seed

Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).

max_depth

Max depth of tree. Defaults to 20.

min_rows

Fewest allowed (weighted) observations in a leaf. Defaults to 10.

Value

Creates a H2OModel object of the right type.

See also

predict.H2OModel for prediction

Examples

if (FALSE) {
library(h2o)
h2o.init()

# Import the airlines dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv"
data <- h2o.importFile(f)

# Set predictors and response; set response as a factor
data["CAPSULE"] <- as.factor(data["CAPSULE"])
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")
response <- "CAPSULE"

# Train the DT model
h2o_dt <- h2o.decision_tree(x = predictors, y = response, training_frame = data, seed = 1234)
}