Trains an Extended Isolation Forest model

h2o.extendedIsolationForest(
  training_frame,
  x,
  model_id = NULL,
  ignore_const_cols = TRUE,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
    "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  score_each_iteration = FALSE,
  score_tree_interval = 0,
  ntrees = 100,
  sample_size = 256,
  extension_level = 0,
  seed = -1,
  disable_training_metrics = TRUE
)

Arguments

training_frame	Id of the training data frame.
x	A vector containing the `character` names of the predictors in the model.
model_id	Destination id for this model; auto-generated if not specified.
ignore_const_cols	`Logical`. Ignore constant columns. Defaults to TRUE.
categorical_encoding	Encoding scheme for categorical features Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
score_each_iteration	`Logical`. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval	Score the model after every so many trees. Disabled if set to 0. Defaults to 0.
ntrees	Number of Extended Isolation Forest trees. Defaults to 100.
sample_size	Number of randomly sampled observations used to train each Extended Isolation Forest tree. Defaults to 256.
extension_level	Maximum is N - 1 (N = numCols). Minimum is 0. Extended Isolation Forest with extension_Level = 0 behaves like Isolation Forest. Defaults to 0.
seed	Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number).
disable_training_metrics	`Logical`. Disable calculating training metrics (expensive on large datasets) Defaults to TRUE.

Examples

if (FALSE) {
library(h2o)
h2o.init()

# Import the prostate dataset
p <- h2o.importFile(path="https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")

# Set the predictors
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")

# Build an Extended Isolation forest model
model <- h2o.extendedIsolationForest(x = predictors,
                                     training_frame = p,
                                     model_id = "eif.hex",
                                     ntrees = 100,
                                     sample_size = 256,
                                     extension_level = length(predictors) - 1)

# Calculate score
score <- h2o.predict(model, p)
anomaly_score <- score$anomaly_score

# Number in [0, 1] explicitly defined in Equation (1) from Extended Isolation Forest paper
# or in paragraph '2 Isolation and Isolation Trees' of Isolation Forest paper
anomaly_score <- score$anomaly_score

# Average path length of the point in Isolation Trees from root to the leaf
mean_length <- score$mean_length
}