R/extendedisolationforest.R
h2o.extendedIsolationForest.Rd
Trains an Extended Isolation Forest model.
h2o.extendedIsolationForest(
training_frame,
x,
model_id = NULL,
ignore_const_cols = TRUE,
categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary",
"Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
score_each_iteration = FALSE,
score_tree_interval = 0,
ntrees = 100,
sample_size = 256,
extension_level = 0,
seed = -1,
disable_training_metrics = TRUE
)
training_frame
Id of the training data frame.
x
A vector containing the character names of the predictors in the model.
model_id
Destination id for this model; auto-generated if not specified.
ignore_const_cols
Logical. Ignore constant columns. Defaults to TRUE.
categorical_encoding
Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
score_each_iteration
Logical. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval
Score the model after every so many trees. Disabled if set to 0. Defaults to 0.
ntrees
Number of Extended Isolation Forest trees. Defaults to 100.
sample_size
Number of randomly sampled observations used to train each Extended Isolation Forest tree. Defaults to 256.
extension_level
Maximum is N - 1 (N = numCols). Minimum is 0. Extended Isolation Forest with extension_level = 0 behaves like Isolation Forest. Defaults to 0.
seed
Seed for random numbers (affects certain parts of the algorithm that are stochastic; those might or might not be enabled by default). Defaults to -1 (time-based random number).
disable_training_metrics
Logical. Disable calculating training metrics (expensive on large datasets). Defaults to TRUE.
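The bounds on extension_level can be made explicit before training. The sketch below is only illustrative: max_extension_level() and clamp_extension_level() are hypothetical helpers, not part of the h2o package, and it assumes that N (numCols) refers to the number of predictor columns passed in x, as the example on this page suggests.

```r
# Hypothetical helpers (not part of h2o): derive and enforce the valid
# range of extension_level, assuming N is the number of predictors in x.
max_extension_level <- function(predictors) {
  stopifnot(is.character(predictors), length(predictors) >= 1)
  length(predictors) - 1L
}

clamp_extension_level <- function(level, predictors) {
  upper <- max_extension_level(predictors)
  # Keep the requested level inside [0, N - 1]
  as.integer(min(max(level, 0), upper))
}

predictors <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")
max_extension_level(predictors)        # fully extended setting
clamp_extension_level(10, predictors)  # too large, pulled back to the maximum
clamp_extension_level(-1, predictors)  # too small, raised to 0
```

The clamped value can then be passed as the extension_level argument; a value of 0 corresponds to the standard Isolation Forest behaviour described above.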
if (FALSE) { # \dontrun{
library(h2o)
h2o.init()
# Import the prostate dataset
p <- h2o.importFile(path="https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")
# Set the predictors
predictors <- c("AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON")
# Build an Extended Isolation forest model
model <- h2o.extendedIsolationForest(x = predictors,
                                     training_frame = p,
                                     model_id = "eif.hex",
                                     ntrees = 100,
                                     sample_size = 256,
                                     extension_level = length(predictors) - 1)
# Calculate score
score <- h2o.predict(model, p)
# Anomaly score: a number in [0, 1] explicitly defined in Equation (1) of the Extended Isolation Forest paper
# or in section '2 Isolation and Isolation Trees' of the Isolation Forest paper
anomaly_score <- score$anomaly_score
# Average path length of the point in Isolation Trees from root to the leaf
mean_length <- score$mean_length
} # }
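The relationship between the mean_length and anomaly_score columns in the example above can be sketched in plain R. This follows the score definition from the Isolation Forest paper, s(x, n) = 2^(-E(h(x)) / c(n)), with c(n) = 2 H(n - 1) - 2 (n - 1) / n and H(i) approximated by ln(i) + 0.5772156649. The function names are hypothetical, and whether H2O applies exactly this normalization internally is an assumption, so treat the result as an approximation rather than a reproduction of the package's output.

```r
# Hypothetical helpers (not part of h2o), following the Isolation Forest paper.

# c(n): average path length of an unsuccessful binary search tree lookup,
# used to normalize path lengths for a sub-sample of size n.
average_path_length <- function(n) {
  stopifnot(n > 1)
  euler_gamma <- 0.5772156649
  harmonic <- log(n - 1) + euler_gamma  # approximation of H(n - 1)
  2 * harmonic - 2 * (n - 1) / n
}

# s(x, n) = 2^(-E(h(x)) / c(n)): shorter average paths give scores closer to 1.
anomaly_score_from_length <- function(mean_length, sample_size = 256) {
  2^(-mean_length / average_path_length(sample_size))
}

# Illustrative path lengths only, not output from a trained model
lengths <- c(2, 5, 10, 15)
anomaly_score_from_length(lengths, sample_size = 256)
```

Under this definition a point whose mean path length equals c(n) scores exactly 0.5, points isolated after very few splits score close to 1, and points with long paths score close to 0.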