Trains an Isolation Forest model
h2o.isolationForest(training_frame, x, model_id = NULL,
  score_each_iteration = FALSE, score_tree_interval = 0,
  ignore_const_cols = TRUE, ntrees = 50, max_depth = 8, min_rows = 1,
  max_runtime_secs = 0, seed = -1, build_tree_one_node = FALSE,
  mtries = -1, sample_size = 256, sample_rate = -1,
  col_sample_rate_change_per_level = 1, col_sample_rate_per_tree = 1,
  categorical_encoding = c("AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
    "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited"),
  stopping_rounds = 0, stopping_metric = c("AUTO", "anomaly_score"),
  stopping_tolerance = 0.01, export_checkpoints_dir = NULL)
training_frame  Id of the training data frame.
x  A vector containing the character names of the predictors in the model.
model_id  Destination id for this model; autogenerated if not specified. 
score_each_iteration  Logical. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval  Score the model after every so many trees. Disabled if set to 0. Defaults to 0. 
ignore_const_cols  Logical. Ignore constant columns. Defaults to TRUE.
ntrees  Number of trees. Defaults to 50. 
max_depth  Maximum tree depth. Defaults to 8. 
min_rows  Fewest allowed (weighted) observations in a leaf. Defaults to 1. 
max_runtime_secs  Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0. 
seed  Seed for random numbers (affects the stochastic parts of the algorithm, which might or might not be enabled by default). Defaults to -1 (time-based random number).
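build_tree_one_node  Logical. Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. Defaults to FALSE.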
mtries  Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. Defaults to -1.
sample_size  Number of randomly sampled observations used to train each Isolation Forest tree. Only one of parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size will be ignored. Defaults to 256. 
sample_rate  Rate of randomly sampled observations used to train each Isolation Forest tree. Needs to be in range from 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size will be used instead. Defaults to -1.
col_sample_rate_change_per_level  Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1.
col_sample_rate_per_tree  Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
categorical_encoding  Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
stopping_rounds  Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
stopping_metric  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Must be one of: "AUTO", "anomaly_score". Defaults to AUTO.
stopping_tolerance  Relative tolerance for the metric-based stopping criterion (stop if the relative improvement is not at least this much). Defaults to 0.01.
export_checkpoints_dir  Automatically export generated models to this directory. 
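The interaction between sample_size and sample_rate described in the arguments above can be summarized in a few lines of base R. This is only a sketch of the documented rule, not H2O's implementation: the helper name effective_sample_size is hypothetical, and both the flooring of sample_rate * rows and the cap at the number of rows are assumptions made for illustration.

```r
# Hypothetical helper (not part of the h2o package): number of rows each
# tree would be trained on under the documented sample_size / sample_rate rule.
effective_sample_size <- function(n_rows, sample_size = 256, sample_rate = -1) {
  if (sample_rate == -1) {
    # sample_rate is disabled, so the absolute sample_size is used
    rows <- sample_size
  } else {
    stopifnot(sample_rate >= 0, sample_rate <= 1)
    # sample_rate is defined, so sample_size is ignored
    # (flooring is an assumption made for this sketch)
    rows <- floor(sample_rate * n_rows)
  }
  # Assumption: a tree cannot sample more rows than the frame contains
  min(rows, n_rows)
}

effective_sample_size(10000)                     # sample_size applies
effective_sample_size(10000, sample_rate = 0.5)  # sample_rate takes over
```

In practice this means leaving sample_rate at its default when an absolute per-tree sample is wanted, and setting only sample_rate when the sample should scale with the size of the training frame.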
# NOT RUN {
library(h2o)
h2o.init()

# Import the cars dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
cars <- h2o.importFile(f)

# Set the predictors
predictors <- c("displacement", "power", "weight", "acceleration", "year")

# Train the IF model
# (stopping_metric must be "AUTO" or "anomaly_score")
cars_if <- h2o.isolationForest(x = predictors, training_frame = cars,
                               seed = 1234, stopping_metric = "anomaly_score",
                               stopping_rounds = 3, stopping_tolerance = 0.1)
# }
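The example sets stopping_rounds and stopping_tolerance, so it may help to see the early-stopping rule from the argument descriptions written out: training stops when the simple moving average of the metric over k = stopping_rounds scoring events no longer improves by at least the relative tolerance. The base R sketch below is one simplified reading of that rule, not H2O's actual code. The function should_stop and its arguments history and lower_is_better are hypothetical, the metric is assumed to be positive, and the metric values are made up purely for illustration.

```r
# Hypothetical helper (not part of the h2o package): decide whether training
# should stop, given the metric value recorded at each scoring event.
should_stop <- function(history, stopping_rounds, stopping_tolerance,
                        lower_is_better = TRUE) {
  k <- stopping_rounds
  n <- length(history)
  # Early stopping disabled, or not enough scoring events to compare yet
  if (k == 0 || n < 2 * k) return(FALSE)

  # Simple moving averages of length k over the last 2k scoring events,
  # which gives k + 1 overlapping windows
  recent <- history[(n - 2 * k + 1):n]
  sma <- sapply(seq_len(k + 1), function(i) mean(recent[i:(i + k - 1)]))

  reference <- sma[1]   # moving average before the last k scoring events
  latest <- sma[-1]     # moving averages ending at the last k scoring events

  if (lower_is_better) {
    improved <- min(latest) < reference * (1 - stopping_tolerance)
  } else {
    improved <- max(latest) > reference * (1 + stopping_tolerance)
  }
  !improved
}

# Made-up metric histories, one value per scoring event
still_improving <- c(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)
plateaued <- c(1.0, 0.8, 0.6, 0.5, 0.499, 0.498, 0.4985, 0.498, 0.4979)

should_stop(still_improving, stopping_rounds = 3, stopping_tolerance = 0.1)
should_stop(plateaued, stopping_rounds = 3, stopping_tolerance = 0.1)
```

A larger stopping_tolerance, such as the 0.1 used in the example, demands a bigger relative improvement to keep training, so it tends to stop earlier than the default of 0.01.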