Creates a target encoding map based on group-by columns (`x`) and a numeric or binary target column (`y`). Computing target encoding for high-cardinality categorical columns can improve the performance of supervised learning models. A target encoding tutorial is available at https://github.com/h2oai/h2o-tutorials/blob/master/best-practices/categorical-predictors/target_encoding.md.
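To make the idea concrete, here is a minimal base-R sketch (no H2O cluster needed) of what a target encoding map holds for one categorical column: one row per level, with the sum of the target, the row count, and their ratio, which is the per-level mean. The toy data, the helper `make_te_map`, and the column names `n_sum`, `n_rows`, and `te` are hypothetical; this is not how `h2o.target_encode_create` is implemented, and its output frames may use different columns. For a binary target, recoding it to 0/1 turns the mean into the share of positive rows.

```r
# Toy data (hypothetical): one categorical column, a numeric and a binary target
df <- data.frame(
  job = c("admin", "admin", "tech", "tech", "tech", "retired"),
  age = c(30, 40, 25, 35, 30, 65),
  y   = c("no", "yes", "no", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-level sum of the target, row count, and mean (sum / count)
make_te_map <- function(data, cols, target) {
  sums   <- aggregate(data[target], by = data[cols], FUN = sum)
  counts <- aggregate(data[target], by = data[cols], FUN = length)
  names(sums)[names(sums) == target]     <- "n_sum"
  names(counts)[names(counts) == target] <- "n_rows"
  te_map <- merge(sums, counts, by = cols)
  te_map$te <- te_map$n_sum / te_map$n_rows
  te_map
}

# Numeric target: mean age per job
map_age <- make_te_map(df, "job", "age")
print(map_age)

# Binary target: recode "yes"/"no" to 1/0 so the mean is the share of "yes"
df$y01 <- as.integer(df$y == "yes")
map_y <- make_te_map(df, "job", "y01")
print(map_y)
```

Keeping the sum and count, rather than only the mean, makes it easy to combine or subtract statistics later (for example, across folds).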
h2o.target_encode_create(data, x, y, fold_column = NULL)
`data`: An H2OFrame object with which to create the target encoding map.
`x`: A list containing the names or indices of the columns to encode. A separate target encoding map is created for each element of the list, and an element may contain more than one column. For example, if `x = list(c("A"), c("B", "C"))`, there will be one mapping frame for A and one mapping frame for the combination of B and C (in this case, the data are grouped by both columns).
`y`: The name or column index of the response variable in the data. The response variable can be either numeric or binary.
`fold_column`: (Optional) The name or column index of the fold column in the data. Defaults to NULL (no `fold_column`).
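A common use of a fold column in target encoding is out-of-fold (k-fold) encoding: each row is encoded with statistics computed from the other folds only, which reduces target leakage into the training rows. The base-R sketch below illustrates that general technique by computing per-(level, fold) sums and counts and subtracting a fold's own totals from the overall totals. It is an assumption-labeled illustration, not a description of how h2o stores or uses `fold_column`; the data and the column names are hypothetical.

```r
# Toy data (hypothetical): a categorical column, a fold id, and a numeric target
df <- data.frame(
  job  = c("admin", "admin", "admin", "tech", "tech", "tech"),
  fold = c(1, 2, 2, 1, 1, 2),
  age  = c(30, 40, 50, 25, 35, 30)
)

# Per-(level, fold) sums and counts of the target
sums   <- aggregate(age ~ job + fold, data = df, FUN = sum)
counts <- aggregate(age ~ job + fold, data = df, FUN = length)
names(sums)[3]   <- "fold_sum"
names(counts)[3] <- "fold_n"
per_fold <- merge(sums, counts, by = c("job", "fold"))

# Per-level totals over all folds
totals <- aggregate(cbind(fold_sum, fold_n) ~ job, data = per_fold, FUN = sum)
names(totals)[2:3] <- c("total_sum", "total_n")

# Out-of-fold mean: statistics from every fold except the row's own.
# A level seen in only one fold yields 0/0 = NaN here and needs a fallback.
oof <- merge(per_fold, totals, by = "job")
oof$te_oof <- (oof$total_sum - oof$fold_sum) / (oof$total_n - oof$fold_n)
print(oof[, c("job", "fold", "te_oof")])
```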
Returns a list of H2OFrame objects, one per element of `x`, each containing the target encoding mapping for that column or group of columns.
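The shape of the return value (one mapping per element of `x`, where an element may group several columns) can be mimicked in base R with `lapply` over a list of column vectors. In this sketch, plain data frames stand in for H2OFrames, and the toy data and the `te` column name are hypothetical.

```r
# Toy data (hypothetical)
df <- data.frame(
  job     = c("admin", "admin", "tech", "tech", "tech"),
  marital = c("single", "married", "single", "single", "married"),
  age     = c(30, 40, 25, 35, 30),
  stringsAsFactors = FALSE
)

# Same form as the x argument: one map for "job", one for the pair ("job", "marital")
x <- list(c("job"), c("job", "marital"))

# One mapping data frame per element of x: rows are the distinct
# combinations of the listed columns, "te" is the mean target per combination
te_maps <- lapply(x, function(cols) {
  m <- aggregate(df["age"], by = df[cols], FUN = mean)
  names(m)[names(m) == "age"] <- "te"
  m
})

length(te_maps)  # 2, one map per element of x
te_maps[[2]]     # grouped by both job and marital
```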
See `h2o.target_encode_apply` for applying the target encoding mapping to a frame.
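Conceptually, applying a mapping is a left join on the group-by columns. The base-R sketch below shows such a join, with a fallback to the overall training mean for levels missing from the map. It does not reproduce `h2o.target_encode_apply`, whose own options are documented on its help page; the frames, the `job_te` column name, and the global-mean fallback are illustrative assumptions.

```r
# Hypothetical training data and a mapping built from it (mean age per job)
train <- data.frame(job = c("admin", "admin", "tech"), age = c(30, 40, 25))
te_map <- aggregate(train["age"], by = train["job"], FUN = mean)
names(te_map)[names(te_map) == "age"] <- "job_te"

# New frame to encode; "student" does not appear in the map
new_data <- data.frame(job = c("tech", "student", "admin"), row_id = 1:3)

# Left join on the group-by column keeps every row of new_data
encoded <- merge(new_data, te_map, by = "job", all.x = TRUE)

# Fallback for unseen levels: the overall mean of the training target
encoded$job_te[is.na(encoded$job_te)] <- mean(train$age)

# merge() reorders rows, so restore the original order
encoded <- encoded[order(encoded$row_id), ]
print(encoded)
```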
library(h2o)
h2o.init()
# Import the bank-additional-full data
data <- h2o.importFile(
  path = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv")
# Get Target Encoding Map on bank-additional-full data with numeric response
mapping_age <- h2o.target_encode_create(data = data, x = list(c("job"), c("job", "marital")),
                                        y = "age")
head(mapping_age)
# Get Target Encoding Map on bank-additional-full data with binary response
mapping_y <- h2o.target_encode_create(data = data, x = list(c("job"), c("job", "marital")),
                                      y = "y")
head(mapping_y)