This is an API for a new target encoding implemented in Java. It applies a target encoding map to an H2OFrame object. Computing target encodings for high-cardinality categorical columns can improve the performance of supervised learning models.
h2o.target_encode_transform(frame, x, y, target_encode_map, holdout_type, fold_column = NULL, blended_avg = TRUE, inflection_point = 10, smoothing = 20, noise = -1, seed = -1)
An H2OFrame object to which the target encoding map is applied.
List of categorical column names or indices to apply target encoding to. An item that is itself a list of multiple columns is not currently supported.
The name or column index of the response variable in the frame.
An object that is the result of calling h2o.target_encode_fit.
1) "kfold" - encodings for a fold are generated from out-of-fold data.
2) "loo" - leave-one-out. The current row's response value is subtracted from the pre-calculated per-level frequencies.
3) "none" - nothing is held out. The whole frame is used for training.
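The three holdout types above can be sketched in base R on a toy data frame. This is only an illustration of the idea, not H2O's implementation: the data frame `df`, the `fold` vector, and the variable names are hypothetical, and blending and noise are left out.

```r
# Toy data (hypothetical): one categorical column and a binary response.
df <- data.frame(
  cat = c("a", "a", "a", "b", "b"),
  y   = c(1, 0, 1, 0, 1)
)
fold <- c(1, 2, 1, 2, 1)  # hypothetical fold assignment per row

# Per-level sum and count of the response, repeated for every row of the level.
level_sum <- ave(df$y, df$cat, FUN = sum)
level_n   <- ave(df$y, df$cat, FUN = length)

# "none": every row receives the plain mean of its level.
enc_none <- level_sum / level_n

# "loo": remove the current row's response before averaging.
# A level with a single row would give 0 / 0 here; this sketch does not handle it.
enc_loo <- (level_sum - df$y) / (level_n - 1)

# "kfold": average only over rows of the same level that lie in other folds.
enc_kfold <- sapply(seq_len(nrow(df)), function(i) {
  out_of_fold <- fold != fold[i] & df$cat == df$cat[i]
  mean(df$y[out_of_fold])
})

print(data.frame(df, fold, enc_none, enc_loo, enc_kfold))
```

With "none", a row's own response contributes to its encoding, which is why the holdout variants are usually preferred on training data.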
(Optional) The name or column index of the fold column in the frame.
(Optional) Parameter for blending. Used to calculate `lambda`. Determines half of the minimal sample size for which we completely trust the estimate based on the sample for a particular level of the categorical variable. Default value is 10.
(Optional) Parameter for blending. Used to calculate `lambda`. Controls the rate of transition between the particular level's posterior probability and the prior probability. For smoothing values approaching infinity it becomes a hard threshold between the posterior and the prior probability. Default value is 20.
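As a rough sketch of how the two blending parameters might interact, the snippet below assumes the logistic weighting commonly used for blended target encoding, `lambda = 1 / (1 + exp((inflection_point - n) / smoothing))`, where `n` is the number of rows in the level. This page does not state the formula, so treat it as an assumption; the helper `blend_encoding` and its arguments are hypothetical names.

```r
# Blend a level's mean response (posterior) with the global mean (prior).
# ASSUMPTION: lambda follows a logistic curve in the level's row count n.
blend_encoding <- function(n, level_mean, prior_mean,
                           inflection_point = 10, smoothing = 20) {
  lambda <- 1 / (1 + exp((inflection_point - n) / smoothing))
  lambda * level_mean + (1 - lambda) * prior_mean
}

# Levels with few rows are pulled toward the prior,
# levels with many rows stay close to their own mean.
counts <- c(1, 10, 50, 200)
print(blend_encoding(counts, level_mean = 0.9, prior_mean = 0.3))
```

Under this assumed form, a level with exactly `inflection_point` rows gets `lambda = 0.5`, so its encoding is the midpoint between its own mean and the prior.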
(Optional) The amount of random noise added to the target encoding. This helps prevent overfitting. Defaults to 0.01 * range of y.
(Optional) A random seed used to generate draws from the uniform distribution for random noise. Defaults to -1.
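The default noise level and the role of the seed can be illustrated in base R. This is a sketch only: it assumes the noise is drawn uniformly from a symmetric interval around zero, which this page does not confirm, and `y`, `encoded`, and `noise_level` are hypothetical names.

```r
# Hypothetical response and already-computed encodings for five rows.
y       <- c(0, 1, 1, 0, 1)
encoded <- c(0.50, 0.66, 0.66, 0.50, 0.66)

# Default described above: 0.01 * range of y.
noise_level <- 0.01 * diff(range(y))

# A fixed seed makes the uniform draws reproducible.
# ASSUMPTION: draws are symmetric around zero.
set.seed(1234)
noisy <- encoded + runif(length(encoded), min = -noise_level, max = noise_level)

print(noisy)
```

Noise of this kind is normally added only when encoding training data; validation and test frames are typically encoded without it.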
Returns an H2OFrame object containing the target encoding per record.
h2o.target_encode_fit for creating the target encoding map