auuc_type
Available in: Uplift DRF
Hyperparameter: no
Description
Use this option to specify how the Area Under Uplift Curve (AUUC) metric is calculated.
To calculate AUUC for big data, the predictions are binned into histograms. Because of this binning, the results can differ from those of an exact computation.
To define AUUC, the binned predictions are sorted from the largest to the smallest value. For every group, the cumulative sums of the observation statistics are calculated. The resulting cumulative uplift is defined based on these statistics.
The statistics of every group are:
- \(T\): how many observations are in the treatment group (how many data rows in the bin have treatment_column label == 1)
- \(C\): how many observations are in the control group (how many data rows in the bin have treatment_column label == 0)
- \(TY1\): how many observations are in the treatment group and respond to the offer (how many data rows in the bin have treatment_column label == 1 and response_column label == 1)
- \(CY1\): how many observations are in the control group and respond to the offer (how many data rows in the bin have treatment_column label == 0 and response_column label == 1)
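As a rough illustration of how these cumulative statistics come about, the following Python sketch sorts rows by predicted uplift, splits them into bins, and accumulates the four counts. The function name `cumulative_group_stats`, the equal-sized bins, and the toy data are assumptions made for illustration; H2O's own histogram binning may work differently.

```python
def cumulative_group_stats(predictions, treatment, response, n_bins=2):
    """Return cumulative T, C, TY1, CY1 per bin (hypothetical helper, not H2O API)."""
    # Sort rows from the largest to the smallest predicted uplift.
    rows = sorted(zip(predictions, treatment, response),
                  key=lambda row: row[0], reverse=True)
    # Assumption: equal-sized bins (ceiling division gives the bin size).
    size = -(-len(rows) // n_bins)
    stats = []
    t = c = ty1 = cy1 = 0
    for start in range(0, len(rows), size):
        for _, w, y in rows[start:start + size]:
            if w == 1:      # treatment group
                t += 1
                ty1 += y    # responders in the treatment group
            else:           # control group
                c += 1
                cy1 += y    # responders in the control group
        # Counts are never reset, so each entry is cumulative.
        stats.append({"T": t, "C": c, "TY1": ty1, "CY1": cy1})
    return stats


# Toy data: eight rows, alternating treatment and control.
preds = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.0]
treat = [1, 0, 1, 0, 1, 0, 1, 0]
resp = [1, 0, 1, 1, 0, 0, 0, 1]
for bin_stats in cumulative_group_stats(preds, treat, resp, n_bins=2):
    print(bin_stats)
```

With this toy data, the first bin holds two treatment and two control rows, and the second entry covers all eight rows because the counts accumulate.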
You can set auuc_type so that the metric for each bin is computed as:
- Qini (auuc_type="qini"): \(TY1 - CY1 \cdot \frac{T}{C}\)
- Lift (auuc_type="lift"): \(\frac{TY1}{T} - \frac{CY1}{C}\)
- Gain (auuc_type="gain"): \((\frac{TY1}{T} - \frac{CY1}{C}) \cdot (T + C)\)
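The three formulas can be sketched in plain Python as follows. The function `uplift_value` and its dictionary input are hypothetical helpers, not part of the H2O API, and returning 0.0 when a group is empty is a choice made for this sketch only.

```python
def uplift_value(stats, auuc_type="qini"):
    """Compute the uplift value of one bin from its cumulative statistics.

    `stats` is a dict with keys "T", "C", "TY1", "CY1" (hypothetical layout).
    """
    t, c = stats["T"], stats["C"]
    ty1, cy1 = stats["TY1"], stats["CY1"]
    if auuc_type not in ("qini", "lift", "gain"):
        raise ValueError(f"unknown auuc_type: {auuc_type!r}")
    # Assumption of this sketch: an empty group yields 0.0 instead of
    # a division by zero.
    if t == 0 or c == 0:
        return 0.0
    if auuc_type == "qini":
        return ty1 - cy1 * t / c
    lift = ty1 / t - cy1 / c
    if auuc_type == "lift":
        return lift
    return lift * (t + c)  # gain


example = {"T": 2, "C": 2, "TY1": 2, "CY1": 1}
print(uplift_value(example, "qini"))  # 1.0
print(uplift_value(example, "lift"))  # 0.5
print(uplift_value(example, "gain"))  # 2.0
```

For a bin with two treated rows (both responding) and two control rows (one responding), Qini gives 2 - 1 · 2/2 = 1, Lift gives 1 - 0.5 = 0.5, and Gain scales that lift by the four observations to give 2.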
Example
library(h2o)
h2o.init()
# Import the uplift dataset into H2O:
data <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv")
# Set the predictors, response, and treatment column:
# set the predictors
predictors <- c("f1", "f2", "f3", "f4", "f5", "f6","f7", "f8")
# set the response as a factor
data$conversion <- as.factor(data$conversion)
# set the treatment column as a factor
data$treatment <- as.factor(data$treatment)
# Split the dataset into a train and valid set:
data_split <- h2o.splitFrame(data = data, ratios = 0.8, seed = 1234)
train <- data_split[[1]]
valid <- data_split[[2]]
# Build and train the model:
uplift.model <- h2o.upliftRandomForest(training_frame = train,
                                       validation_frame = valid,
                                       x = predictors,
                                       y = "conversion",
                                       ntrees = 10,
                                       max_depth = 5,
                                       treatment_column = "treatment",
                                       uplift_metric = "KL",
                                       min_rows = 10,
                                       nbins = 1000,
                                       seed = 1234,
                                       auuc_type = "qini")
# Eval performance:
perf <- h2o.performance(uplift.model)
# Get AUUC:
auuc <- h2o.auuc(perf)
import h2o
from h2o.estimators import H2OUpliftRandomForestEstimator
h2o.init()
# Import the uplift dataset into H2O:
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv")
# Set the predictors, response, and treatment column:
predictors = ["f1", "f2", "f3", "f4", "f5", "f6","f7", "f8"]
# set the response as a factor
response = "conversion"
data[response] = data[response].asfactor()
# set the treatment as a factor
treatment_column = "treatment"
data[treatment_column] = data[treatment_column].asfactor()
# Split the dataset into a train and valid set:
train, valid = data.split_frame(ratios=[.8], seed=1234)
# Build and train the model:
uplift_model = H2OUpliftRandomForestEstimator(ntrees=10,
                                              max_depth=5,
                                              treatment_column=treatment_column,
                                              uplift_metric="KL",
                                              min_rows=10,
                                              nbins=1000,
                                              seed=1234,
                                              auuc_type="gain")
uplift_model.train(x=predictors,
                   y=response,
                   training_frame=train,
                   validation_frame=valid)
# Eval performance:
perf = uplift_model.model_performance()
# Get AUUC:
auuc = perf.auuc()