tweedie_link_power

  • Available in: GLM, GAM

  • Hyperparameter: yes

Description

Tweedie distributions are a family of distributions that includes the gamma, normal, and Poisson distributions, as well as their combinations (such as the compound Poisson-gamma). The Tweedie distribution is especially useful for modeling positive continuous variables with exact zeros. When family=tweedie, the tweedie_link_power option specifies the power for the Tweedie link function. The link functions \(g(\cdot)\) have the power form \(g(\mu) = \mu^{link.power}\), where \(\mu\) is the mean of the response and the linear predictor is \(X\beta = g(\mu)\).

This option defaults to 1.

The following values can be specified for this option:

  • A value of 0 specifies a logarithm (log) link function. This is typically used for a count of occurrences in a fixed amount of time or space and is defined as \(X\beta = \ln(\mu)\).

  • A value of 1 - vpow (1 minus the variance power) specifies the canonical link function for the chosen variance power.

  • A value of 1 specifies an identity link function. This is typically used for linear-response data and is defined as \(X\beta = \mu\).

  • A value of 2 specifies a squared link function, defined as \(X\beta = \mu^{2}\). Note that the inverse squared link \(X\beta = \mu^{-2}\), the canonical link for the inverse Gaussian family, corresponds instead to a link power of -2 (that is, 1 - vpow with vpow = 3).
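Written out with the general power form, these cases are (the log link being the limiting case as the link power approaches 0):

\[g(\mu) = \begin{cases} \ln(\mu) & \text{link power} = 0 \\ \mu^{1-vpow} & \text{link power} = 1-vpow \text{ (canonical)} \\ \mu & \text{link power} = 1 \\ \mu^{2} & \text{link power} = 2 \end{cases}\]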

The following table shows the acceptable relationships between family functions, Tweedie variance powers, and Tweedie link powers:

Family Function      Tweedie Variance Power   Tweedie Link Power
Poisson              1                        0, 1-vpow, 1
Gamma                2                        0, 1-vpow, 2
Inverse-Gaussian     3                        1, 1-vpow
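As a short illustration of one row of this table, the sketch below (in Python, assuming the same public auto.csv dataset used in the Example section) fits a gamma-like Tweedie GLM with variance power 2 and its canonical link power, 1 - vpow = -1:

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# assumption: the same public dataset used in the Example section below
auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")
predictors = auto.names
predictors.remove('y')

# Gamma row of the table: variance power 2, canonical link power 1 - vpow = 1 - 2 = -1
gamma_glm = H2OGeneralizedLinearEstimator(family='tweedie',
                                          tweedie_variance_power=2,
                                          tweedie_link_power=1 - 2)
gamma_glm.train(x=predictors, y='y', training_frame=auto)
print(gamma_glm.mse())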

Related Parameters

  • family

  • link

  • tweedie_variance_power

Example

R

library(h2o)
h2o.init()

# import the auto dataset:
# this dataset looks at features of motor insurance policies and predicts the aggregate claim loss
# the original dataset can be found at https://cran.r-project.org/web/packages/HDtweedie/HDtweedie.pdf
auto <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")

# set the predictor names and the response column name
predictors <- colnames(auto)[-1]
# the response is aggregate claim loss (in $1000s)
response <- "y"

# split into train and validation sets
auto_splits <- h2o.splitFrame(data = auto, ratios = 0.8)
train <- auto_splits[[1]]
valid <- auto_splits[[2]]

# try using the `tweedie_link_power` parameter:
# train your model, where you specify tweedie_link_power
auto_glm <- h2o.glm(x = predictors, y = response, training_frame = train,
                    validation_frame = valid,
                    family = 'tweedie',
                    tweedie_link_power = 1)

# print the mse for validation set
print(h2o.mse(auto_glm, valid = TRUE))

# look at several values of `tweedie_link_power`
# use the tweedie_variance_power (vp) with the tweedie_link_power to create the canonical link function
vp_list <- list(0, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2,
                2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 5, 7)

# create a dataframe with the tweedie_variance_power, tweedie_link_power, and corresponding mse
model_results <- lapply(vp_list, function(vp) {
  auto_glm_2 <- h2o.glm(x = predictors, y = response, training_frame = train,
                       validation_frame = valid,
                       family = 'tweedie', tweedie_variance_power = vp,
                       tweedie_link_power = 1.0 - vp)
  temp_df <- data.frame(vp, 1.0 - vp, h2o.mse(auto_glm_2, valid = TRUE))
  names(temp_df) <- c("variance_power", "link_power","mse")
  return(temp_df)})
results <- do.call('rbind', model_results)

# print the results sorted by mse
print(results[order(results$mse), ])

Python

import pandas as pd
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the auto dataset:
# this dataset looks at features of motor insurance policies and predicts the aggregate claim loss
# the original dataset can be found at https://cran.r-project.org/web/packages/HDtweedie/HDtweedie.pdf
auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")

# set the predictor names and the response column name
predictors = auto.names
predictors.remove('y')
# the response is aggregate claim loss (in $1000s)
response = "y"

# split into train and validation sets
train, valid = auto.split_frame(ratios = [.8])

# try using the `tweedie_link_power` parameter:
# initialize the estimator then train the model
auto_glm = H2OGeneralizedLinearEstimator(family = 'tweedie', tweedie_link_power = 1)
auto_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the mse for the validation data
print(auto_glm.mse(valid=True))

# look at several values of `tweedie_link_power`
# use the tweedie_variance_power (vp) with the tweedie_link_power to create the canonical link function
vp_list = [0, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2,
           2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 5, 7]

# loop through the values and append results to the list 'results'
results = []
for vp in vp_list:
    auto_glm_2 = H2OGeneralizedLinearEstimator(family = 'tweedie',
                                               tweedie_variance_power = vp,
                                               tweedie_link_power = 1.0 - vp)
    auto_glm_2.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
    results.append((vp, 1-vp, auto_glm_2.mse(valid=True)))

# create a pandas dataframe with the tweedie_variance_power, tweedie_link_power, and corresponding mse, sorted by mse
print(pd.DataFrame(sorted(results, key=lambda triple: triple[2]), columns=['variance_power', 'link_power', 'mse']))