tweedie_link_power

  • Available in: GLM, GAM

  • Hyperparameter: yes

Description

Tweedie distributions are a family of distributions that includes the gamma, normal, and Poisson distributions, as well as their combinations (such as the compound Poisson-gamma). The Tweedie distribution is especially useful for modeling positive continuous variables with exact zeros. When family=tweedie, the tweedie_link_power option specifies the power for the Tweedie link function. The link functions \(g(\cdot)\) have the power form \(g(\mu) = \mu^{link.power}\), where \(\mu\) is the mean of the response and the linear predictor is \(X\beta = g(\mu)\).

This option defaults to 1.

The following values can be specified for this option:

  • A value of 0 specifies a logarithm (log) link function. This is typically used for a count of occurrences in a fixed amount of time or space and is defined as \(X\beta = \ln(\mu)\).

  • A value of 1 - vpow (1 minus the variance power) specifies the canonical link function for the chosen variance power.

  • A value of 1 specifies an identity link function. This is typically used for linear-response data and is defined as \(X\beta = \mu\).

  • A value of 2 specifies a squared link function, defined as \(X\beta = \mu^{2}\). Note that the inverse squared link \(X\beta = \mu^{-2}\), the canonical link for the inverse Gaussian family, corresponds instead to a link power of -2 (that is, 1 - vpow with vpow = 3).
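Written out with the general power form, these cases are (the log link being the limiting case as the link power approaches 0):

\[g(\mu) = \begin{cases} \ln(\mu) & \text{link power} = 0 \\ \mu^{1-vpow} & \text{link power} = 1-vpow \text{ (canonical)} \\ \mu & \text{link power} = 1 \\ \mu^{2} & \text{link power} = 2 \end{cases}\]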

The following table shows the acceptable relationships between family functions, Tweedie variance powers, and Tweedie link powers:

Family Function      Tweedie Variance Power   Tweedie Link Power
Poisson              1                        0, 1-vpow, 1
Gamma                2                        0, 1-vpow, 2
Inverse-Gaussian     3                        1, 1-vpow
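As a short illustration of one row of this table, the sketch below (in Python, assuming the same public auto.csv dataset used in the Example section) fits a gamma-like Tweedie GLM with variance power 2 and its canonical link power, 1 - vpow = -1:

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# assumption: the same public dataset used in the Example section below
auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")
predictors = auto.names
predictors.remove('y')

# Gamma row of the table: variance power 2, canonical link power 1 - vpow = 1 - 2 = -1
gamma_glm = H2OGeneralizedLinearEstimator(family='tweedie',
                                          tweedie_variance_power=2,
                                          tweedie_link_power=1 - 2)
gamma_glm.train(x=predictors, y='y', training_frame=auto)
print(gamma_glm.mse())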

Related Parameters

  • family

  • link

  • tweedie_variance_power

Example

R

library(h2o)
h2o.init()

# import the auto dataset:
# this dataset looks at features of motor insurance policies and predicts the aggregate claim loss
# the original dataset can be found at https://cran.r-project.org/web/packages/HDtweedie/HDtweedie.pdf
auto <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")

# set the predictor names and the response column name
predictors <- colnames(auto)[-1]
# the response is aggregate claim loss (in $1000s)
response <- "y"

# split into train and validation sets
auto_splits <- h2o.splitFrame(data = auto, ratios = 0.8)
train <- auto_splits[[1]]
valid <- auto_splits[[2]]

# try using the `tweedie_link_power` parameter:
# train your model, where you specify tweedie_link_power
auto_glm <- h2o.glm(x = predictors, y = response, training_frame = train,
                    validation_frame = valid,
                    family = 'tweedie',
                    tweedie_link_power = 1)

# print the mse for validation set
print(h2o.mse(auto_glm, valid = TRUE))

# look at several values of `tweedie_link_power`
# use the tweedie_variance_power (vp) with the tweedie_link_power to create the canonical link function
vp_list <- list(0, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2,
                2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 5, 7)

# create a dataframe with the tweedie_variance_power, tweedie_link_power, and corresponding mse
model_results <- lapply(vp_list, function(vp) {
  auto_glm_2 <- h2o.glm(x = predictors, y = response, training_frame = train,
                       validation_frame = valid,
                       family = 'tweedie', tweedie_variance_power = vp,
                       tweedie_link_power = 1.0 - vp)
  temp_df <- data.frame(vp, 1.0 - vp, h2o.mse(auto_glm_2, valid = TRUE))
  names(temp_df) <- c("variance_power", "link_power","mse")
  return(temp_df)})
results <- do.call('rbind', model_results)

# print the results sorted by mse
print(results[order(results$mse), ])

Python

import pandas as pd
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the auto dataset:
# this dataset looks at features of motor insurance policies and predicts the aggregate claim loss
# the original dataset can be found at https://cran.r-project.org/web/packages/HDtweedie/HDtweedie.pdf
auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv")

# set the predictor names and the response column name
predictors = auto.names
predictors.remove('y')
# the response is aggregate claim loss (in $1000s)
response = "y"

# split into train and validation sets
train, valid = auto.split_frame(ratios = [.8])

# try using the `tweedie_link_power` parameter:
# initialize the estimator then train the model
auto_glm = H2OGeneralizedLinearEstimator(family = 'tweedie', tweedie_link_power = 1)
auto_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the mse for the validation data
print(auto_glm.mse(valid=True))

# look at several values of `tweedie_link_power`
# use the tweedie_variance_power (vp) with the tweedie_link_power to create the canonical link function
vp_list = [0, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2,
           2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 5, 7]

# loop through the values and append results to the list 'results'
results = []
for vp in vp_list:
    auto_glm_2 = H2OGeneralizedLinearEstimator(family = 'tweedie',
                                               tweedie_variance_power = vp,
                                               tweedie_link_power = 1.0 - vp)
    auto_glm_2.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
    results.append((vp, 1-vp, auto_glm_2.mse(valid=True)))

# create a pandas dataframe with the tweedie_variance_power, tweedie_link_power, and corresponding mse, sorted by mse
print(pd.DataFrame(sorted(results, key=lambda triple: triple[2]), columns=['variance_power', 'link_power', 'mse']))