3.40.0.2


link

  • Available in: GLM, GAM

  • Hyperparameter: no

Description

GLM and GAM problems consist of three main components:

  • A random component \(f\) for the dependent variable \(y\): The density function \(f(y;\theta,\phi)\) belongs to the exponential family and is parametrized by \(\theta\) and \(\phi\). This removes the restriction on the distribution of the error and allows the variance to vary with the mean.

  • A systematic component (linear model) \(\eta\): \(\eta = X\beta\), where \(X\) is the matrix of all observation vectors \(x_i\).

  • A link function \(g\): \(E(y) = \mu = g^{-1}(\eta)\) relates the expected value of the response \(\mu\) to the linear component \(\eta\). The link function can be any monotonic differentiable function. This relaxes the constraints on the additivity of the covariates, and it allows the response to belong to a restricted range of values depending on the chosen transformation \(g\).

Accordingly, in order to specify a GLM or GAM problem, you must choose a family function \(f\), link function \(g\), and any parameters needed to train the model.

H2O’s GLM and GAM support the following link functions: Family_Default, Identity, Logit, Log, Inverse, Tweedie, and Ologit.
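To make the relation \(\mu = g^{-1}(\eta)\) concrete, the sketch below writes out four of the links listed above (Identity, Log, Logit, Inverse) using their textbook definitions. It is plain Python for illustration only: the `LINKS` table and the `mean_from_linear_predictor` helper are hypothetical names, not part of the H2O API, and H2O's own implementation may handle numerical edge cases differently.

```python
import math

# Each entry maps a link name to a pair (g, g_inverse).
# g maps the mean mu to the linear predictor eta;
# g_inverse maps eta back to mu.
# Textbook definitions only; names are illustrative, not H2O API.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def mean_from_linear_predictor(link, eta):
    """Return mu = g^{-1}(eta) for the named link."""
    _, g_inverse = LINKS[link]
    return g_inverse(eta)


# The same linear predictor yields a different expected response per link.
eta = 0.5
for name in LINKS:
    mu = mean_from_linear_predictor(name, eta)
    print(f"{name:>8}: eta={eta} -> mu={mu:.4f}")
```

Note how the choice of link restricts the range of \(\mu\): the logit inverse always lands in (0, 1) and the log inverse is always positive, which is why these links pair naturally with binomial-type and count-type families.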

The following table describes the allowed Family/Link combinations.

Family               Family_Default  Identity  Logit  Log  Inverse  Tweedie  Ologit
-------------------  --------------  --------  -----  ---  -------  -------  ------
Binomial             X                         X
Fractional Binomial  X                         X
Quasibinomial        X                         X
Multinomial          X
Ordinal              X                                                       X
Gaussian             X               X                X    X
Poisson              X               X                X
Gamma                X               X                X    X
Tweedie              X                                              X
Negative Binomial    X               X                X
AUTO                 X***            X*        X**    X*   X*

For AUTO:

  • X*: the response is numeric (Real or Int), so the family is determined as gaussian

  • X**: the response is Enum with cardinality = 2, so the family is determined as binomial

  • X***: the response is Enum with cardinality > 2, so the family is determined as multinomial

Refer to the Links section for more information.
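The family/link table and the AUTO rules above can be checked programmatically before a model is trained. The sketch below encodes them as plain Python lookups. Everything in it is an assumption made for illustration: `ALLOWED_LINKS`, `AUTO_LINKS`, `resolve_auto_family`, and `is_allowed` are hypothetical helpers, not H2O functions, and the dictionary keys are simply the table's row labels lower-cased with spaces removed.

```python
# Allowed links per family, transcribed from the table above.
# Hypothetical helper data for illustration; not an H2O API.
ALLOWED_LINKS = {
    "binomial": {"family_default", "logit"},
    "fractionalbinomial": {"family_default", "logit"},
    "quasibinomial": {"family_default", "logit"},
    "multinomial": {"family_default"},
    "ordinal": {"family_default", "ologit"},
    "gaussian": {"family_default", "identity", "log", "inverse"},
    "poisson": {"family_default", "identity", "log"},
    "gamma": {"family_default", "identity", "log", "inverse"},
    "tweedie": {"family_default", "tweedie"},
    "negativebinomial": {"family_default", "identity", "log"},
}

# Links marked X*, X**, X*** in the AUTO row, keyed by the resolved family.
AUTO_LINKS = {
    "gaussian": {"identity", "log", "inverse"},  # X*
    "binomial": {"logit"},                       # X**
    "multinomial": {"family_default"},           # X***
}


def resolve_auto_family(response_type, cardinality=None):
    """Mirror the AUTO footnotes: pick a family from the response type."""
    response_type = response_type.lower()
    if response_type in ("real", "int"):
        return "gaussian"
    if response_type == "enum" and cardinality is not None:
        if cardinality == 2:
            return "binomial"
        if cardinality > 2:
            return "multinomial"
    raise ValueError("cannot resolve a family for this response")


def is_allowed(family, link, response_type=None, cardinality=None):
    """Return True if the table permits this family/link combination."""
    family, link = family.lower(), link.lower()
    if family == "auto":
        resolved = resolve_auto_family(response_type, cardinality)
        return link in AUTO_LINKS[resolved]
    return link in ALLOWED_LINKS.get(family, set())


print(is_allowed("gaussian", "log"))            # True
print(is_allowed("binomial", "identity"))       # False
print(is_allowed("AUTO", "logit", "enum", 2))   # True
```

A check like this is only a convenience. H2O itself validates the combination when the model is built, so the table remains the authoritative source.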

Related Parameters

  • family

Example

R

library(h2o)
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert the response column to a factor
iris['class'] <- as.factor(iris['class'])

# set the predictor names and the response column name
predictors <- colnames(iris)[-length(iris)]
response <- 'class'

# split into training and validation sets
iris_splits <- h2o.splitFrame(data = iris, ratios = 0.8)
train <- iris_splits[[1]]
valid <- iris_splits[[2]]

# try using the `link` parameter:
iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial', link = 'family_default',
                    training_frame = train, validation_frame = valid)

# print the logloss for the validation data
print(h2o.logloss(iris_glm, valid = TRUE))

Python

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert the response column to a factor
iris['class'] = iris['class'].asfactor()

# set the predictor names and the response column name
predictors = iris.columns[:-1]
response = 'class'

# split into training and validation sets
train, valid = iris.split_frame(ratios=[.8])

# try using the `link` parameter:
# initialize and train a GLM
iris_glm = H2OGeneralizedLinearEstimator(family='multinomial', link='family_default')
iris_glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# print the logloss for the validation data
print(iris_glm.logloss(valid=True))

© Copyright 2016-2022 H2O.ai. Last updated on Mar 09, 2023.
