``link``
--------

- Available in: GLM, GAM
- Hyperparameter: no

Description
~~~~~~~~~~~

GLM and GAM problems consist of three main components:

- A random component :math:`f` for the dependent variable :math:`y`: The density function :math:`f(y;\theta,\phi)` belongs to the exponential family and is parametrized by :math:`\theta` and :math:`\phi`. This removes the restriction on the distribution of the error and allows for non-homogeneity of the variance with respect to the mean vector.
- A systematic component (linear model) :math:`\eta`: :math:`\eta = X\beta`, where :math:`X` is the matrix of all observation vectors :math:`x_i`.
- A link function :math:`g`: :math:`E(y) = \mu = g^{-1}(\eta)` relates the expected value of the response :math:`\mu` to the linear component :math:`\eta`. The link function can be any monotonic differentiable function. This relaxes the constraints on the additivity of the covariates, and it allows the response to belong to a restricted range of values depending on the chosen transformation :math:`g`.

Accordingly, to specify a GLM or GAM problem, you must choose a family function :math:`f`, a link function :math:`g`, and any parameters needed to train the model.

H2O's GLM and GAM support the following link functions: Family_Default, Identity, Logit, Log, Inverse, Tweedie, and Ologit.

The following table describes the allowed Family/Link combinations.

+---------------------+----------------------------------------------------------------------+
| **Family**          | **Link Function**                                                    |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
|                     | Family_Default | Identity | Logit | Log | Inverse | Tweedie | Ologit |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Binomial            | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Fractional Binomial | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Quasibinomial       | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Multinomial         | X              |          |       |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Ordinal             | X              |          |       |     |         |         | X      |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Gaussian            | X              | X        |       | X   | X       |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Poisson             | X              | X        |       | X   |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Gamma               | X              | X        |       | X   | X       |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Tweedie             | X              |          |       |     |         | X       |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Negative Binomial   | X              | X        |       | X   |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| AUTO                | X***           | X*       | X**   | X*  | X*      |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+

For **AUTO**:

- X*: the data is numeric (``Real`` or ``Int``) (family determined as ``gaussian``)
- X**: the data is ``Enum`` with cardinality = 2 (family determined as ``binomial``)
- X***: the data is ``Enum`` with cardinality > 2 (family determined as ``multinomial``)

Refer to the `Links <../glm.html#links>`__ section for more information.

Related Parameters
~~~~~~~~~~~~~~~~~~

- `family `__

Example
~~~~~~~

.. tabs::

   .. code-tab:: r R

        library(h2o)
        h2o.init()

        # import the iris dataset:
        # this dataset is used to classify the type of iris plant
        # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
        iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

        # convert response column to a factor
        iris['class'] <- as.factor(iris['class'])

        # set the predictor names and the response column name
        predictors <- colnames(iris)[-length(iris)]
        response <- 'class'

        # split into train and validation
        iris_splits <- h2o.splitFrame(data = iris, ratios = 0.8)
        train <- iris_splits[[1]]
        valid <- iris_splits[[2]]

        # try using the `link` parameter:
        iris_glm <- h2o.glm(x = predictors,
                            y = response,
                            family = 'multinomial',
                            link = 'family_default',
                            training_frame = train,
                            validation_frame = valid)

        # print the logloss for the validation data
        print(h2o.logloss(iris_glm, valid = TRUE))

   .. code-tab:: python

        import h2o
        from h2o.estimators.glm import H2OGeneralizedLinearEstimator
        h2o.init()

        # import the iris dataset:
        # this dataset is used to classify the type of iris plant
        # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
        iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

        # convert response column to a factor
        iris['class'] = iris['class'].asfactor()

        # set the predictor names and the response column name
        predictors = iris.columns[:-1]
        response = 'class'

        # split into train and validation sets
        train, valid = iris.split_frame(ratios = [.8])

        # try using the `link` parameter:
        # Initialize and train a GLM
        iris_glm = H2OGeneralizedLinearEstimator(family = 'multinomial', link = 'family_default')
        iris_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

        # print the logloss for the validation data
        print(iris_glm.logloss(valid = True))
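
To make the relationship :math:`\mu = g^{-1}(\eta)` from the Description more concrete, the following is a minimal standalone sketch in plain Python (it does not use or require H2O). It writes out the textbook definitions of the Identity, Log, Logit, and Inverse links together with their inverses; the Tweedie and Ologit links are left out. The names ``LINKS``, ``mean_from_eta``, and ``eta_from_mean`` are hypothetical helpers introduced only for illustration. They are not part of the H2O API, and H2O's internal implementation may differ.

.. code-block:: python

   import math

   # Hypothetical lookup table (not part of H2O): link name -> (g, g_inverse).
   # g maps the mean mu to the linear predictor eta; g_inverse maps eta back to mu.
   LINKS = {
       "identity": (lambda mu: mu, lambda eta: eta),
       "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
       "logit": (
           lambda mu: math.log(mu / (1.0 - mu)),
           lambda eta: 1.0 / (1.0 + math.exp(-eta)),
       ),
       "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
   }


   def mean_from_eta(eta, link):
       """Apply the inverse link: mu = g^{-1}(eta)."""
       _, g_inverse = LINKS[link]
       return g_inverse(eta)


   def eta_from_mean(mu, link):
       """Apply the link: eta = g(mu)."""
       g, _ = LINKS[link]
       return g(mu)


   # A linear predictor of 0 corresponds to a mean of 0.5 under the logit link
   # and a mean of 1.0 under the log link.
   print(mean_from_eta(0.0, "logit"))
   print(mean_from_eta(0.0, "log"))

The sketch shows why the choice of link restricts the range of the response: the inverse of the logit link always returns a value strictly between 0 and 1, and the inverse of the log link always returns a positive value, whatever the value of :math:`\eta`.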