``link``
--------

- Available in: GLM, GAM
- Hyperparameter: no

Description
~~~~~~~~~~~

GLM and GAM problems consist of three main components:

- A random component :math:`f` for the dependent variable :math:`y`: The density function :math:`f(y;\theta,\phi)` belongs to the exponential family and is parametrized by :math:`\theta` and :math:`\phi`. This removes the restriction on the distribution of the error and allows the variance to vary with the mean.
- A systematic component (linear model) :math:`\eta`: :math:`\eta = X\beta`, where :math:`X` is the matrix of all observation vectors :math:`x_i`.
- A link function :math:`g`: :math:`E(y) = \mu = g^{-1}(\eta)` relates the expected value of the response :math:`\mu` to the linear component :math:`\eta`. The link function can be any monotonic differentiable function. This relaxes the constraints on the additivity of the covariates, and it allows the response to belong to a restricted range of values depending on the chosen transformation :math:`g`.

Accordingly, in order to specify a GLM or GAM problem, you must choose a family function :math:`f`, link function :math:`g`, and any parameters needed to train the model. 
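As a rough illustration of how the systematic component and the link function fit together, the sketch below computes :math:`\eta = x\beta` for a single observation and then maps it to :math:`\mu = g^{-1}(\eta)`. It uses the textbook definitions of the identity, log, logit, and inverse links; it is not H2O's internal implementation, and the names ``LINKS``, ``linear_predictor``, and ``expected_response``, along with the sample values, are hypothetical.

```python
import math

# Each entry maps a link name to the pair (g, g_inverse).
# Textbook definitions only; this is not H2O's internal code.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def linear_predictor(x, beta):
    """Systematic component: eta = x . beta for one observation."""
    return sum(x_j * b_j for x_j, b_j in zip(x, beta))


def expected_response(x, beta, link="identity"):
    """Mean of the response: mu = g^{-1}(eta)."""
    _, g_inv = LINKS[link]
    return g_inv(linear_predictor(x, beta))


# Hypothetical observation (first entry acts as the intercept term)
# and hypothetical coefficients.
x = [1.0, 2.0]
beta = [0.5, -0.25]

eta = linear_predictor(x, beta)                       # 0.5*1.0 + (-0.25)*2.0 = 0.0
mu_logit = expected_response(x, beta, link="logit")   # 0.5, because eta is 0
mu_log = expected_response(x, beta, link="log")       # 1.0, because exp(0) is 1
```

The same linear predictor yields a different mean under each link, which is why the link has to suit the range of the response: the logit link keeps :math:`\mu` in (0, 1), while the log link keeps it positive.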

H2O's GLM and GAM support the following link functions: Family_Default, Identity, Logit, Log, Inverse, Tweedie, and Ologit.

The following table describes the allowed Family/Link combinations.

+---------------------+----------------------------------------------------------------------+
| **Family**          | **Link Function**                                                    |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
|                     | Family_Default | Identity | Logit | Log | Inverse | Tweedie | Ologit |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Binomial            | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Fractional Binomial | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Quasibinomial       | X              |          | X     |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Multinomial         | X              |          |       |     |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Ordinal             | X              |          |       |     |         |         | X      |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Gaussian            | X              | X        |       | X   | X       |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Poisson             | X              | X        |       | X   |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Gamma               | X              | X        |       | X   | X       |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Tweedie             | X              |          |       |     |         | X       |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| Negative Binomial   | X              | X        |       | X   |         |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+
| AUTO                | X***           | X*       | X**   | X*  | X*      |         |        |
+---------------------+----------------+----------+-------+-----+---------+---------+--------+

For **AUTO**:

- X*: the data is numeric (``Real`` or ``Int``) (family determined as ``gaussian``)
- X**: the data is ``Enum`` with cardinality = 2 (family determined as ``binomial``)
- X***: the data is ``Enum`` with cardinality > 2 (family determined as ``multinomial``)

Refer to the `Links <../glm.html#links>`__ section for more information. 
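The table and the **AUTO** rules above can be written as a small lookup, which may be handy for checking a combination before training. This is only a sketch: ``ALLOWED_LINKS``, ``is_valid_combination``, and ``resolve_auto_family`` are hypothetical helpers, not part of the H2O API, and the lowercase key spellings are chosen for illustration. H2O performs its own validation when a model is built.

```python
# Allowed links per family, transcribed from the Family/Link table.
# Key spellings are illustrative and may not match the values H2O accepts.
ALLOWED_LINKS = {
    "binomial": {"family_default", "logit"},
    "fractionalbinomial": {"family_default", "logit"},
    "quasibinomial": {"family_default", "logit"},
    "multinomial": {"family_default"},
    "ordinal": {"family_default", "ologit"},
    "gaussian": {"family_default", "identity", "log", "inverse"},
    "poisson": {"family_default", "identity", "log"},
    "gamma": {"family_default", "identity", "log", "inverse"},
    "tweedie": {"family_default", "tweedie"},
    "negativebinomial": {"family_default", "identity", "log"},
}


def is_valid_combination(family, link="family_default"):
    """Return True if the table lists `link` as allowed for `family`."""
    return link.lower() in ALLOWED_LINKS.get(family.lower(), set())


def resolve_auto_family(column_type, cardinality=None):
    """Mirror the AUTO rules: pick a family from the response column type."""
    kind = column_type.lower()
    if kind in ("real", "int"):
        return "gaussian"
    if kind == "enum" and cardinality is not None:
        if cardinality == 2:
            return "binomial"
        if cardinality > 2:
            return "multinomial"
    raise ValueError("cannot infer a family for this response column")


gaussian_log_ok = is_valid_combination("gaussian", "log")
binomial_identity_ok = is_valid_combination("binomial", "identity")
auto_family_for_iris = resolve_auto_family("Enum", cardinality=3)
```

For the iris response used in the example below (an ``Enum`` column with three levels), the **AUTO** rules resolve to ``multinomial``, for which the table lists only ``family_default``.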

Related Parameters
~~~~~~~~~~~~~~~~~~

- `family <family.html>`__

Example
~~~~~~~

.. tabs::
   .. code-tab:: r R

		library(h2o)
		h2o.init()

		# import the iris dataset:
		# this dataset is used to classify the type of iris plant
		# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
		iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

		# convert response column to a factor
		iris['class'] <- as.factor(iris['class'])

		# set the predictor names and the response column name
		predictors <- colnames(iris)[-length(iris)]
		response <- 'class'

		# split into train and validation
		iris_splits <- h2o.splitFrame(data = iris, ratios = 0.8)
		train <- iris_splits[[1]]
		valid <- iris_splits[[2]]

		# try using the `link` parameter:
		iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial', link = 'family_default',
		                   training_frame = train, validation_frame = valid)

		# print the logloss for the validation data
		print(h2o.logloss(iris_glm, valid = TRUE))
   
   .. code-tab:: python

		import h2o
		from h2o.estimators.glm import H2OGeneralizedLinearEstimator
		h2o.init()

		# import the iris dataset:
		# this dataset is used to classify the type of iris plant
		# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
		iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

		# convert response column to a factor
		iris['class'] = iris['class'].asfactor()

		# set the predictor names and the response column name
		predictors = iris.columns[:-1]
		response = 'class'

		# split into train and validation sets
		train, valid = iris.split_frame(ratios = [.8])

		# try using the `link` parameter:
		# Initialize and train a GLM
		iris_glm = H2OGeneralizedLinearEstimator(family = 'multinomial', link = 'family_default')
		iris_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

		# print the logloss for the validation data
		print(iris_glm.logloss(valid = True))