intercept

  • Available in: GLM, GAM

  • Hyperparameter: no

Description

The intercept command allows you to specify whether a single constant term should to be included in the model. By including a constant term (i.e., an “intercept” or “bias”), we ensure that the model is to be unbiased–i.e., the mean of the residuals will be exactly zero.

The intercept term adjusts all predictions up or down by a constant amount, i.e. it is the predicted value when all inputs are exactly 0. It can be excluded (forced to equal 0) by setting intercept=FALSE.

In GLM and GAM, the inverse of the link function is applied to obtain the final predictions (targets in regression and probabilities in classification). Excluding the intercept term can negatively impact measured model quality, but it is appropriate when a given linear model should definitely predict 0 when all inputs are 0 (before the application of the inverse of the link function to obtain the final predictions). You may also want to exclude the intercept term to choose a simpler model when appropriate and move away from overfitting.

Example

library(h2o)
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert response column to a factor
iris['class'] <- as.factor(iris['class'])

# set the predictor names and the response column name
predictors <- colnames(iris)[-length(iris)]
response <- 'class'

# split into train and validation
iris_splits <- h2o.splitFrame(data = iris, ratios = 0.8)
train <- iris_splits[[1]]
valid <- iris_splits[[2]]

# try using the `intercept` parameter:
iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial',
                    intercept = TRUE, training_frame = train, validation_frame = valid)

# print the logloss for the validation data
print(h2o.logloss(iris_glm, valid = TRUE))
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert response column to a factor
iris['class'] = iris['class'].asfactor()

# set the predictor names and the response column name
predictors = iris.columns[:-1]
response = 'class'

# split into train and validation sets
train, valid = iris.split_frame(ratios = [.8])

# try using the `intercept` parameter:
# Initialize and train a GLM
iris_glm = H2OGeneralizedLinearEstimator(family = 'multinomial', intercept = True)
iris_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the logloss for the validation data
iris_glm.logloss(valid = True)