stratify_by

  • Available in: CoxPH

  • Hyperparameter: no

Description

In a CoxPH model, stratification fits a separate baseline hazard function for each stratum while sharing the regression coefficients across all strata, so the model allows for as many different hazard functions as there are strata. This makes stratification useful as a diagnostic for checking the proportional hazards assumption. For example, when studying the effect of a predictor, X, you can stratify on a secondary categorical predictor, Z, so that Z is adjusted for when making inferences about X’s relationship to the time-to-event endpoint.
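
To illustrate what "one hazard function per stratum" means in practice, the following is a minimal sketch of a stratified Cox partial log-likelihood. It is not H2O's implementation: the function name `stratified_partial_loglik`, the toy data, the single covariate, the absence of start times, and the Breslow handling of ties are all assumptions made for illustration. The key point is that each event is compared only against subjects in the same stratum who are still at risk.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(times, events, covariate, strata, beta):
    """Illustrative Cox partial log-likelihood with one risk set per stratum.

    Assumes right-censored data, a single covariate, and Breslow ties.
    """
    # Group the observations by stratum label.
    groups = defaultdict(list)
    for t, e, z, s in zip(times, events, covariate, strata):
        groups[s].append((t, e, z))

    total = 0.0
    for rows in groups.values():
        for t_i, e_i, z_i in rows:
            if not e_i:
                continue  # censored rows contribute only through risk sets
            # Risk set: subjects in the SAME stratum still observed at t_i.
            risk = sum(math.exp(beta * z_j) for t_j, _, z_j in rows if t_j >= t_i)
            total += beta * z_i - math.log(risk)
    return total


# Hypothetical toy data: four subjects, all with an observed event.
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
covariate = [0, 1, 0, 1]

# Two strata versus a single pooled stratum, evaluated at beta = 0.
ll_stratified = stratified_partial_loglik(times, events, covariate, ["a", "a", "b", "b"], 0.0)
ll_pooled = stratified_partial_loglik(times, events, covariate, ["a", "a", "a", "a"], 0.0)
```

The two values differ because stratifying shrinks each risk set to the members of one stratum, while the coefficient `beta` is still shared by every stratum.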

Use the `stratify_by` parameter to specify a list of columns to use for stratification when building a CoxPH model. Each stratification column must be present in the x list in the <model_name>.train() call (e.g. if x=["PhoneService", "MultipleLines", "InternetService", "Contract"], then every column in stratify_by must be one of those columns).
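
The requirement above can be checked before training with a small helper. This is a sketch only: `check_stratify_by` is a hypothetical function, not part of the H2O API, and it works on plain Python lists of column names.

```python
def check_stratify_by(x, stratify_by):
    """Hypothetical pre-training check (not an H2O function).

    Raises ValueError if any stratification column is missing from x,
    otherwise returns the predictors that are not used as strata.
    """
    missing = [col for col in stratify_by if col not in x]
    if missing:
        raise ValueError(f"stratify_by columns not found in x: {missing}")
    # Columns that remain are the ones that receive a coefficient.
    return [col for col in x if col not in stratify_by]


x_cols = ["PhoneService", "MultipleLines", "InternetService", "Contract"]
coefficient_cols = check_stratify_by(x_cols, ["Contract"])
```

In the examples below, the same split shows up in the model call `Surv(start, stop, event) ~ year + strata(age)`: `age` is the stratification column and `year` is the only predictor that gets a coefficient.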

Example

library(h2o)
h2o.init()

# import the heart dataset:
heart <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")

# set the predictor and response column:
x <- "age"
y <- "event"

# set the start and stop columns:
start <- "start"
stop <- "stop"

# convert the age column to a factor:
heart["age"] <- as.factor(heart["age"])

# train your model:
heart_coxph <- h2o.coxph(x = c("year", x),
                         event_column = y,
                         start_column = start,
                         stop_column = stop,
                         stratify_by = x,
                         training_frame = heart)

# view the model details:
heart_coxph
Model Details:
==============

H2OCoxPHModel: coxph
Model ID:  CoxPH_model_R_1570209287520_5
Call:
Surv(start, stop, event) ~ year + strata(age)

        coef    exp(coef) se(coef)  z      p
year    4.734   113.717   8973.421  0.001  1

Likelihood ratio test = 1.39  on 1 df, p = 0.239
n = 172, number of events = 75

import h2o
from h2o.estimators import H2OCoxProportionalHazardsEstimator
h2o.init()

# import the heart dataset:
heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")

# set the predictors and response column:
x = ["age", "year"]
y = "event"

# convert the age column to a factor:
heart["age"] = heart["age"].ascharacter()
heart["age"] = heart["age"].asfactor()

# build and train your model:
heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
                                                 stop_column="stop",
                                                 ties="breslow",
                                                 stratify_by=["age"])
heart_coxph.train(x=x, y=y, training_frame=heart)

# view the model details:
heart_coxph
Model Details
=============
H2OCoxProportionalHazardsEstimator :  Cox Proportional Hazards
Model Key:  CoxPH_model_python_1604581637715_647

Call:
Surv(start, stop, event) ~ year + strata(age)

Coefficients: CoxPH Coefficients
names    coefficients    exp_coef    exp_neg_coef    se_coef    z_coef
-------  --------------  ----------  --------------  ---------  -----------
year     4.73372         113.717     0.00879373      8973.42    0.000527526

Likelihood ratio test=1.386294
n=172, number of events=75