Available in: GBM
H2O’s GBM provides a number of distribution options that can be specified when building a model. Alternatively, you can use the
upload_custom_distribution function to upload a custom distribution into a running H2O cluster.
The custom distribution is a function that implements the water.udf.CDistributionFunc interface. This interface follows the design of hex.Distribution and contains four methods to support distributed invocation:
link: This method returns the type of link function, which transforms the probability of the response variable to an unbounded continuous scale.
init: This method combines the weight, actual response, and offset to compute the numerator and denominator of the initial value. By default, it can return [weight * (response - offset), weight].
gamma: This method combines the weight, actual response, residual, and predicted response to compute the numerator and denominator of the step size in the terminal node estimate.
gradient: This method computes the (negative half) gradient of the deviance function at the predicted value for the actual response in one GBM learning step.
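As a rough illustration of the four methods, the following plain-Python sketch implements a Gaussian-style distribution and aggregates the init output over a few rows by hand. The class name GaussianSketch, the argument names (w for weight, o for offset, y for actual response, z for residual, f for predicted response), and their order are assumptions made for this example. They are not taken from the interface definition, and the sketch does not need a running H2O cluster.

```python
class GaussianSketch:
    """Hypothetical Gaussian-style distribution with the four methods."""

    def link(self):
        # Name of the link function; the identity link leaves predictions unchanged.
        return "identity"

    def init(self, w, o, y):
        # Numerator and denominator contribution of one row to the initial value.
        return [w * (y - o), w]

    def gradient(self, y, f):
        # Negative half gradient of the squared-error deviance (y - f)^2 at f.
        return y - f

    def gamma(self, w, y, z, f):
        # Numerator and denominator contribution of one row to the step size.
        return [w * z, w]


dist = GaussianSketch()

# Made-up rows of (weight, offset, actual response).
rows = [(1.0, 0.0, 3.0), (2.0, 0.5, 1.5)]

# Sum the per-row numerators and denominators, then divide.
numerator = sum(dist.init(w, o, y)[0] for w, o, y in rows)
denominator = sum(dist.init(w, o, y)[1] for w, o, y in rows)
initial_value = numerator / denominator
```

Returning a numerator and denominator pair, rather than a finished ratio, is what allows the per-row results to be summed across a distributed dataset before the final division.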
Three separate fields must be specified when using the upload_custom_distribution function:
klazz: Represents a custom distribution function that provides the four methods described above.
func_name: Assigns a name to the uploaded custom function. This name corresponds to the name of the key in the distributed key-value store.
func_file: The name of the file in which the function is stored inside the uploaded jar file. The source code of the given class is saved into this file, which is then zipped, uploaded as a zip archive, and saved into the distributed key-value store.
func_file must be unique for each uploaded custom distribution.
Note: This option is only supported in the Python client.
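A minimal sketch of how the three fields might be passed from the Python client is shown below. The class AsymmetricGaussian, its gradient rule (errors from over-prediction weighted ten times more heavily), the helper train_with_custom_distribution, and the names given to func_name and func_file are all hypothetical choices for this example. The helper assumes an H2O cluster is already running and that train is an existing H2OFrame. It also assumes that the GBM estimator accepts distribution="custom" together with custom_distribution_func. The class is kept free of newer Python syntax on the assumption that its source is what gets uploaded and executed on the cluster.

```python
class AsymmetricGaussian:
    """Hypothetical distribution: Gaussian-like, but over-prediction costs more."""

    def link(self):
        return "identity"

    def init(self, w, o, y):
        return [w * (y - o), w]

    def gradient(self, y, f):
        # Scale the error by 10 when the prediction is above the actual response.
        error = y - f
        return error if error > 0 else 10 * error

    def gamma(self, w, y, z, f):
        return [w * z, w]


def train_with_custom_distribution(train, x, y):
    """Upload AsymmetricGaussian and train a small GBM (needs a running cluster)."""
    # Imported here so that the class above can be inspected without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    # klazz is passed first; func_name and func_file are illustrative names.
    distribution_ref = h2o.upload_custom_distribution(
        AsymmetricGaussian,
        func_name="custom_asymmetric_gaussian",
        func_file="custom_asymmetric_gaussian.py",
    )

    gbm = H2OGradientBoostingEstimator(
        ntrees=3,
        max_depth=5,
        distribution="custom",
        custom_distribution_func=distribution_ref,
    )
    gbm.train(x=x, y=y, training_frame=train)
    return gbm
```

After connecting with h2o.init() and loading a frame, a call such as train_with_custom_distribution(train, x=predictor_names, y=response_name) would return the trained model, where predictor_names and response_name stand in for the columns of your own data.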