Parameters of H2ORuleFit

Affected Classes

  • ai.h2o.sparkling.ml.algos.H2ORuleFit

  • ai.h2o.sparkling.ml.algos.classification.H2ORuleFitClassifier

  • ai.h2o.sparkling.ml.algos.regression.H2ORuleFitRegressor

Parameters

  • Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
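
The naming convention above can be sketched in a few lines of plain Python. The helper accessor_names is hypothetical and is not part of Sparkling Water; it only illustrates how a parameter name maps to its getter and setter names, assuming the first letter is upper-cased and the rest of the name is left unchanged.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) method names for a parameter name.

    Hypothetical helper for illustration only; it mirrors the
    label -> getLabel / setLabel convention described above.
    """
    if not param:
        raise ValueError("parameter name must not be empty")
    # Upper-case only the first character so the camelCase remainder is preserved.
    suffix = param[0].upper() + param[1:]
    return "get" + suffix, "set" + suffix


if __name__ == "__main__":
    for name in ("label", "maxRuleLength", "withContributions"):
        print(name, "->", accessor_names(name))
```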

ignoredCols

Names of columns to ignore for training.

Scala default value: null ; Python default value: None

Also available on the trained model.

algorithm

The algorithm to use to generate rules. Possible values are "DRF", "GBM", "AUTO".

Default value: "AUTO"

Also available on the trained model.

aucType

Set default multinomial AUC type. Possible values are "AUTO", "NONE", "MACRO_OVR", "WEIGHTED_OVR", "MACRO_OVO", "WEIGHTED_OVO".

Default value: "AUTO"

Also available on the trained model.

columnsToCategorical

List of columns to convert to categorical before modelling.

Scala default value: Array() ; Python default value: []

convertInvalidNumbersToNa

If set to ‘true’, the model converts invalid numbers to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

convertUnknownCategoricalLevelsToNa

If set to ‘true’, the model converts unknown categorical levels to NA when making predictions.

Scala default value: false ; Python default value: False

Also available on the trained model.

dataFrameSerializer

A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer"

Also available on the trained model.

detailedPredictionCol

Column containing additional prediction details; its content depends on the model type.

Default value: "detailed_prediction"

Also available on the trained model.

distribution

Distribution function. Possible values are "AUTO", "bernoulli", "quasibinomial", "modified_huber", "multinomial", "ordinal", "gaussian", "poisson", "gamma", "tweedie", "huber", "laplace", "quantile", "fractionalbinomial", "negativebinomial", "custom".

Default value: "AUTO"

Also available on the trained model.

featuresCols

Names of feature columns.

Scala default value: Array() ; Python default value: []

Also available on the trained model.

keepBinaryModels

If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster.

Scala default value: false ; Python default value: False

labelCol

Response variable column.

Default value: "label"

Also available on the trained model.

lambdaValue

Lambda for LASSO regressor.

Scala default value: null ; Python default value: None

Also available on the trained model.

maxCategoricalLevels

For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited.

Default value: 10

Also available on the trained model.
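
As a rough illustration of what "only use this many most frequent categorical levels" means, the sketch below keeps the top N levels of a single column. It is not the H2O-3 implementation: the function limit_levels is hypothetical, and mapping the remaining levels to a placeholder value "other" is an assumption made here for the example.

```python
from collections import Counter
from typing import List


def limit_levels(values: List[str], max_levels: int, other: str = "other") -> List[str]:
    """Keep the max_levels most frequent levels; replace the rest with a placeholder.

    Illustrative only. The placeholder name and the replacement strategy
    are assumptions, not documented H2O-3 behaviour.
    """
    if max_levels < 0:
        raise ValueError("max_levels must be non-negative")
    counts = Counter(values)
    # most_common(n) returns the n levels with the highest counts.
    keep = {level for level, _ in counts.most_common(max_levels)}
    return [v if v in keep else other for v in values]


if __name__ == "__main__":
    column = ["a", "a", "a", "b", "b", "c"]
    print(limit_levels(column, max_levels=2))
```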

maxNumRules

The maximum number of rules to return. Defaults to -1, which means the number of rules is selected by diminishing returns in model deviance.

Default value: -1

Also available on the trained model.

maxRuleLength

Maximum length of rules. Defaults to 3.

Default value: 3

Also available on the trained model.

minRuleLength

Minimum length of rules. Defaults to 3.

Default value: 3

Also available on the trained model.

modelId

Destination id for this model; auto-generated if not specified.

Scala default value: null ; Python default value: None

modelType

Specifies type of base learners in the ensemble. Possible values are "RULES", "RULES_AND_LINEAR", "LINEAR".

Default value: "RULES_AND_LINEAR"

Also available on the trained model.

predictionCol

Prediction column name.

Default value: "prediction"

Also available on the trained model.

removeDuplicates

Whether to remove rules which are identical to an earlier rule. Defaults to true.

Scala default value: true ; Python default value: True

Also available on the trained model.

ruleGenerationNtrees

Specifies the number of trees to build in the tree model. Defaults to 50.

Default value: 50

Also available on the trained model.

seed

Seed for pseudo random number generator (if applicable).

Scala default value: -1L ; Python default value: -1

Also available on the trained model.

splitRatio

Accepts values in the range [0, 1.0] that determine how much of the dataset is used for training and how much for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

Default value: 1.0
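
The arithmetic behind the 0.8 -> 80% / 20% example can be sketched in plain Python. The function split_rows is hypothetical and does not reflect how Sparkling Water actually partitions a data frame (the real split may be probabilistic, so exact sizes can differ); it only shows how the ratio relates to the two partition sizes, and that the default of 1.0 leaves no rows for validation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], split_ratio: float, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Split rows into (training, validation) parts according to split_ratio.

    Illustrative sketch only; the seed and the exact-count cut are
    assumptions made for this example.
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in the range [0, 1.0]")
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, valid = split_rows(list(range(10)), 0.8)
    print(len(train), "training rows,", len(valid), "validation rows")
```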

validationDataFrame

A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. The parameter is not serializable!

Scala default value: null ; Python default value: None

weightCol

Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the prediction returned for that row is zero, which is incorrect. To get an accurate prediction, remove all rows with weight == 0.

Scala default value: null ; Python default value: None

Also available on the trained model.
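
The equivalence between weights and repeated rows can be checked on a toy loss. The sketch below uses a weighted sum of squared errors as a stand-in loss; this choice, and the helper names weighted_sse and expand_rows, are assumptions for illustration and are not the loss or code used by RuleFit. It shows that a weight of 2 contributes the same as repeating the row twice, and a weight of 0 contributes nothing.

```python
from typing import List, Sequence, Tuple


def weighted_sse(y: Sequence[float], pred: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of squared errors; each weight is a per-row pre-factor."""
    if any(wi < 0 for wi in w):
        raise ValueError("negative weights are not allowed")
    return sum(wi * (yi - pi) ** 2 for yi, pi, wi in zip(y, pred, w))


def expand_rows(y: Sequence[float], pred: Sequence[float], w: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Repeat each row w[i] times (integer weights only); weight 0 drops the row."""
    ys: List[float] = []
    ps: List[float] = []
    for yi, pi, wi in zip(y, pred, w):
        ys.extend([yi] * wi)
        ps.extend([pi] * wi)
    return ys, ps


if __name__ == "__main__":
    y, pred, w = [1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [2, 0, 1]
    ys, ps = expand_rows(y, pred, w)
    # Both values are the same loss computed two ways.
    print(weighted_sse(y, pred, w), weighted_sse(ys, ps, [1] * len(ys)))
```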

withContributions

Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

Scala default value: false ; Python default value: False

Also available on the trained model.

withLeafNodeAssignments

Enables or disables computation of leaf node assignments.

Scala default value: false ; Python default value: False

Also available on the trained model.

withStageResults

Enables or disables computation of stage results.

Scala default value: false ; Python default value: False

Also available on the trained model.