.. _parameters_H2OGBM:

Parameters of H2OGBM
--------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OGBM``
- ``ai.h2o.sparkling.ml.algos.classification.H2OGBMClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OGBMRegressor``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

calibrationDataFrame
  Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter ``calibrateModel`` to True.

  *Scala default value:* ``null`` *; Python default value:* ``None``

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

monotoneConstraints
  Each key must correspond to a feature name, and each value can be 1 or -1.

  *Scala default value:* ``Map()`` *; Python default value:* ``{}``

  *Also available on the trained model.*

aucType
  Set default multinomial AUC type. Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

autoRebalance
  Allow automatic rebalancing of training and validation datasets.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

balanceClasses
  Balance training data class counts via over/under-sampling (for imbalanced data).

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

buildTreeOneNode
  Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

calibrateModel
  Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

calibrationMethod
  Calibration method to use. Possible values are ``"AUTO"``, ``"PlattScaling"``, ``"IsotonicRegression"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

categoricalEncoding
  Encoding scheme for categorical features. Possible values are ``"AUTO"``, ``"OneHotInternal"``, ``"OneHotExplicit"``, ``"Enum"``, ``"Binary"``, ``"Eigen"``, ``"LabelEncoder"``, ``"SortByResponse"``, ``"EnumLimited"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

checkConstantResponse
  Check if the response column is constant. If enabled, an exception is thrown when the response column has a constant value. If disabled, the model trains regardless of whether the response column is constant.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

classSamplingFactors
  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires ``balanceClasses``.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

colSampleRate
  Column sample rate (from 0.0 to 1.0).

  *Default value:* ``1.0``

  *Also available on the trained model.*

colSampleRateChangePerLevel
  Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0).

  *Default value:* ``1.0``

  *Also available on the trained model.*

colSampleRatePerTree
  Column sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``

  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

customDistributionFunc
  Reference to custom distribution, format: ``language:keyName=funcName``.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

customMetricFunc
  Reference to custom evaluation function, format: ``language:keyName=funcName``.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

dataFrameSerializer
  The full name of the serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``

  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details; its content depends on the model type.

  *Default value:* ``"detailed_prediction"``

  *Also available on the trained model.*

distribution
  Distribution function. Possible values are ``"AUTO"``, ``"bernoulli"``, ``"quasibinomial"``, ``"modified_huber"``, ``"multinomial"``, ``"ordinal"``, ``"gaussian"``, ``"poisson"``, ``"gamma"``, ``"tweedie"``, ``"huber"``, ``"laplace"``, ``"quantile"``, ``"fractionalbinomial"``, ``"negativebinomial"``, ``"custom"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

featuresCols
  Names of feature columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, used if ``foldCol`` is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

gainsliftBins
  Gains/Lift table number of bins. 0 means disabled. The default value -1 means automatic binning.

  *Default value:* ``-1``

  *Also available on the trained model.*

histogramType
  What type of histogram to use for finding optimal split points. Possible values are ``"AUTO"``, ``"UniformAdaptive"``, ``"Random"``, ``"QuantilesGlobal"``, ``"RoundRobin"``, ``"UniformRobust"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

huberAlpha
  Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).

  *Default value:* ``0.9``

  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

inTrainingCheckpointsDir
  Create checkpoints in the defined directory while the training process is still running. In case of cluster shutdown, this checkpoint can be used to restart training.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

inTrainingCheckpointsTreeInterval
  Checkpoint the model after every so many trees. The parameter is used only when ``inTrainingCheckpointsDir`` is defined.

  *Default value:* ``1``

  *Also available on the trained model.*

interactionConstraints
  A set of allowed column interactions.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

labelCol
  Response variable column.

  *Default value:* ``"label"``

  *Also available on the trained model.*

learnRate
  Learning rate (from 0.0 to 1.0).

  *Default value:* ``0.1``

  *Also available on the trained model.*

learnRateAnnealing
  Scale the learning rate by this factor after each tree (e.g., 0.99 or 0.999).

  *Default value:* ``1.0``

  *Also available on the trained model.*

maxAbsLeafnodePred
  Maximum absolute value of a leaf node prediction.

  *Scala default value:* ``1.7976931348623157e308`` *; Python default value:* ``1.7976931348623157E308``

  *Also available on the trained model.*

maxAfterBalanceSize
  Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires ``balanceClasses``.

  *Scala default value:* ``5.0f`` *; Python default value:* ``5.0``

  *Also available on the trained model.*

maxConfusionMatrixSize
  [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the logs.

  *Default value:* ``20``

  *Also available on the trained model.*

maxDepth
  Maximum tree depth (0 for unlimited).

  *Default value:* ``5``

  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``

  *Also available on the trained model.*

minRows
  Fewest allowed (weighted) observations in a leaf.

  *Default value:* ``10.0``

  *Also available on the trained model.*

minSplitImprovement
  Minimum relative improvement in squared error reduction for a split to happen.

  *Scala default value:* ``1.0e-5`` *; Python default value:* ``1.0E-5``

  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``

nbins
  For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point.

  *Default value:* ``20``

  *Also available on the trained model.*

nbinsCats
  For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.

  *Default value:* ``1024``

  *Also available on the trained model.*

nbinsTopLevel
  For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level.

  *Default value:* ``1024``

  *Also available on the trained model.*

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``

  *Also available on the trained model.*

ntrees
  Number of trees.

  *Default value:* ``50``

  *Also available on the trained model.*

offsetCol
  Offset column. This will be added to the combination of columns before applying the link function.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

predNoiseBandwidth
  Bandwidth (sigma) of Gaussian multiplicative noise ~N(1,sigma) for tree node predictions.

  *Default value:* ``0.0``

  *Also available on the trained model.*

predictionCol
  Prediction column name.

  *Default value:* ``"prediction"``

  *Also available on the trained model.*

quantileAlpha
  Desired quantile for Quantile regression, must be between 0 and 1.

  *Default value:* ``0.5``

  *Also available on the trained model.*

sampleRate
  Row sample rate per tree (from 0.0 to 1.0).

  *Default value:* ``1.0``

  *Also available on the trained model.*

sampleRatePerClass
  A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

scoreTreeInterval
  Score the model after every so many trees. Disabled if set to 0.

  *Default value:* ``0``

  *Also available on the trained model.*

seed
  Seed for pseudo random number generator (if applicable).

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``

  *Also available on the trained model.*

splitRatio
  Accepts values in the range [0, 1.0] that determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when ``validationDataFrame`` is set.

  *Default value:* ``1.0``

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"AUUC"``, ``"ATE"``, ``"ATT"``, ``"ATC"``, ``"qini"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of ``stoppingMetric``. Stop if the simple moving average of length k of the stopping metric does not improve for k := ``stoppingRounds`` scoring events (0 to disable).

  *Default value:* ``0``

  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.001``

  *Also available on the trained model.*

tweediePower
  Tweedie power for Tweedie regression, must be between 1 and 2.

  *Default value:* ``1.5``

  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ``splitRatio`` parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``

weightCol
  Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

withContributions
  Enables or disables generating a sub-column of ``detailedPredictionCol`` containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*
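
The note at the start of the Parameters section says that every parameter has a matching getter and setter (``label`` -> ``getLabel()``, ``setLabel(...)``). The following is a minimal sketch of that naming rule in plain Python. It assumes the rule is simply "capitalize the first letter and prefix with ``get`` or ``set``" for every parameter listed above. The helper ``accessor_names`` is hypothetical and is not part of Sparkling Water.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Derive (getter, setter) method names from a camelCase parameter name.

    Hypothetical helper: it only mirrors the naming rule stated in the
    Parameters section and does not call any Sparkling Water API.
    """
    if not param:
        raise ValueError("parameter name must be non-empty")
    # Upper-case only the first character; the rest of the name is kept as-is.
    capitalized = param[0].upper() + param[1:]
    return "get" + capitalized, "set" + capitalized


# Print the derived accessor names for a few parameters from this page.
for name in ("labelCol", "learnRate", "ntrees"):
    getter, setter = accessor_names(name)
    print(f"{name}: {getter}(), {setter}(...)")
```

Under this assumption, ``learnRate`` maps to ``getLearnRate()`` and ``setLearnRate(...)``. Check the API documentation of the affected classes before relying on a derived name.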