.. _parameters_H2OGAM:

Parameters of H2OGAM
--------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OGAM``
- ``ai.h2o.sparkling.ml.algos.classification.H2OGAMClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OGAMRegressor``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``labelCol`` *->* ``getLabelCol()`` *,* ``setLabelCol(...)`` *)*

betaConstraints
  Data frame of beta constraints that places special conditions on the model coefficients.

  *Scala default value:* ``null`` *; Python default value:* ``None``

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

alphaValue
  Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'L-BFGS'; 0.5 otherwise.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

aucType
  Set default multinomial AUC type.

  Possible values are ``"AUTO"``, ``"NONE"``, ``"MACRO_OVR"``, ``"WEIGHTED_OVR"``, ``"MACRO_OVO"``, ``"WEIGHTED_OVO"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

balanceClasses
  Balance training data class counts via over/under-sampling (for imbalanced data).

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

betaEpsilon
  Converge if beta changes less (using the L-infinity norm) than beta epsilon. Applies ONLY to the IRLSM solver.

  *Scala default value:* ``1.0e-4`` *; Python default value:* ``1.0E-4``

  *Also available on the trained model.*

bs
  Basis function type for each gam predictor: 0 for cr, 1 for thin plate regression with knots, 2 for monotone I-splines, 3 for NBSplineTypeI M-splines (refer to doc here: https://github.com/h2oai/h2o-3/issues/6926). If specified, must be the same size as gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

classSamplingFactors
  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

coldStart
  Only applicable to multiple alpha/lambda values when calling GLM from GAM. If false, build the next model for the next set of alpha/lambda values starting from the values provided by the current model. If true, start the GLM model from scratch.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

computePValues
  Request p-value computation. P-values work only with the IRLSM solver and no regularization.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``

  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details; its content depends on the model type.

  *Default value:* ``"detailed_prediction"``

  *Also available on the trained model.*

earlyStopping
  Stop early when there is no more relative improvement on train or validation (if provided).

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

family
  Family. Use binomial for classification with logistic regression, others are for regression problems.

  Possible values are ``"AUTO"``, ``"gaussian"``, ``"binomial"``, ``"fractionalbinomial"``, ``"quasibinomial"``, ``"poisson"``, ``"gamma"``, ``"multinomial"``, ``"tweedie"``, ``"ordinal"``, ``"negativebinomial"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

featuresCols
  Names of feature columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``

  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems.

  Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

gainsliftBins
  Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning.

  *Default value:* ``-1``

  *Also available on the trained model.*

gamCols
  Arrays of predictor column names for gam for smoothers using single or multiple predictors like {{'c1'},{'c2','c3'},{'c4'},...}

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

gradientEpsilon
  Converge if objective changes less (using the L-infinity norm) than this. Applies ONLY to the L-BFGS solver. Default indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively.

  *Default value:* ``-1.0``

  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

interactions
  A list of predictor column indices to interact. All pairwise combinations will be computed for the list.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

intercept
  Include constant term in the model.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in the DKV of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``

  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

keepGamCols
  Save keys of model matrix.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

knotIds
  Array storing frame keys of knots. One for each gam column set specified in gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

labelCol
  Response variable column.

  *Default value:* ``"label"``

  *Also available on the trained model.*

lambdaSearch
  Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

lambdaValue
  Regularization strength.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

link
  Link function.

  Possible values are ``"family_default"``, ``"identity"``, ``"logit"``, ``"log"``, ``"inverse"``, ``"tweedie"``, ``"multinomial"``, ``"ologit"``, ``"oprobit"``, ``"ologlog"``.

  *Default value:* ``"family_default"``

  *Also available on the trained model.*

maxActivePredictors
  Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000; otherwise it is set to 100000000.

  *Default value:* ``-1``

  *Also available on the trained model.*

maxAfterBalanceSize
  Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes.

  *Scala default value:* ``5.0f`` *; Python default value:* ``5.0``

  *Also available on the trained model.*

maxConfusionMatrixSize
  [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs.

  *Default value:* ``20``

  *Also available on the trained model.*

maxIterations
  Maximum number of iterations.

  *Default value:* ``-1``

  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``

  *Also available on the trained model.*

missingValuesHandling
  Handling of missing values. Either MeanImputation, Skip or PlugValues.

  Possible values are ``"MeanImputation"``, ``"PlugValues"``, ``"Skip"``.

  *Default value:* ``"MeanImputation"``

  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``

  *Also available on the trained model.*

nlambdas
  Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression); otherwise it is set to 100.

  *Default value:* ``-1``

  *Also available on the trained model.*

nonNegative
  Restrict coefficients (not intercept) to be non-negative.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

numKnots
  Number of knots for gam predictors. If specified, must specify one for each gam predictor. For monotone I-splines, minimum = 2; for cs spline, minimum = 3. For thin plate, minimum is size of polynomial basis + 2.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

objReg
  Likelihood divider in objective value computation, default is 1/nobs.

  *Default value:* ``-1.0``

  *Also available on the trained model.*

objectiveEpsilon
  Converge if objective value changes less than this. Default indicates: If lambda_search is set to True, the value of objective_epsilon is set to .0001. If lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001; for any other value of lambda, the default value of objective_epsilon is set to .0001.

  *Default value:* ``-1.0``

  *Also available on the trained model.*

offsetCol
  Offset column. This will be added to the combination of columns before applying the link function.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

predictionCol
  Prediction column name.

  *Default value:* ``"prediction"``

  *Also available on the trained model.*

prior
  Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality.

  *Default value:* ``-1.0``

  *Also available on the trained model.*

removeCollinearCols
  In case of linearly dependent columns, remove some of the dependent columns.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

scale
  Smoothing parameter for gam predictors. If specified, must be of the same length as gam_columns.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

scaleTpPenaltyMat
  Scale penalty matrix for tp (thin plate) smoothers as in R.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

seed
  Seed for pseudo random number generator (if applicable).

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``

solver
  AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with L1 penalty; L_BFGS scales better for datasets with many columns.

  Possible values are ``"AUTO"``, ``"IRLSM"``, ``"L_BFGS"``, ``"COORDINATE_DESCENT_NAIVE"``, ``"COORDINATE_DESCENT"``, ``"GRADIENT_DESCENT_LH"``, ``"GRADIENT_DESCENT_SQERR"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

splineOrders
  Order of I-splines or NBSplineTypeI M-splines used for gam predictors. If specified, must be the same size as gam_columns. For I-splines, the spline_orders will be the same as the polynomials used to generate the splines. For M-splines, the polynomials used to generate the splines will be spline_order-1. Values for bs=0 or 1 will be ignored.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

splinesNonNegative
  Valid for I-spline (bs=2) only. True if the I-splines are monotonically increasing (and monotonically non-decreasing) and False if the I-splines are monotonically decreasing (and monotonically non-increasing). If specified, must be the same size as gam_columns. Values for other spline types will be ignored. Defaults to true.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

splitRatio
  Accepts values in the range [0, 1.0] that determine how much of the dataset is used for training and how much for validation. For example, 0.8 -> 80% training, 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``

standardize
  Standardize numeric columns to have zero mean and unit variance.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

standardizeTpGamCols
  Standardize tp (thin plate) predictor columns.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

startval
  Double array to initialize coefficients for GAM.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client.

  Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"AUUC"``, ``"ATE"``, ``"ATT"``, ``"ATC"``, ``"qini"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``

  *Also available on the trained model.*

stoppingRounds
  Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).

  *Default value:* ``0``

  *Also available on the trained model.*

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``0.001``

  *Also available on the trained model.*

storeKnotLocations
  If set to true, will return knot locations as double[][] array for gam column names found knots_for_gam. Defaults to false.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

theta
  Theta.

  *Default value:* ``0.0``

  *Also available on the trained model.*

tweedieLinkPower
  Tweedie link power.

  *Default value:* ``0.0``

  *Also available on the trained model.*

tweedieVariancePower
  Tweedie variance power.

  *Default value:* ``0.0``

  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated to validation of the trained model. If this parameter is not set, a validation frame is created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``

weightCol
  Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0.

  *Scala default value:* ``null`` *; Python default value:* ``None``

  *Also available on the trained model.*

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``

  *Also available on the trained model.*
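
Several parameters above (``gradientEpsilon``, ``objectiveEpsilon``, ``maxActivePredictors``, ``nlambdas``) use ``-1`` or ``-1.0`` as a sentinel whose effective value depends on other settings. The following is a minimal Python sketch of the rules as worded in those descriptions. It is not the library's implementation: the ``resolve_*`` function names are hypothetical, lambda is simplified to a single number, the solver argument is assumed to be the solver actually in use, and only the exact documented sentinel is treated as "unset".

```python
# Hypothetical helpers for illustration only; not part of Sparkling Water or H2O-3.
# Each function mirrors the "Default indicates: ..." wording of one parameter.


def resolve_gradient_epsilon(value, lambda_search, lambda_value):
    """gradientEpsilon: -1.0 means the default is derived from the lambda settings."""
    if value != -1.0:
        return value  # an explicitly set value is used as-is
    if lambda_search:
        return 1e-8 if lambda_value == 0 else 1e-6
    return 1e-6 if lambda_value == 0 else 1e-4


def resolve_objective_epsilon(value, lambda_search, lambda_value):
    """objectiveEpsilon: -1.0 means the default is derived from the lambda settings."""
    if value != -1.0:
        return value
    if lambda_search:
        return 1e-4
    return 1e-6 if lambda_value == 0 else 1e-4


def resolve_max_active_predictors(value, solver):
    """maxActivePredictors: -1 means 5000 for IRLSM, 100000000 otherwise."""
    if value != -1:
        return value
    return 5000 if solver == "IRLSM" else 100_000_000


def resolve_nlambdas(value, alpha, lambda_search):
    """nlambdas: -1 means 30 for ridge (alpha == 0) with lambda search, else 100."""
    if value != -1:
        return value
    return 30 if (alpha == 0 and lambda_search) else 100


if __name__ == "__main__":
    # All sentinels left at their documented defaults, no lambda search, lambda = 0.
    print(resolve_gradient_epsilon(-1.0, lambda_search=False, lambda_value=0.0))
    print(resolve_objective_epsilon(-1.0, lambda_search=False, lambda_value=0.0))
    print(resolve_max_active_predictors(-1, solver="IRLSM"))
    print(resolve_nlambdas(-1, alpha=0.0, lambda_search=True))
```

A sketch like this can help when comparing two runs: a getter that still reports ``-1`` says only that the default was left in place, so the value actually used has to be worked out from ``lambdaSearch``, ``lambdaValue``, ``alphaValue``, and ``solver``.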