.. _parameters_H2OAutoML:

Parameters of H2OAutoML
-----------------------

Affected Classes
################

- ``ai.h2o.sparkling.ml.algos.H2OAutoML``
- ``ai.h2o.sparkling.ml.algos.classification.H2OAutoMLClassifier``
- ``ai.h2o.sparkling.ml.algos.regression.H2OAutoMLRegressor``

Parameters
##########

- *Each parameter also has a corresponding getter and setter method.*
  *(E.g.:* ``labelCol`` *->* ``getLabelCol()`` *,* ``setLabelCol(...)`` *)*
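
The naming convention above can be sketched in plain Python. The helper ``accessor_names`` is hypothetical and exists only for illustration; it is not part of the Sparkling Water API.

.. code-block:: python

   def accessor_names(param_name):
       """Return the (getter, setter) method names for a camelCase parameter name."""
       capitalized = param_name[0].upper() + param_name[1:]
       return "get" + capitalized, "set" + capitalized

   print(accessor_names("maxModels"))  # ('getMaxModels', 'setMaxModels')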

blendingDataFrame
  This parameter is used for computing the predictions that serve as the training frame for the meta-learner. If provided, it triggers blending mode in the stacked ensemble training stage. Blending mode is faster than cross-validating the base learners (though these ensembles may not perform as well as the Super Learner ensemble). The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

leaderboardDataFrame
  This parameter allows the user to specify a particular data frame to use to score and rank models on the leaderboard. This data frame will not be used for anything besides leaderboard scoring.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

monotoneConstraints
  A map of monotonicity constraints. Each key must correspond to a feature name and each value must be 1 (increasing) or -1 (decreasing).

  *Scala default value:* ``Map()`` *; Python default value:* ``{}``
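
  A minimal sketch of the expected shape of this map, written in plain Python. The feature names and the helper ``check_monotone_constraints`` are hypothetical; the check only mirrors the rule stated above and is not the library's own validation code.

  .. code-block:: python

     def check_monotone_constraints(constraints, feature_names):
         """Raise ValueError if a key is not a feature or a value is not 1 or -1."""
         for name, direction in constraints.items():
             if name not in feature_names:
                 raise ValueError(f"Unknown feature: {name}")
             if direction not in (1, -1):
                 raise ValueError(f"Constraint for {name} must be 1 or -1, got {direction}")
         return constraints

     features = ["age", "income", "tenure"]  # hypothetical feature columns
     constraints = check_monotone_constraints({"age": 1, "income": -1}, features)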
  

balanceClasses
  Balance training data class counts via over/under-sampling (for imbalanced data).

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

classSamplingFactors
  Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires ``balanceClasses`` to be enabled.

  *Scala default value:* ``null`` *; Python default value:* ``None``
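
  One reading of "in lexicographic order" is that the i-th factor applies to the i-th class label after the labels are sorted as strings. The plain-Python sketch below follows that assumption; the labels, factors, and the helper ``factors_by_class`` are hypothetical.

  .. code-block:: python

     def factors_by_class(labels, factors):
         """Pair each sampling factor with its class label, sorted lexicographically."""
         ordered = sorted(str(label) for label in set(labels))
         if len(ordered) != len(factors):
             raise ValueError("Expected one sampling factor per class")
         return dict(zip(ordered, factors))

     # "no" sorts before "yes", so the first factor belongs to "no".
     print(factors_by_class(["yes", "no", "no", "yes"], [1.0, 4.0]))  # {'no': 1.0, 'yes': 4.0}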
  

columnsToCategorical
  List of columns to convert to categorical before modelling.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

convertInvalidNumbersToNa
  If set to ``true``, the model converts invalid numbers to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

convertUnknownCategoricalLevelsToNa
  If set to ``true``, the model converts unknown categorical levels to NA when making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

customDistributionFunc
  Reference to a custom distribution, format: ``language:keyName=funcName``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

customMetricFunc
  Reference to a custom evaluation function, format: ``language:keyName=funcName``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
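
  The reference format used by ``customDistributionFunc`` and ``customMetricFunc`` can be split into its three parts as sketched below. The helper ``parse_function_reference`` and the sample reference string are hypothetical and serve only to illustrate the format.

  .. code-block:: python

     def parse_function_reference(reference):
         """Split 'language:keyName=funcName' into (language, key_name, func_name)."""
         language, _, rest = reference.partition(":")
         key_name, _, func_name = rest.partition("=")
         if not (language and key_name and func_name):
             raise ValueError(f"Expected language:keyName=funcName, got {reference!r}")
         return language, key_name, func_name

     print(parse_function_reference("python:my_metric_key=my_metric"))
     # ('python', 'my_metric_key', 'my_metric')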
  

dataFrameSerializer
  The fully qualified name of the serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  

detailedPredictionCol
  Column containing additional prediction details; its content depends on the model type.

  *Default value:* ``"detailed_prediction"``
  

distribution
  Distribution function used by algorithms that support it; other algorithms use their defaults. Possible values are ``"AUTO"``, ``"bernoulli"``, ``"quasibinomial"``, ``"modified_huber"``, ``"multinomial"``, ``"ordinal"``, ``"gaussian"``, ``"poisson"``, ``"gamma"``, ``"tweedie"``, ``"huber"``, ``"laplace"``, ``"quantile"``, ``"fractionalbinomial"``, ``"negativebinomial"``, ``"custom"``.

  *Default value:* ``"AUTO"``
  

excludeAlgos
  A list of algorithms to skip during the model-building phase. Possible values are ``"GLM"``, ``"DRF"``, ``"GBM"``, ``"DeepLearning"``, ``"StackedEnsemble"``, ``"XGBoost"``.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

exploitationRatio
  The budget ratio (between 0 and 1) dedicated to the exploitation (vs exploration) phase.

  *Default value:* ``-1.0``
  

exportCheckpointsDir
  Path to a directory where every generated model will be stored.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

featuresCols
  Names of the feature columns.

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

foldCol
  Fold column (contains fold IDs) in the training frame. These assignments are used to create the folds for cross-validation of the models.

  *Scala default value:* ``null`` *; Python default value:* ``None``
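
  A plain-Python sketch of how fold IDs partition the rows: each distinct ID defines one holdout fold, and the remaining rows form the training part for that fold. The helper ``rows_by_fold`` and the sample IDs are hypothetical.

  .. code-block:: python

     def rows_by_fold(fold_ids):
         """Group row indices by the fold ID found in the fold column."""
         folds = {}
         for row_index, fold_id in enumerate(fold_ids):
             folds.setdefault(fold_id, []).append(row_index)
         return folds

     fold_column = [0, 1, 2, 0, 1, 2]  # hypothetical fold IDs, one per row
     for fold_id, holdout_rows in sorted(rows_by_fold(fold_column).items()):
         training_rows = [i for i in range(len(fold_column)) if i not in holdout_rows]
         print(fold_id, holdout_rows, training_rows)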
  

huberAlpha
  Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1).

  *Default value:* ``0.9``
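
  A minimal sketch of the idea, assuming the threshold between quadratic and linear loss is taken as the ``huberAlpha`` quantile of the absolute residuals. The nearest-rank quantile and both helper names are assumptions made for illustration; the actual computation in H2O-3 may differ.

  .. code-block:: python

     import math

     def huber_threshold(residuals, huber_alpha=0.9):
         """Nearest-rank quantile of the absolute residuals (assumed quantile method)."""
         ordered = sorted(abs(r) for r in residuals)
         rank = max(1, math.ceil(huber_alpha * len(ordered)))
         return ordered[rank - 1]

     def huber_loss(residual, delta):
         """Quadratic loss up to delta, linear loss beyond it."""
         size = abs(residual)
         if size <= delta:
             return 0.5 * size * size
         return delta * (size - 0.5 * delta)

     print(huber_loss(0.5, 1.0))  # 0.125 (quadratic region)
     print(huber_loss(3.0, 1.0))  # 2.5 (linear region)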
  

includeAlgos
  A list of algorithms to restrict to during the model-building phase. Possible values are ``"GLM"``, ``"DRF"``, ``"GBM"``, ``"DeepLearning"``, ``"StackedEnsemble"``, ``"XGBoost"``.

  *Scala default value:* ``Array("GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost")`` *; Python default value:* ``["GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble", "XGBoost"]``
  

keepBinaryModels
  If set to ``true``, all binary models created during execution of the ``fit`` method will be kept in the DKV (distributed key-value store) of the H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationFoldAssignment
  Whether to keep cross-validation assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationModels
  Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationPredictions
  Whether to keep the cross-validation predictions. This needs to be set to ``true`` when running the same AutoML object for repeated runs, because CV predictions are required to build additional Stacked Ensemble models in AutoML.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

labelCol
  Response column.

  *Default value:* ``"label"``
  

maxAfterBalanceSize
  Maximum relative size of the training data after balancing class counts (defaults to 5.0 and can be less than 1.0). Requires ``balanceClasses`` to be enabled.

  *Scala default value:* ``5.0f`` *; Python default value:* ``5.0``
  

maxModels
  Maximum number of models to build (optional). Always set this parameter to ensure AutoML reproducibility: all models are then trained until convergence and none is constrained by a time budget.

  *Default value:* ``0``
  

maxRuntimeSecs
  Maximum time, in seconds, that the AutoML process will run. If both ``maxRuntimeSecs`` and ``maxModels`` are specified, the AutoML run stops as soon as it hits either limit. If neither is specified, ``maxRuntimeSecs`` defaults to 3600 seconds (1 hour).

  *Default value:* ``0.0``
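
  The interaction between the two limits can be sketched in plain Python. The sketch assumes that a value of 0 means "not specified", which matches the defaults listed here; both helper names are hypothetical.

  .. code-block:: python

     def effective_budget(max_runtime_secs=0.0, max_models=0):
         """Return (time limit in seconds or None, model limit or None)."""
         time_limit = max_runtime_secs if max_runtime_secs > 0 else None
         model_limit = max_models if max_models > 0 else None
         if time_limit is None and model_limit is None:
             time_limit = 3600.0  # one-hour fallback when neither limit is given
         return time_limit, model_limit

     def budget_exhausted(elapsed_secs, models_built, time_limit, model_limit):
         """The run stops as soon as either configured limit is reached."""
         hit_time = time_limit is not None and elapsed_secs >= time_limit
         hit_models = model_limit is not None and models_built >= model_limit
         return hit_time or hit_models

     print(effective_budget())               # (3600.0, None)
     print(effective_budget(max_models=10))  # (None, 10)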
  

maxRuntimeSecsPerModel
  Maximum time to spend on each individual model (optional). Note that models constrained by a time budget are not guaranteed to be reproducible.

  *Default value:* ``0.0``
  

nfolds
  Number of folds for k-fold cross-validation (defaults to -1 (AUTO), otherwise it must be >=2 or use 0 to disable). Disabling prevents Stacked Ensembles from being built.

  *Default value:* ``-1``
  

predictionCol
  Prediction column name.

  *Default value:* ``"prediction"``
  

projectName
  Optional project name used to group models from multiple AutoML runs into a single Leaderboard; derived from the training data name if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

quantileAlpha
  Desired quantile for Quantile regression, must be between 0 and 1.

  *Default value:* ``0.5``
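
  Quantile regression is commonly described through the quantile ("pinball") loss, which penalizes under- and over-prediction asymmetrically according to the desired quantile. The plain-Python sketch below shows that standard loss; the helper name is hypothetical and this is not the H2O-3 implementation.

  .. code-block:: python

     def pinball_loss(actual, predicted, quantile_alpha=0.5):
         """Quantile loss for a single observation."""
         if not 0.0 <= quantile_alpha <= 1.0:
             raise ValueError("quantile_alpha must be between 0 and 1")
         diff = actual - predicted
         if diff >= 0:
             return quantile_alpha * diff      # under-prediction
         return (quantile_alpha - 1.0) * diff  # over-prediction

     print(pinball_loss(10.0, 8.0, 0.75))  # 1.5
     print(pinball_loss(8.0, 10.0, 0.75))  # 0.5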
  

seed
  Seed for random number generator; set to a value other than -1 for reproducibility.

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  

sortMetric
  Metric used to sort leaderboard. Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"mean_per_class_error"``.

  *Default value:* ``"AUTO"``
  

splitRatio
  Accepts values in the range [0, 1.0] that determine what fraction of the dataset is used for training; the remainder is used for validation. For example, 0.8 means 80% for training and 20% for validation. This parameter is ignored when ``validationDataFrame`` is set.

  *Default value:* ``1.0``
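
  The arithmetic behind the ratio, as a plain-Python sketch. The helper is hypothetical and returns expected sizes only; if the split is randomized, the real row counts can differ slightly.

  .. code-block:: python

     def expected_split_sizes(row_count, split_ratio=1.0):
         """Return the expected (training rows, validation rows) for a given ratio."""
         if not 0.0 <= split_ratio <= 1.0:
             raise ValueError("split_ratio must be in the range [0, 1.0]")
         training = round(row_count * split_ratio)
         return training, row_count - training

     print(expected_split_sizes(1000, 0.8))  # (800, 200)
     print(expected_split_sizes(1000))       # (1000, 0)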
  

stoppingMetric
  Metric to use for early stopping (AUTO: logloss for classification, deviance for regression). Possible values are ``"AUTO"``, ``"deviance"``, ``"logloss"``, ``"MSE"``, ``"RMSE"``, ``"MAE"``, ``"RMSLE"``, ``"AUC"``, ``"AUCPR"``, ``"lift_top_group"``, ``"misclassification"``, ``"mean_per_class_error"``, ``"anomaly_score"``, ``"AUUC"``, ``"ATE"``, ``"ATT"``, ``"ATC"``, ``"qini"``, ``"custom"``, ``"custom_increasing"``.

  *Default value:* ``"AUTO"``
  

stoppingRounds
  Early stopping based on convergence of ``stoppingMetric``. Training stops if the simple moving average of length k of the metric does not improve for k scoring events, where k = ``stoppingRounds`` (0 disables early stopping).

  *Default value:* ``3``
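
  One possible reading of this rule, sketched in plain Python for a metric where lower is better and values are positive (for example, logloss). The helper name, the explicit ``tolerance`` argument, and the exact comparison are assumptions; the H2O-3 implementation may differ in its details.

  .. code-block:: python

     def should_stop_early(history, stopping_rounds, tolerance=0.0):
         """Return True if the moving average stopped improving over the last k events."""
         k = stopping_rounds
         if k <= 0 or len(history) < 2 * k:
             return False  # disabled, or not enough scoring events yet
         recent = history[-2 * k:]
         # k + 1 moving averages of length k over the last 2k scoring events
         averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
         reference, later = averages[0], averages[1:]
         # Stop unless some later average beats the reference by the relative tolerance.
         return not min(later) < reference * (1.0 - tolerance)

     print(should_stop_early([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], 3))  # False, still improving
     print(should_stop_early([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 3))  # True, metric is flat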
  

stoppingTolerance
  Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).

  *Default value:* ``-1.0``
  

tweediePower
  Tweedie power for Tweedie regression, must be between 1 and 2.

  *Default value:* ``1.5``
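
  In the Tweedie family, the power parameter controls how the variance grows with the mean: the variance equals the dispersion times the mean raised to the power. A plain-Python sketch of that relationship follows; the helper name is hypothetical and the bounds are treated as inclusive here, which is an assumption.

  .. code-block:: python

     def tweedie_variance(mean, tweedie_power=1.5, dispersion=1.0):
         """Variance of a Tweedie-distributed response with the given mean."""
         if not 1.0 <= tweedie_power <= 2.0:
             raise ValueError("tweedie_power must be between 1 and 2")
         return dispersion * mean ** tweedie_power

     print(tweedie_variance(4.0))       # 8.0
     print(tweedie_variance(4.0, 2.0))  # 16.0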
  

validationDataFrame
  A data frame dedicated to validation of the trained model. If this parameter is not set, a validation frame is created via the ``splitRatio`` parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

weightCol
  Weights column in the training frame, which specifies the row weights used in model training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``