.. _parameters_H2OKMeans:

Parameters of H2OKMeans
-----------------------

Affected Class
##############

- ``ai.h2o.sparkling.ml.algos.H2OKMeans``

Parameters
##########

- *Each parameter has also a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

ignoredCols
  Names of columns to ignore for training.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

userPoints
  This option allows you to specify array of points, where each point represents coordinates of an initial cluster center. The user-specified points must have the same number of columns as the training observations. The number of rows must equal the number of clusters.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

categoricalEncoding
  Encoding scheme for categorical features. Possible values are ``"AUTO"``, ``"OneHotInternal"``, ``"OneHotExplicit"``, ``"Enum"``, ``"Binary"``, ``"Eigen"``, ``"LabelEncoder"``, ``"SortByResponse"``, ``"EnumLimited"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

clusterSizeConstraints
  An array specifying the minimum number of points that should be in each cluster. The length of the constraints array has to be the same as the number of clusters.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

detailedPredictionCol
  Column containing additional prediction details, its content depends on the model type.

  *Default value:* ``"detailed_prediction"``
  
  *Also available on the trained model.*

estimateK
  Whether to estimate the number of clusters (<=k) iteratively and deterministically.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

featuresCols
  Name of feature columns

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  
  *Also available on the trained model.*

foldAssignment
  Cross-validation fold assignment scheme, if fold_column is not specified. The 'Stratified' option will stratify the folds based on the response variable, for classification problems. Possible values are ``"AUTO"``, ``"Random"``, ``"Modulo"``, ``"Stratified"``.

  *Default value:* ``"AUTO"``
  
  *Also available on the trained model.*

foldCol
  Column with cross-validation fold index assignment per observation.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

ignoreConstCols
  Ignore constant columns.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

init
  Initialization mode. Possible values are ``"Random"``, ``"PlusPlus"``, ``"Furthest"``, ``"User"``.

  *Default value:* ``"Furthest"``
  
  *Also available on the trained model.*

k
  The max. number of clusters. If estimate_k is disabled, the model will find k centroids, otherwise it will find up to k centroids.

  *Default value:* ``1``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in DKV of H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

keepCrossValidationFoldAssignment
  Whether to keep the cross-validation fold assignment.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

keepCrossValidationModels
  Whether to keep the cross-validation models.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

keepCrossValidationPredictions
  Whether to keep the predictions of the cross-validation models.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

maxIterations
  Maximum training iterations (if estimate_k is enabled, then this is for each inner Lloyds iteration).

  *Default value:* ``10``
  
  *Also available on the trained model.*

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

nfolds
  Number of folds for K-fold cross-validation (0 to disable or >= 2).

  *Default value:* ``0``
  
  *Also available on the trained model.*

predictionCol
  Prediction column name

  *Default value:* ``"prediction"``
  
  *Also available on the trained model.*

scoreEachIteration
  Whether to score during each iteration of model training.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

seed
  RNG Seed.

  *Scala default value:* ``-1L`` *; Python default value:* ``-1``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
  

standardize
  Standardize columns before computing distances.

  *Scala default value:* ``true`` *; Python default value:* ``True``
  
  *Also available on the trained model.*

validationDataFrame
  A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

withContributions
  Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withLeafNodeAssignments
  Enables or disables computation of leaf node assignments.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

withStageResults
  Enables or disables computation of stage results.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*