Parameters of H2OWord2Vec
Affected Class
- ai.h2o.sparkling.ml.features.H2OWord2Vec
Parameters
- Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
- outputCol
- Output column name. Default value: "H2OWord2Vec_b7102b0a3558__output". Also available on the trained model.
- columnsToCategorical
- List of columns to convert to categorical before modelling. Scala default value: Array(); Python default value: [].
- convertInvalidNumbersToNa
- If set to ‘true’, the model converts invalid numbers to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
- If set to ‘true’, the model converts unknown categorical levels to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- dataFrameSerializer
- The full name of the serializer used to serialize and deserialize Spark DataFrames to a JSON value within NullableDataFrameParam. Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer". Also available on the trained model.
- epochs
- Number of training iterations to run. Default value: 5. Also available on the trained model.
- exportCheckpointsDir
- Automatically export generated models to this directory. Scala default value: null; Python default value: None. Also available on the trained model.
- initLearningRate
- Set the starting learning rate. Scala default value: 0.025f; Python default value: 0.025. Also available on the trained model.
- inputCol
- Input column name. No default value. Also available on the trained model.
- keepBinaryModels
- If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster. Scala default value: false; Python default value: False.
- maxRuntimeSecs
- Maximum allowed runtime in seconds for model training. Use 0 to disable. Default value: 0.0. Also available on the trained model.
- minWordFreq
- Discards words that appear fewer than this many times (an integer). Default value: 5. Also available on the trained model.
- modelId
- Destination id for this model; auto-generated if not specified. Scala default value: null; Python default value: None.
- normModel
- Use Hierarchical Softmax. The only possible value is "HSM". Default value: "HSM". Also available on the trained model.
- sentSampleRate
- Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5). Scala default value: 0.001f; Python default value: 0.001. Also available on the trained model.
- splitRatio
- Accepts values in the range [0, 1.0], which determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set. Default value: 1.0.
- validationDataFrame
- A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the ‘splitRatio’ parameter. The parameter is not serializable! Scala default value: null; Python default value: None.
- vecSize
- Set size of word vectors. Default value: 100. Also available on the trained model.
- windowSize
- Set max skip length between words. Default value: 5. Also available on the trained model.
- wordModel
- The word model to use (SkipGram or CBOW). Possible values are "SkipGram", "CBOW". Default value: "SkipGram". Also available on the trained model.
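
To make the roles of minWordFreq and windowSize more concrete, the following is a minimal pure-Python sketch of what such parameters typically control in a skip-gram setup. It is an illustration only and not H2O's implementation: the function names filter_rare_words and skipgram_pairs are hypothetical, and the sketch assumes a fixed symmetric window, whereas word2vec-style trainers commonly shrink the window at random for each position.

```python
from collections import Counter


def filter_rare_words(tokens, min_word_freq=5):
    """Drop tokens occurring fewer than min_word_freq times (cf. minWordFreq).

    Hypothetical helper for illustration; not part of Sparkling Water.
    """
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] >= min_word_freq]


def skipgram_pairs(tokens, window_size=5):
    """Build (center, context) pairs within window_size positions (cf. windowSize).

    Assumes a fixed symmetric window; real trainers often sample a smaller
    window per position.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    tokens = "a b a c b a d".split()
    kept = filter_rare_words(tokens, min_word_freq=2)
    print(kept)
    print(skipgram_pairs(kept, window_size=1))
```

Raising min_word_freq shrinks the vocabulary, while raising window_size increases the number of training pairs produced per sentence, which is the usual trade-off to keep in mind when tuning the corresponding parameters.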