Data manipulation

This section provides examples of common tasks performed when preparing data for machine learning. These examples are run on a local cluster.

Note

The examples in this section use datasets that are pulled from GitHub and S3.

Feature engineering

H2O-3 also has methods for feature engineering. Target Encoding is a categorical encoding technique that replaces each categorical value with the mean of the target variable for the rows that share that value; it is especially useful for high-cardinality features. Word2vec is a text processing method that converts a corpus of text into word vectors.
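To make the idea behind target encoding concrete, here is a minimal sketch in plain Python. It does not use the H2O-3 API; the function name `target_encode` and the sample data are hypothetical and chosen only for illustration. The sketch also skips safeguards such as smoothing or out-of-fold computation, which practical implementations typically add to limit target leakage.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Illustrative helper (not part of H2O-3): map each categorical
    value to the mean of the target over the rows that share it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Per-category mean of the target.
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    # Replace every original value with its category mean.
    encoded = [means[cat] for cat in categories]
    return encoded, means


# Hypothetical sample data: a categorical column and a binary target.
cities = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
bought = [1, 0, 1, 0, 1, 0]

encoded, mapping = target_encode(cities, bought)
print(mapping)
print(encoded)
```

Here every "SF" row is replaced by 0.5, because one of the two "SF" rows has a target of 1. A single numeric column replaces what would otherwise be one indicator column per category, which is why the technique suits high-cardinality features.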