Data Manipulation

This section provides examples of common tasks performed when preparing data for machine learning. All examples are run on a local cluster.

Note: The examples in this section include datasets that are pulled from GitHub and S3.

Feature Engineering

H2O also provides methods for feature engineering. Target Encoding is a categorical encoding technique that replaces each categorical value with the mean of the target variable for that value; it is especially useful for high-cardinality features. Word2vec is a text processing method that converts a corpus of text into word vectors.
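
To make the idea behind target encoding concrete, here is a minimal sketch in plain Python. It is not H2O's implementation and does not use the H2O API; the function name `target_encode` and the toy `city`/`churned` data are hypothetical, chosen only for illustration. It shows the core step only: each categorical level is replaced by the mean of the target over the rows that have that level.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Return a dict mapping each category level to the mean target value
    over the rows that have that level (hypothetical helper, not H2O API)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, targets):
        sums[level] += y
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


# Hypothetical toy data: a categorical column and a binary target.
city = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
churned = [1, 0, 0, 1, 1, 1]

# Learn the per-level means, then replace each level with its mean.
encoding = target_encode(city, churned)
encoded_city = [encoding[level] for level in city]
```

Because the encoding is computed from the target itself, a naive version like this one can leak target information into the features. Practical implementations typically add safeguards such as smoothing toward the global mean or computing the means on held-out folds.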