Available in: GBM, DRF, Deep Learning, GLM, Naïve-Bayes, AutoML
During model training, you might find that the majority of your data belongs to a single class. For example, consider a binary classification dataset that has 100 rows, with 80 rows labeled as class 1 and the remaining 20 rows labeled as class 2. This is a common scenario, and a model trained on such data tends to favor class 1, because always predicting the majority class already yields 80% accuracy. It is an example of an imbalanced dataset, in this case with a ratio of 4:1.
The balance_classes option can be used to balance the class distribution. When enabled, H2O will either undersample the majority classes or oversample the minority classes. Note that the resulting model will also correct the final probabilities (“undo the sampling”) using a monotonic transform, so the predicted probabilities of a model trained with this option will differ from those of a model trained without it. However, because AUC depends only on the ordering of the predictions, it won’t be affected by the transform.
If this option is enabled, then you can also specify a value for the
This option is disabled by default.
This option only applies to classification problems.
Enabling this option can increase the size of the data frame.
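The behavior described above can be illustrated with a small, library-free Python sketch. It is not H2O's implementation: the helper names (oversample_minority, undo_sampling, auc) are hypothetical, and the correction used here is the standard Bayes prior correction for resampled data, which is assumed to be similar in spirit to the monotonic transform that H2O applies. The sketch uses the 80/20 example from above and shows that oversampling grows the data, that the correction maps probabilities back toward the original class prior, and that AUC is unchanged by the correction.

```python
import random
from collections import Counter


def oversample_minority(labels, seed=0):
    """Return row indices in which every class is resampled (with
    replacement) up to the size of the largest class."""
    rng = random.Random(seed)
    rows_by_class = {}
    for idx, label in enumerate(labels):
        rows_by_class.setdefault(label, []).append(idx)
    target = max(len(rows) for rows in rows_by_class.values())
    balanced = []
    for rows in rows_by_class.values():
        balanced.extend(rows)  # keep every original row
        balanced.extend(rng.choices(rows, k=target - len(rows)))  # add copies
    return balanced


def undo_sampling(p_balanced, rate_pos, rate_neg):
    """Map a probability learned on resampled data back to the original
    class prior (assumed Bayes prior correction, monotonic in p_balanced)."""
    weighted_pos = p_balanced / rate_pos
    weighted_neg = (1.0 - p_balanced) / rate_neg
    return weighted_pos / (weighted_pos + weighted_neg)


def auc(scores, labels, positive):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# The 4:1 example: 80 rows of class 1 and 20 rows of class 2.
labels = [1] * 80 + [2] * 20
balanced_rows = oversample_minority(labels)
balanced_counts = Counter(labels[i] for i in balanced_rows)
print(len(labels), "rows before,", len(balanced_rows), "rows after balancing")

# Class 2 was sampled at 4x and class 1 at 1x, so a score of 0.5 on the
# balanced data corresponds to the original 20% prior for class 2.
corrected_half = undo_sampling(0.5, rate_pos=4.0, rate_neg=1.0)

# Made-up scores for six rows, with class 2 treated as the positive class.
scores_balanced = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
truth = [2, 2, 1, 2, 1, 1]
scores_corrected = [undo_sampling(p, 4.0, 1.0) for p in scores_balanced]
auc_before = auc(scores_balanced, truth, positive=2)
auc_after = auc(scores_corrected, truth, positive=2)
print("AUC unchanged by the correction:", auc_before == auc_after)
```

In this sketch the balanced data has 160 rows instead of 100, which is why enabling the option can increase the size of the data frame. The corrected probabilities are smaller than the raw ones for the minority class, but their ordering is preserved, so any ranking-based metric such as AUC stays the same.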
Refer to the following link for more information about balance classes: https://gking.harvard.edu/files/0s.pdf.