Version: v0.17.0

Compare datasets


In H2O Model Validation, you can compare datasets to understand their similarities and differences. You select one column at a time and view a set of metrics for that column across the selected datasets that contain a similar column. The metrics include column statistics such as mean, count, and missing ratio, as well as Pearson correlation values. Comparing these metrics gives you insight into how variables are distributed and how they relate to each other across datasets.
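To make the metrics named above concrete, the following is a minimal sketch of how count, mean, missing ratio, and Pearson correlation could be computed for one column in two datasets. It is not the H2O Model Validation implementation or API. The functions `column_summary` and `pearson`, the sample datasets, and the use of `None` for missing values are all assumptions made for illustration.

```python
import math

def column_summary(values):
    """Summarize one numeric column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "count": len(present),
        "mean": sum(present) / len(present) if present else math.nan,
        "missing_ratio": (total - len(present)) / total if total else math.nan,
    }

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

# Hypothetical datasets: column name -> list of values.
train = {"age": [25, 32, None, 41], "income": [30.0, 45.0, 50.0, 62.0]}
test = {"age": [29, None, None, 38], "income": [33.0, 40.0, 52.0, 58.0]}

# Compare the "age" column across both datasets, one row of metrics per dataset.
for name, data in {"train": train, "test": test}.items():
    summary = column_summary(data["age"])
    corr = pearson(data["age"], data["income"])
    print(name, summary, "pearson(age, income) =", corr)
```

Reading the per-dataset rows side by side is the same idea as the comparison view: a large gap in mean or missing ratio for the same column suggests the datasets differ in distribution or data quality.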

Compare datasets


H2O Model Validation uses each dataset's summary to compare datasets that have similar columns.


Non-numerical columns (for example, time columns) are not available for comparison.


To compare datasets, follow these steps:

  1. In the H2O Model Validation navigation menu, click Datasets.
  2. Click the Select datasets toggle.
  3. In the Datasets table, select at least two datasets to compare.
  4. Click Compare.
  5. In the Pick a column box, select or search for the column you want to compare.
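As a rough mental model of which columns can be compared, the sketch below keeps only columns that appear in every selected dataset and hold numeric values. It assumes that "similar columns" means columns with the same name, which this page does not confirm, and the helper names `is_numeric` and `comparable_columns` are hypothetical. It does not describe how H2O Model Validation builds the column picker.

```python
from numbers import Number

def is_numeric(values):
    """True if every non-missing value is a number (booleans excluded)."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )

def comparable_columns(datasets):
    """Columns present in every dataset (matched by name) and numeric in all."""
    if not datasets:
        return []
    shared = set.intersection(*(set(d) for d in datasets))
    return sorted(c for c in shared if all(is_numeric(d[c]) for d in datasets))

# Hypothetical datasets: column name -> list of values.
first = {"age": [25, 32], "signup_time": ["2020-01-01", "2020-01-02"]}
second = {"age": [29, None], "signup_time": ["2021-01-01", "2021-01-02"]}

# "signup_time" is dropped because its values are not numeric.
print(comparable_columns([first, second]))
```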

In the other tabs, you can view the comparison metrics of certain tests associated with the compared datasets. To learn more, see Comparison metrics: Validation tests.