Version: v0.17.0

Compare datasets

Overview

In H2O Model Validation, you can compare datasets to understand their similarities and differences. When you compare datasets, you select one column at a time and view a set of metrics for that column across the selected datasets that share it. These metrics include column statistics, such as mean, count, and missing ratio, as well as Pearson correlation values. Comparing these metrics helps you see how a variable's distribution, and its relationships with other variables, differ from one dataset to another.
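To make these metrics concrete, the following Python sketch computes count, missing ratio, mean, and a Pearson correlation for one column across several datasets. It is not H2O Model Validation's implementation or API. The dict-of-lists dataset layout, the function names, counting only non-missing values, and correlating the picked column with one other column in the same dataset are all assumptions made for illustration.

```python
import math

# Assumption: each dataset is a dict mapping column name -> list of values,
# with None marking a missing value. This is NOT an H2O Model Validation API.


def column_summary(values):
    """Count, missing ratio, and mean for one numeric column."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "count": len(present),  # assumption: counts non-missing values only
        "missing_ratio": (total - len(present)) / total if total else 0.0,
        "mean": sum(present) / len(present) if present else float("nan"),
    }


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sx * sy)


def compare_column(datasets, column, other):
    """Summarize `column` in each dataset, plus its correlation with `other`
    (a hypothetical choice of correlation partner, for illustration only)."""
    report = {}
    for name, data in datasets.items():
        stats = column_summary(data[column])
        stats["pearson_vs_" + other] = pearson(data[column], data[other])
        report[name] = stats
    return report


# Hypothetical example data: one missing "age" value in the training set.
train = {"age": [25, 30, None, 40], "income": [50, 60, 55, 80]}
test = {"age": [22, 35, 41, 29], "income": [45, 70, 82, 58]}
report = compare_column({"train": train, "test": test}, "age", "income")
for name, stats in report.items():
    print(name, stats)
```

Reading the per-dataset results side by side, for example a missing ratio of 0.25 in one dataset and 0.0 in the other, shows the kind of difference a column comparison is meant to surface.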


note

H2O Model Validation uses each dataset's summary to compare datasets that have similar columns.

Instructions

To compare datasets, follow these steps:

note

Non-numerical columns (for example, time columns) are not available for comparison.

  1. In the H2O Model Validation navigation menu, click Datasets.
  2. Click the Select datasets toggle.
  3. In the Datasets table, select at least two datasets to compare.
  4. Click Compare.
  5. In the Pick a column box, select or search for the column(s) you want to compare.
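As a rough illustration of which columns can be picked in step 5, the Python sketch below keeps only columns that exist in every selected dataset and hold numeric values in all of them, consistent with the note that non-numeric columns are not available for comparison. The exact column-name matching rule, the dict-of-lists dataset layout, and the helper names are assumptions; H2O Model Validation works from dataset summaries and may define "similar columns" differently.

```python
import numbers

# Assumption: each dataset is a dict mapping column name -> list of values,
# with None marking a missing value. Hypothetical helpers, not a product API.


def is_numeric(values):
    """True if every non-missing value is a real number (bools excluded)."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, numbers.Real) and not isinstance(v, bool) for v in present
    )


def comparable_columns(datasets):
    """Columns present in every selected dataset and numeric in all of them."""
    if len(datasets) < 2:
        raise ValueError("select at least two datasets to compare")
    tables = list(datasets.values())
    # Assumption: "similar columns" means an exact column-name match.
    shared = set(tables[0]).intersection(*tables[1:])
    return sorted(c for c in shared if all(is_numeric(t[c]) for t in tables))


train = {
    "age": [25, 30, None],
    "income": [50.0, 60.5, 55.0],
    "signup": ["2021-01-04", "2021-02-11", "2021-03-09"],  # time-like text
}
test = {
    "age": [22, 35, 41],
    "income": [45.0, 70.0, 82.5],
    "signup": ["2022-01-01", "2022-02-01", "2022-03-01"],
    "region": ["north", "south", "east"],  # only in one dataset
}
columns = comparable_columns({"train": train, "test": test})
print(columns)
```

Here "signup" is excluded because it is not numeric, and "region" is excluded because only one dataset contains it.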
note

The other tabs show comparison metrics for certain validation tests associated with the compared datasets. To learn more, see Comparison metrics: Validation tests.

