Version: v0.14.0

View a dataset's summary

A dataset summary gives you a quick overview of summary statistics (e.g., count, mean, standard deviation, min, max, and missing values) for a particular dataset in your established Driverless AI (DAI) connection.

Instructions

To view a dataset summary, follow these steps:

  1. In the H2O Model Validation navigation menu, click Datasets.
    note
    • On the Datasets card, you can search the datasets table for a particular dataset.
    • The datasets table on the Datasets card lists a dataset summary for each of your datasets. To learn more, see Datasets table columns.
  2. In the datasets table, select the dataset whose summary you want to view.
  3. Click View.
    note

    A dataset summary table appears, showing several summary metrics about the dataset (e.g., frequency). To learn more, see Dataset summary table.

Datasets table columns

| Column name | Description |
| --- | --- |
| Name | Dataset name. |
| Data Summary | State of the dataset summary. To learn more, see Dataset summary states. |
| Rows | Number of rows. |
| Columns | Number of columns. |
| File Size | File size. |
| Adversarial Similarity | The number of Adversarial Similarity tests completed and scheduled to run. |
| Drift Detection | The number of Drift Detection tests completed and scheduled to run. |

Dataset summary states

A dataset summary can be in one of the following states:

  • NotCreated
    • H2O Model Validation has not created the dataset summary.
  • Created
    • H2O Model Validation created the dataset summary.
  • Running
    • H2O Model Validation is currently running the dataset summary.
  • Done
    • H2O Model Validation has completed the dataset summary.
  • Deleted
    • H2O Model Validation deleted the dataset summary.
  • Error
    • An error occurred during the dataset summary.
  • Timeout
    • There was not enough time to complete the dataset summary.
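
If you track these states in your own scripts, the list above can be modeled as an enumeration. The following is only a minimal sketch: the `SummaryState` class, the `FINISHED_STATES` set, and both helper functions are hypothetical names and are not part of any H2O Model Validation API. Treating Done, Deleted, Error, and Timeout as finished, and Done as the only viewable state, is an assumption drawn from the descriptions above.

```python
from enum import Enum


class SummaryState(Enum):
    """Dataset summary states, using the names shown in the datasets table."""
    NOT_CREATED = "NotCreated"
    CREATED = "Created"
    RUNNING = "Running"
    DONE = "Done"
    DELETED = "Deleted"
    ERROR = "Error"
    TIMEOUT = "Timeout"


# Assumption: these states no longer change on their own.
FINISHED_STATES = {
    SummaryState.DONE,
    SummaryState.DELETED,
    SummaryState.ERROR,
    SummaryState.TIMEOUT,
}


def is_finished(state_name: str) -> bool:
    """Return True if the named state is one of the assumed finished states."""
    return SummaryState(state_name) in FINISHED_STATES


def is_viewable(state_name: str) -> bool:
    """Assumption: a summary can be viewed only once it is Done."""
    return SummaryState(state_name) is SummaryState.DONE
```

Looking a state up by its displayed name (for example, `SummaryState("Running")`) raises a `ValueError` for an unknown name, which makes typos easy to catch.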

Dataset summary table

| Column name | Description |
| --- | --- |
| Feature | Feature name (one of the column names in the dataset). |
| Count | Count (number) of feature values present in the feature column. |
| Mean | The mean (average) feature value. |
| Std | Standard deviation of the feature values (a measure of how spread out the values are). |
| Min | The minimum feature value. |
| Max | The maximum feature value. |
| Missing | Number of missing feature values. |
| Unique | Number of unique feature values. |
| Freq | The feature frequency value. |
note

H2O Model Validation marks a feature column with N/A (not applicable) if the feature's values are non-numeric.
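
To make the metrics in the dataset summary table concrete, the following sketch computes comparable values for a single feature column using only the Python standard library. It illustrates what each metric means and is not how H2O Model Validation computes its summary. The function name `summarize_feature` is hypothetical, and several choices are assumptions: `None` marks a missing value, Std is the sample standard deviation, Freq is the number of occurrences of the most common value, and Mean, Std, Min, and Max are the metrics marked N/A for a non-numeric feature.

```python
from collections import Counter
from statistics import mean, stdev

NA = "N/A"  # placeholder for metrics that do not apply


def summarize_feature(values):
    """Return summary metrics for one feature column.

    `values` is a list in which None marks a missing value (an assumption).
    """
    present = [v for v in values if v is not None]
    # A feature counts as numeric only if every present value is an int or float.
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    summary = {
        "Count": len(present),
        "Missing": len(values) - len(present),
        "Unique": len(set(present)),
        # Assumption: Freq is the count of the most common value.
        "Freq": Counter(present).most_common(1)[0][1] if present else NA,
    }
    if numeric:
        summary["Mean"] = mean(present)
        # The sample standard deviation needs at least two values.
        summary["Std"] = stdev(present) if len(present) > 1 else NA
        summary["Min"] = min(present)
        summary["Max"] = max(present)
    else:
        # Assumption: these four metrics are the ones that do not apply.
        for metric in ("Mean", "Std", "Min", "Max"):
            summary[metric] = NA
    return summary


# Made-up example data: one numeric and one non-numeric feature.
ages = [22, 35, None, 35, 48]
cities = ["Prague", "Lima", None, "Lima", "Oslo"]

age_summary = summarize_feature(ages)
city_summary = summarize_feature(cities)
```

In this example, `age_summary` has a Count of 4, one Missing value, and a Mean of 35, while `city_summary` reports N/A for Mean, Std, Min, and Max because the values are strings.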

