Version: Next

Metrics: Adversarial similarity

Overview

H2O Model Validation offers an array of metrics to understand an adversarial similarity test. Below, each metric is described in turn.

Metrics

Similar-to-secondary probabilities

The similar-to-secondary probabilities histogram displays the row probabilities of the primary and secondary dataset rows.

X-axis: Probabilities of being similar to the secondary dataset
Y-axis: The relative frequency of the probability intervals between 0 and 1
- The relative frequency (Y-axis) refers to the percent-wise frequencies or number of occurrences of values in the datasets. H2O Model Validation performs this relative or percent-wise calculation to compare numbers between the primary and secondary datasets. Since H2O Model Validation expects each dataset to have a different size, making a fair comparison on direct frequencies is complex. Therefore, H2O Model Validation divides the number of occurrences by the length of the dataset.

note

In the adversarial similarity test, predictions are generated to determine how much a dataset’s rows differ from those belonging to the training dataset.
Higher values to the histogram's right indicate a greater difference between the primary dataset and the secondary dataset.
A balance of values on both sides of the histogram indicates a 1.0 AUC score (a perfect separation between the two datasets).

probability of belonging to Secondary Dataset

Dissimilarity score

The dissimilarity score card displays the general area under the curve (AUC) value, highlighting the dissimilarity between the primary and secondary datasets.

Feature importance

The feature importance chart displays the top features contributing to a higher area under the curve (AUC) value, highlighting the dissimilarity between the primary and secondary datasets. The chart displays the features from top to bottom.

X-axis: Gain value (the feature's importance to the test model)
Y-axis: Dataset variables (features)

note

Clicking on the bar of a feature triggers the display of the following plot: Feature particial dependency plot (PDP).

Feature particial dependency plot (PDP)

The feature particial dependency (PDP) plot displays the impact the different values of the selected feature have on the predicted values. The selected feature refers to the feature clicked on the Feature importance chart. The features partial dependency (PDP) plot only displays one single PDP line instead of one for each dataset.

X-axis: A variable's values
Y-axis: Probabilities of being similar to the secondary dataset
Average probability: Average probability for the whole concatenated dataset

note

The following histogram and graph are available for the observed feature in the feature partial dependence plot (PDP):

Feature histogram
Feature missing ratios

To access either the histogram or graph, consider the following instructions when viewing the Feature partial dependence plot (PDP):

Click
more_vert Kebab menu.
- To view the Feature histogram, select Histogram.
- To view the Feature missing ratios graph, select Missing ratios.

Feature histogram

The feature importance histogram displays the selected feature on the Feature importance chart.

X-axis: A variable's values
Y-axis: The relative frequency of the variable's values

Feature missing ratios

The feature feature missing ratios graph displays any missing ratio values of the selected feature in the primary and secondary dataset. The selected feature refers to the feature selected on the Feature importance chart.

X-axis: A variable's values
Y-axis: Missing ratio values

Shapley tab

Shapley table

The Shapley table contains adversarial model results from a Shapley perspective. Each row in the table represents a row of the secondary dataset (being compared to the primary dataset). Rows are ordered from the highest to lowest predictive value. In other words, rows higher in the table represent the rows of the primary dataset most dissimilar to the secondary dataset.

note

H2O Model Validation in the Shapley table displays the top highest (dissimilar) or lowest (similar) results (prediction values).
- To display the top highest (dissimilar) results, see Show dissimilar.
- To display the top lowest (similar) results, see Show similar.
- To download the Shapley results of the test dataset, see Download Shapley values.
The first two columns of the Shapley table refer to the row's ID, and the target column, while the rest of the columns refer to the train columns of the model.

Name Description
Row-Nr. Row ID
Adv-IsSecondary Target column
Clicking on one row ID displays a global or local Shapley chart that highlights the top most global and local features increasing the average model predictions that drive the most dissimilarity between the datasets. To learn more, see Global/local Shapley.

Name	Description
`Row-Nr.`	Row ID
`Adv-IsSecondary`	Target column

Show similar

By default, rows in the Shapley table are ordered from highest to lowest predictive values. To switch the default order and view rows from lowest to highest predictive values, consider the following instructions:

In the Shapley table, select Similar.

Show dissimilar

To revert to the default order of the rows in the Shapley table, where test rows are ordered from highest to lowest predictive values, consider the following instructions:

In the Shapley table, select Dissimilar.

Download Shapley values

To download the Shapley values of the secondary dataset, consider the following instructions when viewing the Shapley table:

Click Download Shapley values for the whole secondary dataset.

note

The CSV file contains adversarial model results from a Shapley perspective for the secondary dataset.

Global/local Shapley

The global or local Shapley chart displays the top global or local features increasing the average model predictions the most while driving the most dissimilarity between the datasets.

X-axis: Global/local Shapley values
Y-axis: Dataset variables

Feedback

Submit and view feedback for this page
Send feedback about H2O Model Validation to cloud-feedback@h2o.ai

Overview​

Metrics​

Similar-to-secondary probabilities​

Dissimilarity score​

Feature importance​

Feature particial dependency plot (PDP)​

Feature histogram​

Feature missing ratios​

Shapley tab​

Shapley table​

Show similar​

Show dissimilar​

Download Shapley values​

Global/local Shapley​

Overview

Metrics

Similar-to-secondary probabilities

Dissimilarity score

Feature importance

Feature particial dependency plot (PDP)

Feature histogram

Feature missing ratios

Shapley tab

Shapley table

Show similar

Show dissimilar

Download Shapley values

Global/local Shapley