Task 4: Experiment results summary
Overview
When the experiment finishes, a summary of the project appears in the lower-right corner of the screen, and the experiment name is displayed in the upper-left corner.
The summary includes the following:
- Experiment: Name of the experiment.
  - Version: Version of Driverless AI and the date it was launched.
  - Settings: The selected training settings, seed, and number of GPUs enabled.
  - Train data: Name of the training set, number of rows, and number of columns.
  - Validation data: Name of the validation set, number of rows, and number of columns.
  - Test data: Name of the test set, number of rows, and number of columns.
  - Target column: Name of the target column (type of data and % target class).
- System Specs: The machine specs, including RAM, number of CPU cores, and number of GPUs.
  - Max memory usage: The maximum memory used during the experiment.
- Recipe: The recipe used for the experiment.
  - Validation scheme: Type of sampling and number of internal holdouts.
  - Feature Engineering: Number of features scored and the final selection.
- Timing: The time taken for each step in the experiment.
  - Data preparation: Total time for data preparation and the number of features scored.
  - Shift/Leakage detection: Total time for shift and leakage detection.
  - Model and feature tuning: Total time for model and feature tuning and the number of models trained.
  - Feature evolution: Total time for feature evolution and the number of models trained.
  - Final pipeline training: Total time for final pipeline training and the total number of models trained.
  - Python / MOJO scorer building: Total time for Python and MOJO scorer building.
- Validation score: Logloss score +/- machine epsilon for the baseline.
- Validation score: Logloss score +/- machine epsilon for the final pipeline.
- Test score: Logloss score +/- machine epsilon for the final pipeline.
The information in the experiment SUMMARY tab, along with additional detail, can be found in the experiment summary report. Click on the DOWNLOAD SUMMARY & LOGS button to download the report.
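If you prefer to fetch the report programmatically, the standalone driverlessai Python client can download experiment artifacts. The sketch below is a minimal, hedged example: the server address, credentials, and experiment key are placeholders, and the artifact names and download signature are assumptions that may vary across client versions, so check the documentation for your release.

```python
# A minimal sketch using the driverlessai Python client (pip install driverlessai).
# The address, credentials, and experiment key are placeholders; the artifact
# names ("summary", "logs") and the download() signature are assumptions --
# verify them against your client version's documentation.
import driverlessai

client = driverlessai.Client(
    address="http://localhost:12345",  # placeholder Driverless AI server
    username="username",               # placeholder credentials
    password="password",
)

experiment = client.experiments.get("EXPERIMENT_KEY")  # placeholder key
experiment.artifacts.download(only=["summary", "logs"], dst_dir=".")
```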
Understanding model performance in the Freddie Mac classification experiment
This guide explores key questions to assess your model's performance in the Freddie Mac classification task.
Feature selection:
Note the total number of features that were scored for your model and the number that were selected for the final pipeline.
Validation vs. Test Score
- Compare Validation and Test Scores: Analyze the difference between the final model's validation score and its test score.
- Validation Score: This score reflects the model's performance on a dedicated validation dataset not used for training.
- Test Score: This score represents the model's performance on a completely unseen test dataset.
Ideally, the validation and test scores should be close. A substantial difference suggests the model is overfitting to the training data and may not generalize well to unseen data.
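As a concrete illustration of this check, a few lines of Python can flag a suspicious gap. The scores below are hypothetical, and the 0.01 threshold is a judgment call for illustration, not a fixed rule:

```python
# A minimal sketch with hypothetical scores: a large gap between the
# validation and test log loss hints that the model is overfitting.
val_logloss = 0.1180   # hypothetical validation score for the final pipeline
test_logloss = 0.1195  # hypothetical test score for the final pipeline

gap = abs(test_logloss - val_logloss)
print(f"validation={val_logloss:.4f}  test={test_logloss:.4f}  gap={gap:.4f}")

if gap > 0.01:  # threshold chosen for illustration only
    print("Large gap: the model may not generalize to unseen data.")
else:
    print("Scores are close: the model appears to generalize well.")
```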
Interpreting Log Loss
- Test Log Loss: The test log loss metric serves as an indicator of your model's performance on unseen data, specifically the freddie_mac_500_test set. It essentially reflects the model's error rate in classifying the test instances.
Log loss is inversely related to model performance. Lower log loss signifies a better fit, implying fewer and less confident misclassifications. Conversely, higher log loss indicates the model is making more mistakes in its predictions.
In this experiment, the relatively small log loss of 0.119511 +/- 0.001389542 suggests that the model achieved a low misclassification rate on the unseen test data. This is a positive indication of the model's ability to generalize well.
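For a binary target, log loss is computed as -(1/N) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ], where p_i is the predicted probability of the positive class. The minimal sketch below, using hypothetical labels and probabilities and assuming scikit-learn is available, shows how confident, correct predictions produce a low log loss while hedged predictions raise it:

```python
# A minimal sketch with hypothetical data: lower log loss means better
# calibrated, more accurate probability predictions.
from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]                 # hypothetical true labels
confident = [0.05, 0.10, 0.90, 0.95]  # confident, correct probabilities
uncertain = [0.40, 0.45, 0.55, 0.60]  # correct, but with low confidence

print(log_loss(y_true, confident))  # ~0.08 -- low error, good fit
print(log_loss(y_true, uncertain))  # ~0.55 -- same accuracy, worse log loss
```

Note that both sets of predictions classify every instance correctly at a 0.5 threshold; log loss penalizes the second set because its probabilities are less confident.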
If you're unfamiliar with log loss, refer to the Log Loss section for more information.
In the upcoming tasks, we'll examine the financial implications of misclassification by exploring the confusion matrix and the plots derived from it.