Experiment Summary

An experiment summary is available for each completed experiment. Click the Download Summary & Logs button to download the h2oai_experiment_summary_<experiment>.zip file.

Download Experiment Summary

The files within the experiment summary zip provide textual explanations of the graphical representations that are shown on the Driverless AI UI. Details of each artifact are described below.
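Because the summary is an ordinary zip archive, you can inspect its contents programmatically before extracting anything. A minimal Python sketch (the experiment name in the example path is a placeholder, not a real file):

```python
import zipfile

def list_summary_artifacts(zip_path):
    """Return the sorted artifact filenames inside an experiment summary zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())

# Example usage (replace the placeholder with your downloaded file):
# for name in list_summary_artifacts("h2oai_experiment_summary_<experiment>.zip"):
#     print(name)
```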

Experiment Autoreport

A report file (AutoDoc) is included in the experiment summary. This report provides insight into the training data and any detected shifts in distribution, the validation schema selected, model parameter tuning, feature evolution and the final set of features chosen during the experiment.

  • report.docx: The report, available in Word format.

A sample experiment report in Word format is available for download and review.

Autoreport Support

Autoreport supports resumed experiments only in certain Driverless AI versions. See the following table to check which types of resumed experiments are supported in your version:

Autoreport Support for Resumed Experiments

  Via                              LTS   1.7.0 and older   1.7.1   1.8.x
  New model with same parameters   yes   yes               yes     yes
  Restart from last checkpoint     no    no                yes     yes
  Retrain final pipeline           no    no                no      yes

Notes:

  • Autoreport does not support experiments that were built from previously aborted or failed experiments.

  • Reports for unsupported resumed experiments will still build, but they will only include the following text: “AutoDoc not yet supported for resumed experiments.”

Experiment Artifacts Overview

The Experiment Summary contains artifacts that provide overviews of the experiment.

  • preview.txt: Provides a preview of the experiment. (This is the same information that was included on the UI before starting the experiment.)

  • summary: Provides the same summary that appears in the lower-right portion of the UI for the experiment. (Available in txt or json.)

  • config.json: Provides a list of the settings used in the experiment.

  • config_overrides_toml_string.txt: Provides any overrides for this experiment that were made to the config.toml file.

  • args_do_auto_dl.json: The internal arguments used in the Driverless AI experiment, based on the dataset and the Accuracy, Time, and Interpretability settings.

  • experiment_column_types.json: Provides the column types for each column included in the experiment.

  • experiment_original_column.json: A list of all columns available in the dataset that was used in the experiment.

  • experiment_pipeline_original_required_columns.json: For columns used in the experiment, this includes the column name and type.

  • experiment_sampling_description.json: A description of the sampling performed on the dataset.

  • timing.json: The timing and number of models generated in each part of the Driverless AI pipeline.
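Most of these artifacts are plain JSON, so they are easy to inspect once the zip is extracted. A minimal sketch (the directory name is a placeholder, and the keys inside each file vary by experiment and Driverless AI version):

```python
import json
from pathlib import Path

def load_artifact(summary_dir, name):
    """Load one JSON artifact from an extracted experiment summary folder."""
    return json.loads(Path(summary_dir, name).read_text())

# Example usage (placeholder directory; key names vary by version):
# config = load_artifact("h2oai_experiment_summary_<experiment>", "config.json")
# print(sorted(config)[:10])  # peek at the first few setting names
```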

Tuning Artifacts

During the Driverless AI experiment, model tuning is performed to determine the optimal algorithm and parameter settings for the provided dataset. For regression problems, target tuning is also performed to determine the best way to represent the target column (for example, whether taking the log of the target column improves results). The results from these tuning steps are available in the Experiment Summary.

  • tuning_leaderboard: A table of the model tuning performed along with the score generated from the model and training time. (Available in txt or json.)

  • target_transform_tuning_leaderboard.txt: A table of the transforms applied to the target column along with the score generated from the model and training time. (This will be empty for binary and multiclass use cases.)
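The json variant of the leaderboard can be ranked programmatically. The sketch below assumes the file deserializes to a list of records carrying a numeric score under a `score` key; the actual filename and schema can differ between Driverless AI versions, so inspect your own file first:

```python
import json

def top_tuning_models(path, n=5, score_key="score"):
    """Return the n best-scoring rows of a tuning leaderboard JSON.

    Assumes a list of records with a numeric score under `score_key`
    (an assumption -- the schema varies by version). reverse=True
    treats higher as better (e.g. AUC); flip it for error metrics
    such as RMSE.
    """
    with open(path) as f:
        rows = json.load(f)
    return sorted(rows, key=lambda r: r[score_key], reverse=True)[:n]
```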

Features Artifacts

Driverless AI performs feature engineering on the dataset to determine the optimal representation of the data. The top features used in the final model can be seen in the GUI. The complete list of features used in the final model is available in the Experiment Summary artifacts.

The Experiment Summary also provides a list of the original features and their estimated feature importance. For example, given the features in the final Driverless AI model, we can estimate the feature importance of the original features.

  Feature                                   Feature Importance
  NumToCatWoE:PAY_AMT2                      1
  PAY_3                                     0.92
  ClusterDist9:BILL_AMT1:LIMIT_BAL:PAY_3    0.90

To calculate the feature importance of PAY_3, we can aggregate the feature importance for all variables that used PAY_3:

  • NumToCatWoE:PAY_AMT2: 1 * 0 (PAY_3 not used.)

  • PAY_3: 0.92 * 1 (PAY_3 is the only variable used.)

  • ClusterDist9:BILL_AMT1:LIMIT_BAL:PAY_3: 0.90 * 1/3 (PAY_3 is one of three variables used.)

Estimated Feature Importance = (1*0) + (0.92*1) + (0.9*(1/3)) = 1.22

Note: The feature importance is converted to relative feature importance. (The feature with the highest estimated feature importance will have a relative feature importance of 1).
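The aggregation above can be sketched in code. This illustrates the arithmetic only; the colon-delimited naming convention for engineered features is inferred from the example, not an official parsing API:

```python
# Engineered-feature importances from the example table above.
engineered = {
    "NumToCatWoE:PAY_AMT2": 1.0,
    "PAY_3": 0.92,
    "ClusterDist9:BILL_AMT1:LIMIT_BAL:PAY_3": 0.90,
}

def estimated_importance(original, engineered):
    """Aggregate engineered-feature importances back to one original feature."""
    total = 0.0
    for name, imp in engineered.items():
        # Inputs to an engineered feature follow the transformer name,
        # separated by colons (e.g. "ClusterDist9:A:B:C" uses A, B, C).
        # An untransformed feature like "PAY_3" is its own single input.
        parts = name.split(":")
        inputs = parts[1:] if len(parts) > 1 else parts
        if original in inputs:
            # Credit the original feature with an equal share of the
            # engineered feature's importance.
            total += imp / len(inputs)
    return total

print(round(estimated_importance("PAY_3", engineered), 2))  # 1.22
```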

  • ensemble_features: A list of features used in the final model, a description of the feature, and the relative feature importance. Feature importances for multiple models are linearly blended with the same weights as the final ensemble of models. (Available in txt, table, or json.)

  • ensemble_features_orig: A complete list of all original features used in the final model, a description of the feature, the relative feature importance, and the standard deviation of relative importance. (Available in txt or json.)

  • ensemble_features_orig_shift: A list of original user features used in the final model and the difference in relative feature importance between the final model and the corresponding feature importance of the final population. (Available in txt or json.)

  • ensemble_features_prefit: A list of features used by the best individuals in the final population, with each model blended with the same weights as the ensemble if the ensemble used blending. (Available in txt, table, or json.)

  • ensemble_features_shift: A list of features used in the final model and the difference in relative feature importance between the final model and the corresponding feature importance of the final population. (Available in txt, table, or json.)

  • features: A list of features used by the best individual pipeline (identified by the genetic algorithm) and each feature’s relative importance. (Available in txt, table, or json.)

  • features_orig: A list of original user features used by the best individual pipeline (identified by the genetic algorithm) and each feature’s estimated relative importance. (Available in txt or json.)

  • leaked_features: A list of all leaked features provided, along with the relative importance and the standard deviation of relative importance. (Available in txt, table, or json.)

  • leakage_features_orig.json: A list of leaked original features provided and an estimate of the relative feature importance of that leaked original feature in the final model.

  • shift_features.json: A list of all features provided along with the relative importance and the shift in standard deviation of relative importance of that feature.

  • shift_features_orig.json: A list of original features provided and an estimate of the shift in relative feature importance of that original feature in the final model.

Final Model Artifacts

The Experiment Summary includes artifacts that describe the final model. This is the model that is used to score new datasets and create the MOJO scoring pipeline. The final model may be an ensemble of models depending on the Accuracy setting.

  • coefs: A list of coefficients and standard deviation of coefficients for features. (Available in txt or json.)

  • ensemble.txt: A summary of the final model which includes a description of the model(s), gains/lifts table, confusion matrix, and scores of the final model for our list of scorers.

  • ensemble_base_learner_fold_scores: The internal validation scorer metrics for each base learner when the final model is an ensemble. (Available in table or json.) Note that this is not available for Time Series experiments.

  • ensemble_description.txt: A sentence describing the final model. (For example: “Final TensorFlowModel pipeline with ensemble_level=0 transforming 21 original features -> 54 features in each of 1 models each fit on full training data (i.e. no hold-out).”)

  • ensemble_coefs: The coefficient and standard deviation of the coefficient for each feature in the ensemble. (Available as txt or json.)

  • ensemble_coefs_shift: The coefficient and shift of coefficient for each feature in the ensemble. (Available as txt or json.)

  • ensemble_model_description.json/ensemble_model_extra_description: A json file describing the model(s) and, for ensembles, how the model predictions are weighted.

  • ensemble_model_params.json: A json file describing the parameters of the model(s).

  • ensemble_folds_data.json: A json file describing the folds used for the final model(s). This includes the size of each fold of data and the performance of the final model on each fold. (Available if a fold column was specified.)

  • ensemble_features_orig: A list of the original features provided and an estimate of the relative feature importance of that original feature in the ensemble of models. (Available in txt or json.)

  • ensemble_features: A complete list of all features used in the final ensemble of models, a description of the feature, and the relative feature importance. (Available in txt, table, or json.)

  • leakage_coefs.json: A list of coefficients and standard deviation of coefficients for leaked features.

  • pipeline: A visual representation of the experiment pipeline.

  • shift_coefs.json: A list of coefficients and the shift in standard deviation for those coefficients used in the experiment.

The Experiment Summary also includes artifacts about the final model performance.

  • ensemble_scores.json: The scores of the final model for our list of scorers.

  • ensemble_confusion_matrix_test: The confusion matrix for the test data if test data is provided. Note that this is not available for Time Series experiments.

  • ensemble_confusion_matrix_with_validation: The confusion matrix for the internal validation data. Note that this is not available for Time Series experiments.

  • ensemble_confusion_matrix_stats_validation: The confusion matrix statistics on internal validation data. Note that this is not available for Time Series experiments.

  • ensemble_confusion_matrix_stats_test.json: Confusion matrix statistics on the test data. This is only available if test data is provided. Note that this is not available for Time Series experiments.

  • ensemble_gains_test: The lift and gains table for test data if test data is provided. (Visualization of lift and gains can be seen in the UI.) Note that this is not available for Time Series experiments.

  • ensemble_gains_with_validation: The lift and gains table for the internal validation data. (Visualization of lift and gains can be seen in the UI.) Note that this is not available for Time Series experiments.

  • ensemble_roc_test: The ROC and Precision Recall table for test data if test data is provided. (Visualization of ROC and Precision Recall curve can be seen in the UI.) Note that this is not available for Time Series experiments.

  • ensemble_roc_with_validation: The ROC and Precision Recall table for the internal validation data. (Visualization of ROC and Precision Recall curve can be seen in the UI.) Note that this is not available for Time Series experiments.

  • fs_normalized_varimp: The normalized frequency variable importance values. (Available in table or json.) Note that this is not available for Time Series experiments.

  • fs_unnormalized_varimp: The unnormalized frequency variable importance values. (Available in table or json.) Note that this is not available for Time Series experiments.

  • individual_scored.params_base: Detailed information about each iteration run in the experiment. (Available in csv, table, or json.)
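The per-iteration artifact's csv variant can be streamed with the standard library. The exact filename and column set vary by Driverless AI version, so treat both as placeholders:

```python
import csv

def iteration_rows(csv_path):
    """Yield one dict per scored iteration from the per-iteration CSV.

    Column names are whatever the file's header row declares; they
    are not fixed across Driverless AI versions.
    """
    with open(csv_path, newline="") as f:
        yield from csv.DictReader(f)

# Example usage (filename is a placeholder):
# for row in iteration_rows("individual_scored.params_base.csv"):
#     print(row)
```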