Experiment Documentation Settings

The EXPERIMENT DOCUMENTATION tab in EXPERT SETTINGS configures AutoDoc report generation, model performance analysis, and model interpretation features. This tab allows you to customize how Driverless AI creates comprehensive documentation, analyzes model performance, and provides interpretability insights for your machine learning experiments.

Experiment Documentation Settings UI

Experiment Documentation Tab Sub-Categories

The EXPERIMENT DOCUMENTATION tab is organized into five main sub-categories, each focusing on specific aspects of the documentation process during your machine learning experiments.

To access these settings, open EXPERT SETTINGS from the Experiment Setup page and select the EXPERIMENT DOCUMENTATION tab.

The following list describes the sub-categories and actions available on the EXPERIMENT DOCUMENTATION tab:

  • [1] General: Configure AutoDoc report generation, file naming, template selection, and output format preferences

  • [2] Data: Configure data analysis documentation, summary tables, histograms, and transformation recommendations

  • [3] Models: Configure model architecture documentation, feature importance display, and GLM coefficient tables

  • [4] Model Performance: Configure performance metrics, validation statistics, confusion matrices, and stability analysis

  • [5] Interpretation: Configure model interpretability features, Shapley values, PDP plots, and feature importance analysis

  • [6] Filter by Tags: Filter and organize documentation settings using custom tags and labels

  • [7] Save: Save all documentation configuration changes

  • [8] Cancel: Cancel changes and revert to previous configuration settings

General

The General sub-tab configures core AutoDoc settings, report generation preferences, and general documentation options for your experiments.

General Experiment Documentation Settings

Common Settings:

  • Make AutoDoc
    • What it does: Controls whether to generate AutoDoc reports after the experiment completes

    • Purpose: Creates comprehensive documentation including model performance, data analysis, and interpretability insights

    • Available options:
      • Enabled: Generates AutoDoc report (default)

      • Disabled: Skips AutoDoc generation

    • Requires: Boolean toggle (default: Enabled)

  • AutoDoc Name
    • What it does: Sets the name for the AutoDoc report file

    • Purpose: Customizes the output filename for the generated documentation

    • Default value: “report”

    • File extension: Added automatically based on output type setting

    • Requires: String value (default: “report”)

  • AutoDoc Template Location
    • What it does: Specifies the file path to a custom AutoDoc template

    • Purpose: Allows use of custom templates for personalized report generation

    • Template format: Supports custom template files for tailored documentation

    • Empty value: Uses the default Driverless AI template

    • Requires: String path (default: “”)

  • AutoDoc File Output Type
    • What it does: Controls the output format for the AutoDoc report

    • Purpose: Determines the file format of the generated documentation

    • Available options:
      • DOCX: Microsoft Word document format (default)

      • MD: Markdown format for text-based documentation

    • Requires: String selection (default: “DOCX”, options: “DOCX”, “MD”)
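
As a minimal sketch of how AutoDoc Name and AutoDoc File Output Type combine, the function below appends an extension to the report name based on the selected output type. The function name, the `EXTENSIONS` mapping, and the lowercase extensions are assumptions made for illustration; they are not taken from Driverless AI.

```python
# Hypothetical mapping from the documented output types to file extensions.
# The extensions themselves are an assumption, not taken from Driverless AI.
EXTENSIONS = {"DOCX": ".docx", "MD": ".md"}


def autodoc_filename(name: str = "report", output_type: str = "DOCX") -> str:
    """Build a report filename from the AutoDoc Name and File Output Type."""
    key = output_type.upper()
    if key not in EXTENSIONS:
        raise ValueError(f"Unsupported output type: {output_type!r}")
    return name + EXTENSIONS[key]


print(autodoc_filename())                     # report.docx
print(autodoc_filename("churn_model", "MD"))  # churn_model.md
```

With the documented defaults ("report" and "DOCX"), this sketch yields a single Word file name; only the name portion needs to be supplied by the user.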

Advanced Settings:

  • AutoDoc SubTemplate Type
    • What it does: Specifies the sub-template type for AutoDoc generation

    • Purpose: Controls the specific template variant used for report generation

    • Available options:
      • AUTO: Automatically selects the appropriate sub-template (default)

      • MD: Forces markdown sub-template

      • DOCX: Forces Word document sub-template

    • Requires: String selection (default: “AUTO”, options: “AUTO”, “MD”, “DOCX”)

  • List All Config Settings
    • What it does: Controls whether to include all configuration settings in the AutoDoc report

    • Purpose: Determines if the complete experiment configuration is documented

    • Available options:
      • Enabled: Includes all configuration settings in documentation

      • Disabled: Excludes configuration settings from documentation (default)

    • Use case: Useful for sharing experiment details with team members

    • Requires: Boolean toggle (default: Disabled)

Data

The Data sub-tab manages how your data is documented in AutoDoc reports, including the data summary table, histograms, and transformation recommendations.

Data Experiment Documentation Settings

Advanced Settings:

  • Number of Features in Data Summary Table
    • What it does: Sets the number of features to include in the data summary table (-1 = all features)

    • Purpose: Controls the size and scope of the data summary section in AutoDoc reports

    • All features (-1): Includes every feature in the summary table

    • Requires: Integer value (default: -1)

  • Number of Histograms to Show
    • What it does: Sets the maximum number of histograms to display in the data analysis section

    • Purpose: Controls the visual complexity of the data documentation

    • Performance: Limits the number of histogram plots to avoid overwhelming reports

    • Requires: Integer value (default: 10)

  • Autoviz Recommended Transformations
    • What it does: Specifies recommended data transformations for automatic visualization

    • Purpose: Customizes the data transformation suggestions in AutoDoc reports

    • Format: Dictionary of transformation recommendations

    • Empty value: Uses default transformation recommendations

    • Requires: Dictionary format (default: {})
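
The limit semantics described above (a positive count, or -1 for "all") can be sketched as follows. The helper name `apply_limit` is hypothetical, and taking the first N items is an assumption: the source does not say which features or histograms are kept when the list is truncated.

```python
from typing import List, Sequence


def apply_limit(items: Sequence[str], limit: int) -> List[str]:
    """Return the first `limit` items, or all items when limit is -1."""
    if limit == -1:
        return list(items)
    if limit < 0:
        raise ValueError("limit must be -1 or a non-negative integer")
    # Assumption: truncation keeps the first items in their given order.
    return list(items[:limit])


columns = [f"col_{i}" for i in range(25)]
summary_columns = apply_limit(columns, -1)    # default: every feature
histogram_columns = apply_limit(columns, 10)  # default: at most 10 histograms
print(len(summary_columns), len(histogram_columns))  # 25 10
```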

Models

The Models sub-tab configures how models are documented, including feature importance display, model architecture summaries, and GLM coefficient tables.

Models Experiment Documentation Settings

Advanced Settings:

  • Number of Top Features to Document
    • What it does: Sets the number of top features, ranked by importance, to include in model documentation

    • Purpose: Controls the depth of feature importance documentation in AutoDoc reports

    • Focus: Highlights the most significant features for model interpretability

    • Requires: Integer value (default: 50)

  • Minimum Relative Feature Importance Threshold
    • What it does: Sets the minimum relative importance a feature must have to be included in documentation

    • Purpose: Filters out low-importance features to focus on meaningful contributors

    • Quality control: Ensures only relevant features are documented

    • Requires: Float value (default: 0.003)

  • Keras Model Architecture Summary Line Length
    • What it does: Controls the line length for Keras model architecture summaries (-1 = no limit)

    • Purpose: Formats neural network architecture documentation for readability

    • No limit (-1): Does not restrict the line length

    • Requires: Integer value (default: -1)

  • NLP/Image Transformer Architecture Max Lines
    • What it does: Sets the maximum number of lines to display for transformer architecture summaries

    • Purpose: Limits the size of complex model architecture documentation

    • Performance: Prevents overly verbose transformer documentation

    • Requires: Integer value (default: 30)

  • Appendix NLP/Image Transformer Architecture
    • What it does: Controls whether to include detailed transformer architecture in the appendix

    • Purpose: Provides comprehensive model architecture documentation

    • Available options:
      • Enabled: Includes detailed transformer architecture in appendix

      • Disabled: Excludes transformer architecture from appendix (default)

    • Requires: Boolean toggle (default: Disabled)

  • Full GLM Coefficients Table in the Appendix
    • What it does: Controls whether to include complete GLM coefficient tables in the appendix

    • Purpose: Provides detailed statistical information for Generalized Linear Models

    • Available options:
      • Enabled: Includes full coefficient tables in appendix

      • Disabled: Excludes full coefficient tables from appendix (default)

    • Requires: Boolean toggle (default: Disabled)

  • GLM Coefficient Tables Number of Models
    • What it does: Sets the number of GLM models to include in coefficient tables

    • Purpose: Controls the scope of GLM coefficient documentation

    • Use case: Useful for ensemble models with multiple GLM components

    • Requires: Integer value (default: 1)

  • GLM Coefficient Tables Number of Folds Per Model
    • What it does: Sets the number of cross-validation folds per model in coefficient tables (-1 = all folds)

    • Purpose: Controls the depth of cross-validation coefficient documentation

    • All folds (-1): Includes all available folds

    • Requires: Integer value (default: -1)

  • GLM Coefficient Tables Number of Coefficients
    • What it does: Sets the maximum number of coefficients to display in GLM tables

    • Purpose: Limits the size of coefficient documentation for large models

    • Performance: Prevents overwhelming documentation with too many coefficients

    • Requires: Integer value (default: 50)

  • GLM Coefficient Tables Number of Classes
    • What it does: Sets the maximum number of classes to include in GLM coefficient tables

    • Purpose: Controls the scope of multi-class GLM documentation

    • Use case: Useful for classification problems with many classes

    • Requires: Integer value (default: 9)
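
To illustrate how Number of Top Features to Document and Minimum Relative Feature Importance Threshold might interact, here is a minimal sketch. It assumes that "relative" means importance divided by the largest importance in the model and that the threshold is applied before the top-N cut; both are assumptions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def features_to_document(
    importances: Dict[str, float],
    top_n: int = 50,
    min_relative_importance: float = 0.003,
) -> List[Tuple[str, float]]:
    """Rank features by importance relative to the strongest feature,
    drop those below the threshold, and keep at most top_n."""
    if not importances:
        return []
    max_importance = max(importances.values())
    if max_importance <= 0:
        return []
    # Assumption: relative importance = importance / max importance.
    relative = {name: value / max_importance for name, value in importances.items()}
    kept = [(n, r) for n, r in relative.items() if r >= min_relative_importance]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


raw = {"age": 120.0, "income": 60.0, "zip_code": 0.3, "noise": 0.06}
selected = features_to_document(raw)
print(selected)  # [('age', 1.0), ('income', 0.5)]
```

In this example, `zip_code` has a relative importance of 0.0025, which falls just under the default threshold of 0.003, so it is left out along with `noise`.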

Model Performance

The Model Performance sub-tab manages the performance analysis included in AutoDoc reports, such as the Population Stability Index, prediction statistics, response rate and GINI plots, and confusion matrices.

Model Performance Experiment Documentation Settings

Common Settings:

  • Population Stability Index
    • What it does: Controls whether to include Population Stability Index (PSI) analysis in performance documentation

    • Purpose: Measures how much the distribution of model predictions shifts between training and validation data

    • Available options:
      • Enabled: Includes PSI analysis in AutoDoc reports

      • Disabled: Excludes PSI analysis from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

  • Prediction Statistics
    • What it does: Controls whether to include detailed prediction statistics in performance documentation

    • Purpose: Provides comprehensive statistical analysis of model predictions

    • Available options:
      • Enabled: Includes prediction statistics in AutoDoc reports

      • Disabled: Excludes prediction statistics from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

  • Response Rates Plot
    • What it does: Controls whether to include response rate visualization plots in performance documentation

    • Purpose: Visualizes the relationship between model predictions and actual outcomes

    • Available options:
      • Enabled: Includes response rate plots in AutoDoc reports

      • Disabled: Excludes response rate plots from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

  • Show GINI Plot
    • What it does: Controls whether to include GINI coefficient visualization in performance documentation

    • Purpose: Provides visual representation of model discrimination power

    • Available options:
      • Enabled: Includes GINI plots in AutoDoc reports

      • Disabled: Excludes GINI plots from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

Advanced Settings:

  • Confusion Matrix Max Number of Classes
    • What it does: Sets the maximum number of classes to display in confusion matrix documentation

    • Purpose: Controls the complexity of confusion matrix visualization

    • Performance: Prevents overwhelming documentation with too many classes

    • Requires: Integer value (default: 10)

  • Population Stability Index Number of Quantiles
    • What it does: Sets the number of quantiles used for PSI calculation and documentation

    • Purpose: Controls the granularity of PSI analysis

    • Accuracy: More quantiles provide more detailed PSI analysis

    • Requires: Integer value (default: 10)

  • Prediction Statistics Number of Quantiles
    • What it does: Sets the number of quantiles for prediction statistics analysis

    • Purpose: Controls the granularity of prediction distribution analysis

    • Detail: More quantiles provide more detailed statistical insights

    • Requires: Integer value (default: 20)

  • Response Rate Plot Number of Quantiles
    • What it does: Sets the number of quantiles for response rate plot visualization

    • Purpose: Controls the granularity of response rate analysis

    • Visualization: More quantiles create smoother response rate curves

    • Requires: Integer value (default: 10)
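
The role of the quantile count in a PSI calculation can be sketched with the standard textbook definition: bin the reference predictions into quantiles, then sum (actual - expected) * ln(actual / expected) over the bins. This is an illustration only. The function names, the `eps` smoothing, and the binning details are assumptions and may differ from what Driverless AI computes internally.

```python
import math
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling into each bin defined by the inner edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        n_quantiles: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using quantile bins taken from `expected`."""
    edges = quantiles(expected, n=n_quantiles)  # n_quantiles - 1 cut points
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    total = 0.0
    for e_i, a_i in zip(e, a):
        e_i, a_i = max(e_i, eps), max(a_i, eps)  # avoid log(0) for empty bins
        total += (a_i - e_i) * math.log(a_i / e_i)
    return total


train_preds = [i / 100 for i in range(100)]
shifted_preds = [p + 0.3 for p in train_preds]
print(psi(train_preds, train_preds))        # 0.0
print(psi(train_preds, shifted_preds) > 0)  # True
```

A larger `n_quantiles` gives finer bins and a more detailed comparison, at the cost of fewer observations per bin.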

Interpretation

The Interpretation sub-tab configures model interpretability features, feature importance analysis, and explanation generation settings for your experiments.

Interpretation Experiment Documentation Settings

Common Settings:

  • Enable Shapley Values
    • What it does: Controls whether to include Shapley value explanations in interpretability documentation

    • Purpose: Provides detailed feature contribution analysis for model predictions

    • Available options:
      • Enabled: Includes Shapley value analysis in AutoDoc reports

      • Disabled: Excludes Shapley value analysis from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

Advanced Settings:

  • fast_approx_num_trees
    • What it does: Sets the number of trees to use for fast Shapley value approximation

    • Purpose: Controls the speed vs. accuracy trade-off for Shapley calculations

    • Performance: More trees provide more accurate approximations but slower computation

    • Requires: Integer value (default: 250)

  • fast_approx_do_one_fold
    • What it does: Controls whether to use single-fold approximation for fast Shapley calculations

    • Purpose: Optimizes computation speed for Shapley value generation

    • Available options:
      • Enabled: Uses single-fold approximation for faster computation

      • Disabled: Uses multi-fold approximation for better accuracy

    • Requires: Boolean toggle (default: Enabled)

  • fast_approx_do_one_model
    • What it does: Controls whether to use single-model approximation for fast Shapley calculations

    • Purpose: Optimizes computation speed by using only one model for approximation

    • Available options:
      • Enabled: Uses single-model approximation for faster computation

      • Disabled: Uses multiple models for better accuracy

    • Requires: Boolean toggle (default: Enabled)

  • fast_approx_contribs_num_trees
    • What it does: Sets the number of trees for fast contribution approximation

    • Purpose: Controls the accuracy of feature contribution calculations

    • Balance: Balances computation speed with approximation quality

    • Requires: Integer value (default: 50)

  • fast_approx_contribs_do_one_fold
    • What it does: Controls whether to use single-fold approximation for contribution calculations

    • Purpose: Optimizes computation speed for feature contribution analysis

    • Available options:
      • Enabled: Uses single-fold approximation for faster computation

      • Disabled: Uses multi-fold approximation for better accuracy

    • Requires: Boolean toggle (default: Enabled)

  • fast_approx_contribs_do_one_model
    • What it does: Controls whether to use single-model approximation for contribution calculations

    • Purpose: Optimizes computation speed for contribution analysis

    • Available options:
      • Enabled: Uses single-model approximation for faster computation

      • Disabled: Uses multiple models for better accuracy

    • Requires: Boolean toggle (default: Enabled)

  • Permutation Feature Importance
    • What it does: Controls whether to include permutation-based feature importance analysis

    • Purpose: Provides alternative feature importance measurement method

    • Available options:
      • Enabled: Includes permutation feature importance in AutoDoc reports

      • Disabled: Excludes permutation feature importance from AutoDoc reports (default)

    • Requires: Boolean toggle (default: Disabled)

  • Number of Permutations for Feature Importance
    • What it does: Sets the number of permutations to use for feature importance calculation

    • Purpose: Controls the accuracy and stability of permutation importance scores

    • Accuracy: More permutations provide more stable importance estimates

    • Requires: Integer value (default: 1)

  • Feature Importance Scorer
    • What it does: Specifies the scoring metric to use for feature importance calculations

    • Purpose: Determines how feature importance is measured and ranked

    • Options: Supports various scoring metrics (accuracy, AUC, R2, etc.)

    • Requires: String value (default: “auto”)

  • PDP and Shapley Summary Plot Max Rows
    • What it does: Sets the maximum number of rows to include in PDP and Shapley summary plots

    • Purpose: Controls the size and readability of interpretability visualizations

    • Performance: Limits plot complexity for better performance

    • Requires: Integer value (default: 10000)

  • PDP max runtime in seconds
    • What it does: Sets the maximum runtime for Partial Dependence Plot calculations

    • Purpose: Prevents PDP calculations from taking too long on large datasets

    • Timeout: Automatically stops PDP generation if the time limit is exceeded

    • Requires: Integer value (default: 45)

  • Use fast approximation for PDP
    • What it does: Controls whether to use fast approximation methods for PDP generation

    • Purpose: Optimizes computation speed for Partial Dependence Plots

    • Available options:
      • Enabled: Uses fast approximation for quicker PDP generation

      • Disabled: Uses standard PDP calculation for better accuracy

    • Requires: Boolean toggle (default: Enabled)

  • PDP Max. number of unique values for int/float to be categoricals
    • What it does: Sets the threshold for treating integer/float features as categorical in PDP

    • Purpose: Controls how numeric features are handled in partial dependence analysis

    • Threshold: Numeric features with fewer unique values than this threshold are treated as categorical

    • Requires: Integer value (default: 50)

  • PDP Out of Range
    • What it does: Sets the number of out-of-range values to include in PDP analysis

    • Purpose: Controls the range of values analyzed in partial dependence plots

    • Coverage: Includes values beyond the training data range for comprehensive analysis

    • Requires: Integer value (default: 3)

  • ICE Number of Rows
    • What it does: Sets the number of rows to include in Individual Conditional Expectation plots

    • Purpose: Controls the granularity of ICE plot analysis

    • Auto mode (0): Automatically determines the optimal number of rows

    • Requires: Integer value (default: 0)

  • Global KLIME Number of Features
    • What it does: Sets the number of features to include in global KLIME analysis

    • Purpose: Controls the scope of KLIME interpretability analysis

    • Focus: Highlights the most important features in global explanations

    • Requires: Integer value (default: 10)

  • Global KLIME Number of Tables
    • What it does: Sets the number of KLIME tables to generate for global analysis

    • Purpose: Controls the depth of KLIME interpretability documentation

    • Comprehensiveness: More tables provide more detailed explanations

    • Available options:
      • 1: Shows one table with coefficients sorted by absolute value

      • 2: Shows two tables: one with top positive coefficients and one with top negative coefficients

    • Requires: Integer value (default: auto)
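
The two Global KLIME table layouts described above can be sketched as follows. The function name `klime_tables` and the sample coefficients are hypothetical, and applying the feature limit to each table separately is an assumption.

```python
from typing import Dict, List, Tuple

Coef = Tuple[str, float]


def klime_tables(coefs: Dict[str, float], num_features: int = 10,
                 num_tables: int = 1) -> List[List[Coef]]:
    """Arrange surrogate-model coefficients into one or two tables."""
    items = list(coefs.items())
    if num_tables == 1:
        # One table, sorted by absolute coefficient value.
        ranked = sorted(items, key=lambda kv: abs(kv[1]), reverse=True)
        return [ranked[:num_features]]
    if num_tables == 2:
        # Two tables: top positive, then top negative coefficients.
        positive = sorted((kv for kv in items if kv[1] > 0),
                          key=lambda kv: kv[1], reverse=True)
        negative = sorted((kv for kv in items if kv[1] < 0),
                          key=lambda kv: kv[1])
        # Assumption: the feature limit applies to each table separately.
        return [positive[:num_features], negative[:num_features]]
    raise ValueError("num_tables must be 1 or 2")


coefs = {"tenure": -0.8, "price": 0.5, "calls": 0.1, "age": -0.2}
print(klime_tables(coefs, num_features=3, num_tables=1))
# [[('tenure', -0.8), ('price', 0.5), ('age', -0.2)]]
print(klime_tables(coefs, num_features=3, num_tables=2))
# [[('price', 0.5), ('calls', 0.1)], [('tenure', -0.8), ('age', -0.2)]]
```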