Interpretation Expert Settings¶
The following is a list of the Interpretation expert settings that are available when setting up a new interpretation from the MLI page. The name of each setting is preceded by its config.toml label. For info on explainer-specific expert settings, see Explainer (Recipes) Expert Settings.
MLI Tab¶
mli_sample¶
Sample All Explainers
Specify whether to perform the interpretation on a sample of the training data. By default, MLI samples the training dataset if it contains more than 100k rows. (The equivalent config.toml setting is mli_sample_size.) This setting is enabled by default. Turn this toggle off to run MLI on the entire dataset.
mli_enable_mojo_scorer¶
Allow Use of MOJO Scoring Pipeline
Use this option to disable the MOJO scoring pipeline. By default, the scoring pipeline (MOJO or Python) is chosen automatically. For certain models, the choice between MOJO and Python can impact pipeline performance and robustness.
mli_fast_approx¶
Speed up predictions with a fast approximation
Specify whether to speed up predictions with a fast approximation. When enabled, this setting can reduce the number of trees or cross-validation folds and ultimately reduce the time needed to complete interpretations. This setting is enabled by default.
mli_custom¶
Add to config.toml via TOML String
Use this input field to add custom configuration to the Driverless AI server's config.toml file, written as a TOML string.
MLI NLP Tab¶
mli_nlp_top_n¶
Number of Tokens Used for MLI NLP Explanations
Specify the number of tokens used for MLI NLP explanations. To use all available tokens, set this value to -1. By default, this value is set to 20.
mli_nlp_sample_limit¶
Sample Size for NLP Surrogate Models
Specify the maximum number of records used by MLI NLP explainers. The default value is 10000.
mli_nlp_min_df¶
Minimum Number of Documents in Which Token Has to Appear
Specify the minimum number of documents in which a token must appear. Use an integer value to denote an absolute count or a floating-point value to denote a percentage. By default, this value is set to 3.
mli_nlp_max_df¶
Maximum Number of Documents in Which Token Has to Appear
Specify the maximum number of documents in which a token may appear. Use an integer value to denote an absolute count or a floating-point value to denote a percentage. By default, this value is set to 3.
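The integer-versus-float semantics of the two document-frequency settings above can be sketched as follows. This is a minimal illustration of the common tokenizer convention (absolute count for integers, fraction of the corpus for floats); the exact MLI implementation is not shown here.

```python
# Sketch: interpret min_df/max_df as absolute counts (int) or
# fractions of the corpus (float), then filter tokens by document frequency.

def df_bounds(n_docs, min_df, max_df):
    """Return the (lo, hi) document-count range a token must fall in."""
    lo = min_df if isinstance(min_df, int) else int(min_df * n_docs)
    hi = max_df if isinstance(max_df, int) else int(max_df * n_docs)
    return lo, hi

def keep_token(doc_freq, n_docs, min_df=3, max_df=1.0):
    lo, hi = df_bounds(n_docs, min_df, max_df)
    return lo <= doc_freq <= hi

# A token seen in 5 of 100 documents passes min_df=3, max_df=0.9.
print(keep_token(5, 100, min_df=3, max_df=0.9))
```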
mli_nlp_min_ngram¶
Minimum Value in n-gram Range
Specify the minimum value in the n-gram range. The tokenizer generates all possible tokens in the range specified by mli_nlp_min_ngram and mli_nlp_max_ngram. By default, this value is set to 1.
mli_nlp_max_ngram¶
Maximum Value in n-gram Range
Specify the maximum value in the n-gram range. The tokenizer generates all possible tokens in the range specified by mli_nlp_min_ngram and mli_nlp_max_ngram. By default, this value is set to 1.
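How an n-gram range expands into tokens can be sketched in a few lines of pure Python; this illustrates the [min_ngram, max_ngram] behavior described above, not the actual MLI tokenizer.

```python
# Sketch: generate all n-grams for n in [min_n, max_n] from a token list.

def ngrams(tokens, min_n=1, max_n=1):
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

# min_n=1, max_n=2 yields unigrams followed by bigrams.
print(ngrams(["the", "quick", "fox"], min_n=1, max_n=2))
```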
mli_nlp_min_token_mode¶
Mode Used to Choose N Tokens for MLI NLP
Specify the mode used to choose N tokens. Select from the following:
top - Chooses N top tokens
bottom - Chooses N bottom tokens
top-bottom - Chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens
linspace - Chooses N evenly spaced tokens
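The four modes can be sketched as below, assuming the tokens are already sorted by importance (most important first). This is an illustration of the selection rules as described, not the MLI source code; tie-breaking details may differ.

```python
# Sketch: select N tokens from an importance-ranked list using each of the
# four mli_nlp_min_token_mode choices.
import math

def choose_tokens(tokens, n, mode="top"):
    if mode == "top":
        return tokens[:n]
    if mode == "bottom":
        return tokens[-n:]
    if mode == "top-bottom":
        # floor(N/2) from the top plus ceil(N/2) from the bottom.
        return tokens[:math.floor(n / 2)] + tokens[-math.ceil(n / 2):]
    if mode == "linspace":
        # N evenly spaced positions across the ranked list.
        step = (len(tokens) - 1) / (n - 1) if n > 1 else 0
        return [tokens[round(i * step)] for i in range(n)]
    raise ValueError(mode)

ranked = list("abcdefgh")  # 8 tokens, 'a' most important
print(choose_tokens(ranked, 4, "top-bottom"))
```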
mli_nlp_tokenizer_max_features¶
Number of Top Tokens to Use as Features (Token-based Feature Importance)
Specify the number of top tokens to use as features when building token-based feature importance. By default, this value is set to -1.
mli_nlp_loco_max_features¶
Number of Top Tokens to Use as Features (LOCO)
Specify the number of top tokens to use as features when computing text LOCO. By default, this value is set to -1.
mli_nlp_surrogate_tokens¶
Number of Top Tokens to Use as Features (Surrogate Model)
Specify the number of top tokens to use as features when building surrogate models. Note that this setting only applies to NLP models. By default, this value is set to 100.
mli_nlp_use_stop_words¶
Stop Words for MLI NLP
Specify whether to use stop words for MLI NLP. This setting is enabled by default.
mli_nlp_stop_words¶
List of Words to Filter Before Generating Text Tokens
Specify a custom list of stop words to filter out before generating text tokens. These words are passed to MLI NLP LOCO and surrogate models (if enabled). For example, enter ['great', 'good'] to filter out the words great and good.
mli_nlp_append_to_english_stop_words¶
Append List of Custom Stop Words to Default Stop Words
Specify whether to append the list of stop words specified by mli_nlp_stop_words to the default list of stop words. This setting is disabled by default.
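How the two stop-word settings above compose can be sketched as follows. DEFAULT_STOP_WORDS is a stand-in for the real English default list, which is not reproduced here.

```python
# Sketch: a custom stop-word list either replaces the default list or, when
# the append option is enabled, extends it.

DEFAULT_STOP_WORDS = {"the", "a", "and", "of"}  # stand-in, not the real list

def effective_stop_words(custom, append_to_default=False):
    custom = set(custom)
    return DEFAULT_STOP_WORDS | custom if append_to_default else custom

print(sorted(effective_stop_words(["great", "good"], append_to_default=True)))
```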
MLI Surrogate Models Tab¶
mli_lime_method¶
LIME Method
Select a LIME method: either K-LIME (default) or LIME-SUP.
K-LIME (default): creates one global surrogate GLM on the entire training data and numerous local surrogate GLMs on samples formed from k-means clusters in the training data. The features used for k-means are selected from the Random Forest surrogate model’s variable importance. The number of features used for k-means is the minimum of the top 25% of variables from the Random Forest surrogate model’s variable importance and the maximum number of variables that can be used for k-means, which is set by the config.toml setting mli_max_number_cluster_vars. (Note: if the dataset has six or fewer features, all features are used for k-means clustering.) To instead use all features for k-means, set use_all_columns_klime_kmeans in the config.toml file to true. All penalized GLM surrogates are trained to model the predictions of the Driverless AI model. The number of clusters for local explanations is chosen by a grid search in which the R² between the Driverless AI model predictions and all of the local K-LIME model predictions is maximized. The global and local linear models’ intercepts, coefficients, R² values, accuracy, and predictions can all be used to debug and develop explanations for the Driverless AI model’s behavior.
LIME-SUP: explains local regions of the trained Driverless AI model in terms of the original variables. Local regions are defined by each leaf node path of the decision tree surrogate model instead of simulated, perturbed observation samples, as in the original LIME. For each local region, a local GLM is trained on the original inputs and the predictions of the Driverless AI model. The parameters of this local GLM can then be used to generate approximate, local explanations of the Driverless AI model.
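The K-LIME cluster-count grid search can be sketched as below. The local GLMs themselves are stubbed out with precomputed predictions (the values are made up for illustration); only the R²-maximizing selection criterion is shown.

```python
# Sketch: choose the cluster count k whose combined local-model predictions
# best match the Driverless AI model predictions, measured by R^2.

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def best_k(dai_preds, local_preds_by_k):
    """local_preds_by_k maps k -> combined predictions of the k local GLMs."""
    return max(local_preds_by_k, key=lambda k: r2(dai_preds, local_preds_by_k[k]))

dai = [1.0, 2.0, 3.0, 4.0]                       # DAI model predictions (made up)
candidates = {2: [1.1, 2.2, 2.9, 3.8],           # k=2 local predictions (made up)
              3: [1.0, 2.1, 3.0, 4.0]}           # k=3 local predictions (made up)
print(best_k(dai, candidates))
```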
mli_use_raw_features¶
Use Original Features for Surrogate Models
Specify whether to use original features or transformed features in the surrogate model for the new interpretation. This is enabled by default.
Note: When this setting is disabled, the K-LIME clustering column and quantile binning options are unavailable.
mli_vars_to_pdp¶
Number of Features for Partial Dependence Plot
Specify the maximum number of features to use when building the Partial Dependence Plot. Use -1 to calculate the Partial Dependence Plot for all features. By default, this value is set to 10.
mli_nfolds¶
Cross-validation Folds for Surrogate Models
Specify the number of surrogate cross-validation folds to use (from 0 to 10). When running experiments, Driverless AI automatically splits the training data and uses the validation data to determine the performance of the model parameter tuning and feature engineering steps. For a new interpretation, Driverless AI uses 3 cross-validation folds by default for the interpretation.
mli_qbin_count¶
Number of Columns to Bin for Surrogate Models
Specify the number of columns to bin for surrogate models. This value defaults to 0.
mli_sample_size¶
Sample Size for Surrogate Models
When the number of rows is above this limit, sample for surrogate models. The default value is 100000.
mli_num_quantiles¶
Number of Bins for Quantile Binning
Specify the number of bins for quantile binning. By default, this value is set to 10.
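Quantile binning of this kind can be sketched as follows; this is a minimal pure-Python illustration of cutting a numeric column into N bins with roughly equal row counts, not the MLI implementation.

```python
# Sketch: assign each value to one of n_bins quantile bins so that each bin
# holds (roughly) the same number of rows.

def quantile_bin(values, n_bins):
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(ranked):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins

# Six values into 3 bins: each bin gets two rows.
print(quantile_bin([10, 50, 20, 40, 30, 60], 3))
```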
mli_dia_sample_size¶
Sample Size for Disparate Impact Analysis
When the number of rows is above this limit, sample for Disparate Impact Analysis (DIA). The default value is 100000.
mli_pd_sample_size¶
Sample Size for Partial Dependence Plot
When the number of rows is above this limit, sample for the Driverless AI partial dependence plot. The default value is 25000.
mli_pd_numcat_num_chart¶
Unique Feature Values Count Driven Partial Dependence Plot Binning and Chart Selection
Specify whether to use dynamic switching between PDP numeric and categorical binning and UI chart selection in cases where features were used both as numeric and categorical by the experiment. This is enabled by default.
mli_pd_numcat_threshold¶
Threshold for PD/ICE Binning and Chart Selection
If mli_pd_numcat_num_chart is enabled and the number of unique feature values is greater than this threshold, numeric binning and a numeric chart are used. Otherwise, categorical binning and a categorical chart are used. The default threshold value is 11.
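The selection rule above reduces to a single comparison, sketched here for illustration:

```python
# Sketch: pick the PDP binning/chart kind from the unique-value count and
# the mli_pd_numcat_threshold setting (default 11).

def pdp_chart_kind(n_unique_values, threshold=11):
    return "numeric" if n_unique_values > threshold else "categorical"

print(pdp_chart_kind(30))  # many unique values -> numeric
print(pdp_chart_kind(5))   # few unique values  -> categorical
```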
mli_sa_sampling_limit¶
Sample Size for Sensitivity Analysis (SA)
When the number of rows is above this limit, sample for Sensitivity Analysis (SA). The default value is 500000.
klime_cluster_col¶
k-LIME Clustering Columns
For k-LIME interpretations, optionally specify which columns to have k-LIME clustering applied to.
Note: This setting is not found in the config.toml file.
qbin_cols¶
Quantile Binning Columns
For k-LIME interpretations, specify one or more columns to generate decile bins (uniform distribution) to help with MLI accuracy. Selected columns are added to the top n columns chosen for quantile binning. If a column is not numeric or is not in the dataset (transformed features), it is skipped.
Note: This setting is not found in the config.toml file.