Run specific explainers¶
Initialize a client with your server credentials and store it in the variable dai.
import driverlessai
# Replace the address and credentials with those for your own server.
dai = driverlessai.Client(address='http://localhost:12345', username='py', password='py')
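Optionally, confirm the connection by printing basic server information. This is a minimal check and assumes the client's server attribute is available in your client version.
# Print the Driverless AI server version to verify the connection.
print(dai.server.version)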
Select and Configure Explainers¶
List all available explainers on the server.
dai.recipes.explainers.list()
    | Type            | Key   | Name
----+-----------------+-------+--------------------------------------------------------------------
  0 | ExplainerRecipe |       | Absolute Permutation Feature Importance
  1 | ExplainerRecipe |       | AutoDoc
  2 | ExplainerRecipe |       | Disparate Impact Analysis
  3 | ExplainerRecipe |       | Interpretability Data Zip (Surrogate and Shapley Techniques)
  4 | ExplainerRecipe |       | NLP Leave-one-covariate-out (LOCO)
  5 | ExplainerRecipe |       | NLP Partial Dependence Plot
  6 | ExplainerRecipe |       | NLP Tokenizer
  7 | ExplainerRecipe |       | NLP Vectorizer + Linear Model (VLM) Text Feature Importance
  8 | ExplainerRecipe |       | Original Feature Importance
  9 | ExplainerRecipe |       | Partial Dependence Plot
 10 | ExplainerRecipe |       | Relative Permutation Feature Importance
 11 | ExplainerRecipe |       | Sensitivity Analysis
 12 | ExplainerRecipe |       | Shapley Summary Plot for Original Features (Naive Shapley Method)
 13 | ExplainerRecipe |       | Shapley Values for Original Features (Kernel SHAP Method)
 14 | ExplainerRecipe |       | Shapley Values for Original Features (Naive Method)
 15 | ExplainerRecipe |       | Shapley Values for Transformed Features
 16 | ExplainerRecipe |       | Surrogate Decision Tree
 17 | ExplainerRecipe |       | Surrogate Random Forest Feature Importance
 18 | ExplainerRecipe |       | Surrogate Random Forest Leave-one-covariate-out (LOCO)
 19 | ExplainerRecipe |       | Surrogate Random Forest Partial Dependence Plot
 20 | ExplainerRecipe |       | Transformed Feature Importance
 21 | ExplainerRecipe |       | k-LIME/LIME-SUP
 22 | ExplainerRecipe |       | Time series explainer
Select specific explainers and save them for interpretation¶
Select the 'Disparate Impact Analysis' explainer located at index 2.
dia = dai.recipes.explainers.list()[2]
print(dia)
Disparate Impact Analysis
Select the 'Shapley Values for Original Features' explainer located at index 14.
orig_shap = dai.recipes.explainers.list()[14]
print(orig_shap)
Shapley Values for Original Features (Naive Method)
Select the 'Surrogate Random Forest Leave-one-covariate-out (LOCO)' explainer located at index 18.
rf_loco = dai.recipes.explainers.list()[18]
print(rf_loco)
Surrogate Random Forest Leave-one-covariate-out (LOCO)
Select the 'Surrogate Decision Tree' explainer located at index 16.
dt = dai.recipes.explainers.list()[16]
print(dt)
Surrogate Decision Tree
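If the list order differs on your server version, you can also select explainers by name rather than by index. The following is a minimal sketch; it assumes the returned listing is iterable and that each ExplainerRecipe exposes a name attribute matching the Name column shown above.
# Build a name-to-recipe lookup and select an explainer by its display name.
explainer_by_name = {e.name: e for e in dai.recipes.explainers.list()}
dia = explainer_by_name["Disparate Impact Analysis"]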
Explainer Expert Settings¶
You can change the settings of some explainers.
View all of the available settings that you can modify on the 'Shapley Values for Original Features' explainer:
orig_shap.search_settings(show_valid_values=True, show_description=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
sample_size | 100000 | An integer | Sample size for Naive Shapley. |
fast_approx_contribs | True | A boolean | Speed up predictions with fast contributions predictions approximation. |
To modify the settings, you can use the with_settings method. View the docstring for an explanation of how this method works.
orig_shap.with_settings?
Signature:
orig_shap.with_settings(
    validate_value: bool = True,
    *,
    sample_size: int = 100000,
    fast_approx_contribs: bool = True,
)
Docstring:
Changes the explainer settings from defaults.

Settings reset to defaults everytime this is called.

Args:
    validate_value: Enable value validation. Default is True.

.. note::
    To search possible explainer settings for your server version,
    use ``explainer.search_settings(search_term)``.

File:      /usr/local/Caskroom/miniconda/base/envs/dai-py/lib/python3.7/site-packages/driverlessai/_recipes.py
Type:      method
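As the docstring's note indicates, search_settings also accepts a search term to narrow the listing. A quick sketch, assuming the term is matched against setting names on your client version:
# Show only the sampling-related settings of the explainer.
orig_shap.search_settings("sample", show_description=True)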
By default, the with_settings method validates the values passed for the explainer settings. The following demonstrates what happens when an invalid value is passed for the setting sample_size.
# Switch Jupyter to minimal exception reporting to reduce traceback clutter.
%xmode Minimal
Exception reporting mode: Minimal
orig_shap.with_settings(sample_size="1")
ValueError: Invalid value type for 'sample_size'. Expected type integer, got string
# Revert to printing full exception tracebacks
%xmode Context
Exception reporting mode: Context
You can change the sample_size to 1000 on the 'Shapley Values for Original Features' explainer.
orig_shap.with_settings(sample_size=1000)
<class 'ExplainerRecipe'> Shapley Values for Original Features (Naive Method)
Next, view the non-default settings for the explainer.
orig_shap.show_settings()
Name | Value |
---|---|
sample_size | 1000 |
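Because with_settings resets the explainer to its defaults on every call (per the docstring above), pass all non-default values together. For example, to also disable fast contribution approximation, a sketch would look like this:
# A later with_settings() call would silently reset sample_size back to its
# default of 100000, so combine all non-default settings in one call.
orig_shap.with_settings(sample_size=1000, fast_approx_contribs=False)
orig_shap.show_settings()
# Restore the configuration used in the rest of this example.
orig_shap.with_settings(sample_size=1000)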
On the 'Disparate Impact Analysis' explainer, set maximize_metric to "F2" and sample_size to 1000.
dia.search_settings(show_description=True, show_valid_values=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
dia_cols | | A list | List of features for which to compute DIA. |
cut_off | 0.0 | A floating-point | Cut off. |
maximize_metric | F1 | 'F1', 'F05', 'F2' or 'MCC' | Maximize metric. |
use_holdout_preds | True | A boolean | Use internal holdout predictions. |
sample_size | 100000 | An integer | Sample size for Disparate Impact Analysis. |
max_cardinality | 10 | An integer | Max cardinality for categorical variables. |
min_cardinality | 2 | An integer | Minimum cardinality for categorical variables. |
num_card | 25 | An integer | Max cardinality for numeric variables to be considered categorical. |
fast_approx | True | A boolean | Speed up predictions with a fast approximation. |
dia.with_settings(maximize_metric="F2", sample_size=1000)
<class 'ExplainerRecipe'> Disparate Impact Analysis
dia.show_settings()
Name | Value |
---|---|
maximize_metric | F2 |
sample_size | 1000 |
On the 'Surrogate Decision Tree' explainer, set dt_tree_depth to 5 and categorical_encoding to "AUTO".
dt.search_settings(show_valid_values=True)
Name | Default Value | Valid Values |
---|---|---|
dt_tree_depth | 3 | An integer |
nfolds | 3 | An integer |
qbin_cols | | A list |
qbin_count | 0 | An integer |
categorical_encoding | One Hot Encoding | 'AUTO', 'One Hot Encoding', 'Enum Limited', 'Sort by Response' or 'Label Encoder' |
dt.with_settings(dt_tree_depth=5, categorical_encoding="AUTO")
<class 'ExplainerRecipe'> Surrogate Decision Tree
dt.show_settings()
Name | Value |
---|---|
dt_tree_depth | 5 |
categorical_encoding | AUTO |
Create interpretation¶
Get the most recent experiment on the server.
experiment = dai.experiments.list()[0]
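If you need a particular experiment rather than the most recent one, you can also retrieve it by key. A minimal sketch, assuming dai.experiments.get(key) is available on your client and server version:
# Re-fetch the same experiment by its key and confirm which experiment was retrieved.
experiment = dai.experiments.get(experiment.key)
print(experiment.name)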
View all Interpretation Expert Settings that are available for configuration.
dai.mli.search_expert_settings(show_description=True, show_valid_values=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
lime_method | k-LIME | 'k-LIME' or 'LIME-SUP' | LIME method. Choose LIME method to be used for creation of surrogate models. |
use_raw_features | True | A boolean | Use original features for surrogate models. Choose whether surrogate models should be built for original or transformed features. |
ts_use_raw_features | False | A boolean | Use original features for time series based surrogate models. Choose whether time series based surrogate models should be built for original features. |
sample | True | A boolean | Sample all explainers. Choose whether to run all explainers on the sampled dataset. |
vars_to_pdp | 10 | An integer value greater than or equal to -1 | Number of features for Surrogate Partial Dependence Plot. Set to -1 to use all features. Set maximum number of features for which to build Surrogate Partial Dependence Plot. Use -1 to calculate Surrogate Partial Dependence Plot for all features. |
nfolds | 3 | An integer value within the range of 0 to 10 | Cross-validation folds for surrogate models. Set the number of cross-validation folds for surrogate models. |
qbin_count | 0 | An integer value within the range of 0 to 10 | Number of columns to bin for surrogate models. Set the number of columns to bin in case of quantile binning. |
enable_mojo_scorer | True | A boolean | Allow use of MOJO scoring pipeline. Use this option to disable MOJO scoring pipeline. Scoring pipeline is chosen automatically (from MOJO and Python pipelines) by default. In case of certain models MOJO vs. Python choice can impact pipeline performance and robustness. |
sample_size | 100000 | An integer | Sample size for surrogate models. The sample size, number of rows, used for MLI surrogate models. |
num_quantiles | 10 | An integer | Number of bins for quantile binning. Number of bins for quantile binning. |
drf_num_trees | 100 | An integer | Number of trees for Random Forest surrogate model. Number of trees for Random Forest surrogate model. |
fast_approx | True | A boolean | Speed up predictions with a fast approximation. Speed up predictions with a fast approximation (can reduce the number of trees or cross-validation folds). |
drf_max_depth | 20 | An integer | Max depth for Random Forest surrogate model. Max depth for Random Forest surrogate model. |
klime_lambda | [1e-06, 1e-08] | A list | Regularization strength for k-LIME GLM's. Regularization strength for k-LIME GLM's. |
klime_alpha | 0.0 | A floating-point | Regularization distribution between L1 and L2 for k-LIME GLM's. Regularization distribution between L1 and L2 for k-LIME GLM's. |
max_numeric_enum_cardinality | 25 | An integer | Max cardinality for numeric variables in surrogate models to be considered categorical. Max cardinality for numeric variables in surrogate models to be considered categorical. |
max_number_cluster_vars | 6 | An integer | Maximum number of features allowed for k-LIME k-means clustering. Maximum number of features allowed for k-LIME k-means clustering. |
use_all_columns_klime_kmeans | False | A boolean | Use all columns for k-LIME k-means clustering (this will override `mli_max_number_cluster_vars` if set to `True`). Use all columns for k-LIME k-means clustering (this will override `mli_max_number_cluster_vars` if set to `True`). |
custom | | A string | Add to config.toml via TOML string. Add TOML string to Driverless AI server config.toml configuration file. |
nlp_top_n | 20 | An integer | Number of tokens used for MLI NLP explanations. -1 means all. Number of tokens used for MLI NLP explanations. -1 means all. |
nlp_sample_limit | 10000 | An integer | Sample size for MLI NLP explainers. Maximum number of records used by MLI NLP explainers. |
nlp_min_df | 3 | An integer | Minimum number of documents in which token has to appear. Integer mean absolute count, float means percentage. Minimum number of documents in which token has to appear. Integer mean absolute count, float means percentage. |
nlp_max_df | 0.9 | A floating-point | Maximum number of documents in which token has to appear. Integer mean absolute count, float means percentage. Maximum number of documents in which token has to appear. Integer mean absolute count, float means percentage. |
nlp_min_ngram | 1 | An integer | The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. |
nlp_max_ngram | 1 | An integer | The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. |
nlp_min_token_mode | top | A string | Mode used to choose N tokens for MLI NLP. "top" chooses N top tokens. "bottom" chooses N bottom tokens. "top-bottom" chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. "linspace" chooses N evenly spaced out tokens. Mode used to choose N tokens for MLI NLP. "top" chooses N top tokens. "bottom" chooses N bottom tokens. "top-bottom" chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. "linspace" chooses N evenly spaced out tokens. |
nlp_tokenizer_max_features | -1 | An integer | The number of top tokens to be used as features when building token based feature importance. The number of top tokens to be used as features when building token based feature importance. |
nlp_loco_max_features | -1 | An integer | The number of top tokens to be used as features when computing text LOCO. The number of top tokens to be used as features when computing text LOCO. |
nlp_surrogate_tokenizer | Linear Model + TF-IDF | 'Linear Model + TF-IDF' or 'TF-IDF' | Tokenizer for surrogate models. Only applies to NLP models. The tokenizer method to use when tokenizing a dataset for surrogate models. Can either choose 'TF-IDF' or 'Linear Model + TF-IDF', which first runs TF-IDF to get tokens and then fits a linear model between the tokens and the target to get importances of tokens, which are based on coefficients of the linear model. Default is 'Linear Model + TF-IDF'. Only applies to NLP models. |
nlp_surrogate_tokens | 100 | An integer | The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. |
nlp_use_stop_words | True | A boolean | Ignore stop words for MLI NLP. Ignore stop words for MLI NLP. |
nlp_stop_words | english | A string | List of words to filter out before generation of text tokens, which are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. Pass in custom stop-words as a list, e.g., ['great', 'good']. List of words to filter out before generation of text tokens, which are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. Pass in custom stop-words as a list, e.g., ['great', 'good']. |
nlp_append_to_english_stop_words | False | A boolean | Append passed in list of custom stop words to default 'english' stop words. Append passed in list of custom stop words to default 'english' stop words. |
Create a list that comprises the chosen explainers.
explainers = [dia, orig_shap, dt, rf_loco]
Next, run an interpretation using the previously selected explainers. Set sample_size (the surrogate model sample size) to 1000 and drf_max_depth to 10.
# Interpretation expert settings (here sample_size and drf_max_depth) are
# passed as keyword arguments alongside the experiment, dataset, and explainers.
interpretation = dai.mli.create(experiment=experiment,
                                dataset=experiment.datasets.get("train_dataset"),
                                explainers=explainers,
                                sample_size=1000,
                                drf_max_depth=10)
Complete 100.00% - Interpretation successfully finished.
Go to the GUI to view the explainers' graphs.
interpretation.gui()
View the list of all the executed explainers.
interpretation.explainers
/usr/local/Caskroom/miniconda/base/envs/dai-py/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: 'Interpretation.explainers' is a beta API that is subject to future changes.
  """Entry point for launching an IPython kernel.
 | Key | Name |
---|---|---|
0 | ec02474a-1cf5-11ee-ac4c-ac1f6b643c68 | Disparate Impact Analysis |
1 | ec025514-1cf5-11ee-ac4c-ac1f6b643c68 | Shapley Values for Original Features (Naive Method) |
2 | ec025c3a-1cf5-11ee-ac4c-ac1f6b643c68 | Interpretability Data Zip (Surrogate and Shapley Techniques) |
3 | ec026266-1cf5-11ee-ac4c-ac1f6b643c68 | Surrogate Random Forest Leave-one-covariate-out (LOCO) |
4 | ec02682e-1cf5-11ee-ac4c-ac1f6b643c68 | Surrogate Decision Tree |
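Because Interpretation.explainers is a beta API (note the warning above), its return type may change between client versions. The sketch below assumes each entry exposes key and name attributes matching the columns shown above.
# Print the key and name of every executed explainer.
for explainer in interpretation.explainers:
    print(explainer.key, "-", explainer.name)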