Run specific explainers¶
Initialize a client with your server credentials and store it in the variable dai.
import driverlessai
# Replace the address and credentials with those for your own server.
dai = driverlessai.Client(address='http://localhost:12345', username='py', password='py')
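Optionally, confirm the connection by printing basic server information. This is a minimal check and assumes the client's server attribute is available in your client version.
# Print the Driverless AI server version to verify the connection.
print(dai.server.version)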
Select and Configure Explainers¶
List all available explainers on the server.
dai.recipes.explainers.list()
    | Type            | Key   | Name
----+-----------------+-------+--------------------------------------------------------------------
  0 | ExplainerRecipe |       | Absolute Permutation Feature Importance
  1 | ExplainerRecipe |       | AutoDoc
  2 | ExplainerRecipe |       | Disparate Impact Analysis
  3 | ExplainerRecipe |       | Interpretability Data Zip (Surrogate and Shapley Techniques)
  4 | ExplainerRecipe |       | NLP Leave-one-covariate-out (LOCO)
  5 | ExplainerRecipe |       | NLP Partial Dependence Plot
  6 | ExplainerRecipe |       | NLP Tokenizer
  7 | ExplainerRecipe |       | NLP Vectorizer + Linear Model (VLM) Text Feature Importance
  8 | ExplainerRecipe |       | Original Feature Importance
  9 | ExplainerRecipe |       | Partial Dependence Plot
 10 | ExplainerRecipe |       | Relative Permutation Feature Importance
 11 | ExplainerRecipe |       | Sensitivity Analysis
 12 | ExplainerRecipe |       | Shapley Summary Plot for Original Features (Naive Shapley Method)
 13 | ExplainerRecipe |       | Shapley Values for Original Features (Kernel SHAP Method)
 14 | ExplainerRecipe |       | Shapley Values for Original Features (Naive Method)
 15 | ExplainerRecipe |       | Shapley Values for Transformed Features
 16 | ExplainerRecipe |       | Surrogate Decision Tree
 17 | ExplainerRecipe |       | Surrogate Random Forest Feature Importance
 18 | ExplainerRecipe |       | Surrogate Random Forest Leave-one-covariate-out (LOCO)
 19 | ExplainerRecipe |       | Surrogate Random Forest Partial Dependence Plot
 20 | ExplainerRecipe |       | Transformed Feature Importance
 21 | ExplainerRecipe |       | k-LIME/LIME-SUP
 22 | ExplainerRecipe |       | Time series explainer
Select specific explainers and save them for interpretation¶
Select the 'Disparate Impact Analysis' explainer located at index 2.
dia = dai.recipes.explainers.list()[2]
print(dia)
Disparate Impact Analysis
Select the 'Shapley Values for Original Features' explainer located at index 14.
orig_shap = dai.recipes.explainers.list()[14]
print(orig_shap)
Shapley Values for Original Features (Naive Method)
Select the 'Surrogate Random Forest Leave-one-covariate-out (LOCO)' explainer located at index 18.
rf_loco = dai.recipes.explainers.list()[18]
print(rf_loco)
Surrogate Random Forest Leave-one-covariate-out (LOCO)
Select the 'Surrogate Decision Tree' explainer located at index 16.
dt = dai.recipes.explainers.list()[16]
print(dt)
Surrogate Decision Tree
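If the list order differs on your server version, you can also select explainers by name rather than by index. The following is a minimal sketch; it assumes the returned listing is iterable and that each ExplainerRecipe exposes a name attribute matching the Name column shown above.
# Build a name-to-recipe lookup and select an explainer by its display name.
explainer_by_name = {e.name: e for e in dai.recipes.explainers.list()}
dia = explainer_by_name["Disparate Impact Analysis"]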
Explainer Expert Settings¶
You can change the settings of some explainers.
View all of the available settings that you can modify on the 'Shapley Values for Original Features' explainer:
orig_shap.search_settings(show_valid_values=True, show_description=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
sample_size | 100000 | An integer | Sample size for Naive Shapley. |
fast_approx_contribs | True | A boolean | Speed up predictions with fast contributions predictions approximation. |
To modify the settings, you can use the with_settings method. View the docstring for an explanation of how this method works.
orig_shap.with_settings?
Signature:
orig_shap.with_settings(
    validate_value: bool = True,
    *,
    sample_size: int = 100000,
    fast_approx_contribs: bool = True,
)
Docstring:
Changes the explainer settings from defaults.

Settings reset to defaults everytime this is called.

Args:
    validate_value: Enable value validation. Default is True.

.. note::
    To search possible explainer settings for your server version,
    use ``explainer.search_settings(search_term)``.

File:      /usr/local/Caskroom/miniconda/base/envs/dai-py/lib/python3.7/site-packages/driverlessai/_recipes.py
Type:      method
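As the docstring's note indicates, search_settings also accepts a search term to narrow the listing. A quick sketch, assuming the term is matched against setting names on your client version:
# Show only the sampling-related settings of the explainer.
orig_shap.search_settings("sample", show_description=True)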
By default, the with_settings method validates the values passed for the explainer settings. The following demonstrates what happens when an invalid value is passed for the setting sample_size.
# Switch Jupyter to minimal exception reporting to reduce traceback clutter.
%xmode Minimal
Exception reporting mode: Minimal
orig_shap.with_settings(sample_size="1")
ValueError: Invalid value type for 'sample_size'. Expected type integer, got string
# Revert to printing full exception tracebacks
%xmode Context
Exception reporting mode: Context
You can change the sample_size to 1000 on the 'Shapley Values for Original Features' explainer.
orig_shap.with_settings(sample_size=1000)
<class 'ExplainerRecipe'> Shapley Values for Original Features (Naive Method)
Next, view the non-default settings for the explainer.
orig_shap.show_settings()
Name | Value |
---|---|
sample_size | 1000 |
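Because with_settings resets the explainer to its defaults on every call (per the docstring above), pass all non-default values together. For example, to also disable fast contribution approximation, a sketch would look like this:
# A later with_settings() call would silently reset sample_size back to its
# default of 100000, so combine all non-default settings in one call.
orig_shap.with_settings(sample_size=1000, fast_approx_contribs=False)
orig_shap.show_settings()
# Restore the configuration used in the rest of this example.
orig_shap.with_settings(sample_size=1000)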
On the 'Disparate Impact Analysis' explainer, set maximize_metric to "F2" and sample_size to 1000.
dia.search_settings(show_description=True, show_valid_values=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
dia_cols | | A list | List of features for which to compute DIA. |
cut_off | 0.0 | A floating-point | Cut off. |
maximize_metric | F1 | 'F1', 'F05', 'F2' or 'MCC' | Maximize metric. |
use_holdout_preds | True | A boolean | Use internal holdout predictions. |
sample_size | 100000 | An integer | Sample size for Disparate Impact Analysis. |
max_cardinality | 10 | An integer | Max cardinality for categorical variables. |
min_cardinality | 2 | An integer | Minimum cardinality for categorical variables. |
num_card | 25 | An integer | Max cardinality for numeric variables to be considered categorical. |
fast_approx | True | A boolean | Speed up predictions with a fast approximation. |
dia.with_settings(maximize_metric="F2", sample_size=1000)
<class 'ExplainerRecipe'> Disparate Impact Analysis
dia.show_settings()
Name | Value |
---|---|
maximize_metric | F2 |
sample_size | 1000 |
On the 'Surrogate Decision Tree' explainer, set dt_tree_depth to 5 and categorical_encoding to "AUTO".
dt.search_settings(show_valid_values=True)
Name | Default Value | Valid Values |
---|---|---|
dt_tree_depth | 3 | An integer |
nfolds | 3 | An integer |
qbin_cols | | A list |
qbin_count | 0 | An integer |
categorical_encoding | One Hot Encoding | 'AUTO', 'One Hot Encoding', 'Enum Limited', 'Sort by Response' or 'Label Encoder' |
dt.with_settings(dt_tree_depth=5, categorical_encoding="AUTO")
<class 'ExplainerRecipe'> Surrogate Decision Tree
dt.show_settings()
Name | Value |
---|---|
dt_tree_depth | 5 |
categorical_encoding | AUTO |
Create interpretation¶
Get the most recent experiment on the server.
experiment = dai.experiments.list()[0]
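If you need a particular experiment rather than the most recent one, you can also retrieve it by key. A minimal sketch, assuming dai.experiments.get(key) is available on your client and server version:
# Re-fetch the same experiment by its key and confirm which experiment was retrieved.
experiment = dai.experiments.get(experiment.key)
print(experiment.name)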
View all Interpretation Expert Settings that are available for configuration.
dai.mli.search_expert_settings(show_description=True, show_valid_values=True)
Name | Default Value | Valid Values | Description |
---|---|---|---|
lime_method | k-LIME | 'k-LIME' or 'LIME-SUP' | LIME method. Choose LIME method to be used for creation of surrogate models. |
use_raw_features | True | A boolean | Use original features for surrogate models. Choose whether surrogate models should be built for original or transformed features. |
ts_use_raw_features | False | A boolean | Use original features for time series based surrogate models. Choose whether time series based surrogate models should be built for original features. |
sample | True | A boolean | Sample all explainers. Choose whether to run all explainers on the sampled dataset. |
vars_to_pdp | 10 | An integer value greater than or equal to -1 | Number of features for Surrogate Partial Dependence Plot. Set to -1 to use all features. Set maximum number of features for which to build Surrogate Partial Dependence Plot. Use -1 to calculate Surrogate Partial Dependence Plot for all features. |
nfolds | 3 | An integer value within the range of 0 to 10 | Cross-validation folds for surrogate models. Set the number of cross-validation folds for surrogate models. |
qbin_count | 0 | An integer value within the range of 0 to 10 | Number of columns to bin for surrogate models. Set the number of columns to bin in case of quantile binning. |
enable_mojo_scorer | True | A boolean | Allow use of MOJO scoring pipeline. Use this option to disable MOJO scoring pipeline. Scoring pipeline is chosen automatically (from MOJO and Python pipelines) by default. In case of certain models MOJO vs. Python choice can impact pipeline performance and robustness. |
sample_size | 100000 | An integer | Sample size for surrogate models. The sample size, number of rows, used for MLI surrogate models. |
num_quantiles | 10 | An integer | Number of bins for quantile binning. Number of bins for quantile binning. |
drf_num_trees | 100 | An integer | Number of trees for Random Forest surrogate model. Number of trees for Random Forest surrogate model. |
fast_approx | True | A boolean | Speed up predictions with a fast approximation. Speed up predictions with a fast approximation (can reduce the number of trees or cross-validation folds). |
drf_max_depth | 20 | An integer | Max depth for Random Forest surrogate model. Max depth for Random Forest surrogate model. |
klime_lambda | [1e-06, 1e-08] | A list | Regularization strength for k-LIME GLM's. Regularization strength for k-LIME GLM's. |
klime_alpha | 0.0 | A floating-point | Regularization distribution between L1 and L2 for k-LIME GLM's. Regularization distribution between L1 and L2 for k-LIME GLM's. |
max_numeric_enum_cardinality | 25 | An integer | Max cardinality for numeric variables in surrogate models to be considered categorical. Max cardinality for numeric variables in surrogate models to be considered categorical. |
max_number_cluster_vars | 6 | An integer | Maximum number of features allowed for k-LIME k-means clustering. Maximum number of features allowed for k-LIME k-means clustering. |
use_all_columns_klime_kmeans | False | A boolean | Use all columns for k-LIME k-means clustering (this will override `mli_max_number_cluster_vars` if set to `True`). Use all columns for k-LIME k-means clustering (this will override `mli_max_number_cluster_vars` if set to `True`). |
custom | | A string | Add to config.toml via TOML string. Add TOML string to Driverless AI server config.toml configuration file. |
nlp_top_n | 20 | An integer | Number of tokens used for MLI NLP explanations. -1 means all. Number of tokens used for MLI NLP explanations. -1 means all. |
nlp_sample_limit | 10000 | An integer | Sample size for MLI NLP explainers. Maximum number of records used by MLI NLP explainers. |
nlp_min_df | 3 | An integer | Minimum number of documents in which token has to appear. Integer mean absolute count, float means percentage. Minimum number of documents in which token has to appear. Integer mean absolute count, float means percentage. |
nlp_max_df | 0.9 | A floating-point | Maximum number of documents in which token has to appear. Integer mean absolute count, float means percentage. Maximum number of documents in which token has to appear. Integer mean absolute count, float means percentage. |
nlp_min_ngram | 1 | An integer | The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. |
nlp_max_ngram | 1 | An integer | The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. |
nlp_min_token_mode | top | A string | Mode used to choose N tokens for MLI NLP. "top" chooses N top tokens. "bottom" chooses N bottom tokens. "top-bottom" chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. "linspace" chooses N evenly spaced out tokens. Mode used to choose N tokens for MLI NLP. "top" chooses N top tokens. "bottom" chooses N bottom tokens. "top-bottom" chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. "linspace" chooses N evenly spaced out tokens. |
nlp_tokenizer_max_features | -1 | An integer | The number of top tokens to be used as features when building token based feature importance. The number of top tokens to be used as features when building token based feature importance. |
nlp_loco_max_features | -1 | An integer | The number of top tokens to be used as features when computing text LOCO. The number of top tokens to be used as features when computing text LOCO. |
nlp_surrogate_tokenizer | Linear Model + TF-IDF | 'Linear Model + TF-IDF' or 'TF-IDF' | Tokenizer for surrogate models. Only applies to NLP models. The tokenizer method to use when tokenizing a dataset for surrogate models. Can either choose 'TF-IDF' or 'Linear Model + TF-IDF', which first runs TF-IDF to get tokens and then fits a linear model between the tokens and the target to get importances of tokens, which are based on coefficients of the linear model. Default is 'Linear Model + TF-IDF'. Only applies to NLP models. |
nlp_surrogate_tokens | 100 | An integer | The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. |
nlp_use_stop_words | True | A boolean | Ignore stop words for MLI NLP. Ignore stop words for MLI NLP. |
nlp_stop_words | english | A string | List of words to filter out before generation of text tokens, which are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. Pass in custom stop-words as a list, e.g., ['great', 'good']. List of words to filter out before generation of text tokens, which are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. Pass in custom stop-words as a list, e.g., ['great', 'good']. |
nlp_append_to_english_stop_words | False | A boolean | Append passed in list of custom stop words to default 'english' stop words. Append passed in list of custom stop words to default 'english' stop words. |
Create a list that comprises the chosen explainers.
explainers = [dia, orig_shap, dt, rf_loco]
Next, run an interpretation using the previously selected explainers. Set sample_size (the surrogate model sample size) to 1000 and drf_max_depth to 10.
# Interpretation expert settings (here sample_size and drf_max_depth) are
# passed as keyword arguments alongside the experiment, dataset, and explainers.
interpretation = dai.mli.create(experiment=experiment,
                                dataset=experiment.datasets.get("train_dataset"),
                                explainers=explainers,
                                sample_size=1000,
                                drf_max_depth=10)
Complete 100.00% - Interpretation successfully finished.
Go to the GUI to view the explainers' graphs.
interpretation.gui()
View the list of all the executed explainers.
interpretation.explainers
/usr/local/Caskroom/miniconda/base/envs/dai-py/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: 'Interpretation.explainers' is a beta API that is subject to future changes.
  """Entry point for launching an IPython kernel.
 | Key | Name |
---|---|---|
0 | ec02474a-1cf5-11ee-ac4c-ac1f6b643c68 | Disparate Impact Analysis |
1 | ec025514-1cf5-11ee-ac4c-ac1f6b643c68 | Shapley Values for Original Features (Naive Method) |
2 | ec025c3a-1cf5-11ee-ac4c-ac1f6b643c68 | Interpretability Data Zip (Surrogate and Shapley Techniques) |
3 | ec026266-1cf5-11ee-ac4c-ac1f6b643c68 | Surrogate Random Forest Leave-one-covariate-out (LOCO) |
4 | ec02682e-1cf5-11ee-ac4c-ac1f6b643c68 | Surrogate Decision Tree |
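Because Interpretation.explainers is a beta API (note the warning above), its return type may change between client versions. The sketch below assumes each entry exposes key and name attributes matching the columns shown above.
# Print the key and name of every executed explainer.
for explainer in interpretation.explainers:
    print(explainer.key, "-", explainer.name)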