Variable importance in Driverless AI

Global Feature Importance

Model Specific Feature Importance: After an experiment completes, Driverless AI reports variable importance that is specific to the model or algorithm. For tree-based models, for example, this importance is gain based: it is the average reduction in impurity across all trees in the forest due to each feature. Features that tend to split nodes closer to the root of a tree have larger importance values. For an n-fold model, the variable importance is averaged across the folds, normalized, and reported. For an ensemble model, the importance from each model is multiplied by that model’s weight and then normalized.
Permutation Feature Importance: Permutation-based feature importance is a model-agnostic approach. After scoring a model to establish its baseline performance, you permute (shuffle) the values of a feature of interest and re-evaluate the model. The observed mean difference in performance is the feature’s absolute permutation importance. If the feature has little predictive power, shuffling its values should have little impact on the model’s performance. If a feature is highly predictive, however, shuffling its values should decrease the model’s performance.
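
The fold averaging and ensemble weighting described above can be sketched in a few lines of Python. This is only an illustration, not the Driverless AI implementation: the function names, the toy numbers, and the choice to normalize so that the top feature scores 1.0 are all assumptions.

```python
def normalize(importances):
    """Scale importances so the largest value is 1.0 (assumed convention)."""
    top = max(importances.values())
    return {f: (v / top if top else 0.0) for f, v in importances.items()}


def average_folds(fold_importances):
    """Average each feature's gain across folds, then normalize."""
    n = len(fold_importances)
    features = fold_importances[0].keys()
    mean = {f: sum(fold[f] for fold in fold_importances) / n for f in features}
    return normalize(mean)


def combine_ensemble(model_importances, weights):
    """Weight each model's importances by its ensemble weight, sum, normalize."""
    features = model_importances[0].keys()
    combined = {
        f: sum(w * imp[f] for imp, w in zip(model_importances, weights))
        for f in features
    }
    return normalize(combined)


# Hypothetical raw gains from a 2-fold tree model.
folds = [{"age": 4.0, "income": 2.0}, {"age": 6.0, "income": 2.0}]
model_a = average_folds(folds)          # age: 1.0, income: 0.4

# Hypothetical second model with already-normalized importances.
model_b = {"age": 0.5, "income": 1.0}
ensemble = combine_ensemble([model_a, model_b], weights=[0.75, 0.25])
print(model_a, ensemble)
```

Normalizing after each step keeps the values comparable between models whose raw gain scales differ.
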

Driverless AI applies permutation-based feature importance for upfront feature selection, before the genetic algorithm runs, when the feature space is large.
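
The shuffle-and-rescore procedure behind permutation importance can be sketched as follows. This is a minimal standard-library illustration, not the Driverless AI code: the `permutation_importance` helper, the toy `predict` function, the mean squared error metric, and the number of repeats are all assumptions.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j; every other column is left untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = mean_squared_error(y, [predict(r) for r in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that uses feature 0 and ignores feature 1.
def predict(row):
    return 3.0 * row[0]


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * r[0] for r in rows]

importances = permutation_importance(predict, rows, y)
print(importances)  # feature 0 has a positive value, feature 1 is exactly 0.0
```

Because the toy model never reads feature 1, shuffling it leaves every prediction unchanged, which is the "little predictive power" case from the text.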

Local Feature Importance

LIME: Local interpretable model-agnostic explanations (LIME) is a model-agnostic technique that aims to explain which features are most important in specific areas of the feature space. The main idea of LIME is to compute a local surrogate model in the area of interest. This surrogate is an easily interpretable model, such as a linear model or a decision tree, trained to mimic the behavior of the more complex model of interest. For a specific prediction you want to explain, LIME slightly changes the feature values of that row to create new, similar data points. Feeding these perturbed data points to the complex model reveals a relation between the perturbed features and the model prediction, which the surrogate model then captures.
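
The perturb, score, and fit loop described above can be sketched with NumPy. This is a simplified illustration of the idea, not the `lime` package or the Driverless AI MLI implementation: the `lime_explain` helper, the Gaussian perturbations, the exponential proximity kernel, and the `black_box` model are all assumptions.

```python
import numpy as np


def lime_explain(predict, x, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around the row x."""
    rng = np.random.default_rng(seed)
    # 1. Create perturbed points close to the row being explained.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Score the perturbed points with the complex model.
    preds = predict(Z)
    # 3. Weight each point by its proximity to x.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    return coef[0], coef[1:]  # local intercept, local slope per feature


# Hypothetical nonlinear model standing in for the complex model.
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2


x0 = np.array([0.0, 1.0])
intercept, slopes = lime_explain(black_box, x0)
print(intercept, slopes)  # slopes should be close to the local gradient (1, 2)
```

The surrogate coefficients act as the local feature importances: near `x0`, the second feature moves the prediction roughly twice as much as the first.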

Driverless MLI builds LIME models as part of its model interpretation toolkit.
Shapley: Shapley values can be used for local feature importance. They can be used to explain which feature(s) contribute most to a specific prediction, say fraud or not fraud. Shapley values are not designed to answer the “what if” questions that LIME’s local surrogate models are designed for.

Shapley values have their origin in game theory, where the problem at hand is to determine a fair payoff for each player on a team based on individual capabilities or performance. The Shapley value of a player is defined as that player’s average marginal contribution over all possible combinations (coalitions) of the other players. A marginal contribution is defined as the value of the group with the player as a member minus the value of the group without the player.
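
For a small game, the Shapley value can be computed exactly by enumerating every coalition. The sketch below uses the standard weighted-marginal-contribution form. The three-player payoff table is hypothetical and serves only to make the arithmetic concrete.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: weighted average of marginal contributions."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical payoff for every coalition of players A, B, and C.
payoffs = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

phi = shapley_values(["A", "B", "C"], lambda s: payoffs[s])
print(phi)  # {'A': 20.0, 'B': 30.0, 'C': 40.0} up to floating-point rounding
```

The three values sum to 90, the payoff of the full team, which is the sense in which the split is "fair". In the model setting, the players are features and the payoff is the prediction.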

Because considering all possible subsets (or combinations) of features is computationally prohibitive in most realistic models with many features, Shapley value approximations are computed based on sampling. Driverless AI uses the Python SHAP package for Shapley calculations. The values can be downloaded as an experiment artifact or obtained per row in production using the MOJO or Python scoring pipelines.
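
To give a feel for what a sampling-based approximation looks like, here is a generic Monte Carlo estimator that averages marginal contributions over random feature orderings. It is not the algorithm used by the SHAP package or by Driverless AI. The `sampled_shapley` helper, the toy model, and the single all-zero background row are assumptions made for illustration.

```python
import random


def sampled_shapley(predict, x, background, n_samples=2000, seed=0):
    """Estimate per-feature Shapley values for row x by sampling orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        # Start from a background row, then switch features to x one at a time.
        z = list(rng.choice(background))
        prev = predict(z)
        for j in order:
            z[j] = x[j]
            cur = predict(z)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return [v / n_samples for v in phi]


# Toy model with one additive term and one interaction term.
def predict(z):
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 1.0, 1.0]
background = [[0.0, 0.0, 0.0]]  # hypothetical baseline row

phi = sampled_shapley(predict, x, background)
print(phi)  # roughly [2.0, 0.5, 0.5]
```

The estimates still sum to the difference between the prediction for `x` and the baseline prediction, and the interaction term is split about evenly between the two features that share it.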

Driverless MLI computes Shapley values as part of its model interpretation toolkit.