Mli_nlp configuration
mli_nlp_top_n
Number of tokens used for MLI NLP explanations. -1 means all. (Number)
Default value 20
mli_nlp_sample_limit
Sample size for MLI NLP explainers. (Number)
Default value 10000
Maximum number of records used by MLI NLP explainers.
mli_nlp_min_df
Minimum number of documents in which a token must appear. An integer is an absolute count; a float is a proportion of the documents. (Number)
Default value 3
mli_nlp_max_df
Maximum number of documents in which a token may appear. An integer is an absolute count; a float is a proportion of the documents. (Float)
Default value 0.9
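The interplay of mli_nlp_min_df and mli_nlp_max_df can be illustrated with a minimal sketch. It assumes semantics similar to common document-frequency filters: an integer is compared directly against the document count, and a float is multiplied by the number of documents. The function names and the whitespace tokenization are hypothetical and are not taken from the product.

```python
from collections import Counter


def resolve_df_threshold(value, n_docs):
    """Turn a min_df/max_df style setting into a document-count threshold."""
    # Assumption: a float is a proportion of documents, an int is an absolute count.
    if isinstance(value, float):
        return value * n_docs
    return value


def filter_by_document_frequency(docs, min_df=3, max_df=0.9):
    """Keep tokens whose document frequency lies between the two thresholds."""
    n_docs = len(docs)
    # Count each token at most once per document.
    doc_freq = Counter(token for doc in docs for token in set(doc.split()))
    low = resolve_df_threshold(min_df, n_docs)
    high = resolve_df_threshold(max_df, n_docs)
    return sorted(t for t, count in doc_freq.items() if low <= count <= high)


docs = [
    "the service was great",
    "the food was great",
    "the room was small",
    "the staff was friendly",
]
kept = filter_by_document_frequency(docs, min_df=2, max_df=0.9)
print(kept)
```

In this toy corpus, tokens that occur in every document exceed the upper threshold and tokens that occur only once fall below the lower one, so only mid-frequency tokens survive.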
mli_nlp_min_ngram
The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. (Number)
Default value 1
mli_nlp_max_ngram
The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range. (Number)
Default value 1
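As an illustration of what an n-gram range produces, here is a minimal sketch. It assumes both bounds are inclusive and that the text is split on whitespace; generate_ngrams is a hypothetical helper, not the product's tokenizer.

```python
def generate_ngrams(text, min_ngram=1, max_ngram=1):
    """Return all word n-grams with min_ngram <= n <= max_ngram."""
    words = text.split()
    ngrams = []
    # Assumption: both ends of the range are inclusive.
    for n in range(min_ngram, max_ngram + 1):
        for start in range(len(words) - n + 1):
            ngrams.append(" ".join(words[start:start + n]))
    return ngrams


unigrams = generate_ngrams("not very good")
unigrams_and_bigrams = generate_ngrams("not very good", 1, 2)
print(unigrams)
print(unigrams_and_bigrams)
```

With both settings at their default of 1, only single words are produced. Raising the maximum to 2 adds two-word phrases such as "not very", which can matter when the meaning depends on word pairs.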
mli_nlp_min_token_mode
Mode used to choose N tokens for MLI NLP. “top” chooses the N top tokens. “bottom” chooses the N bottom tokens. “top-bottom” chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. “linspace” chooses N evenly spaced tokens. (String)
Default value 'top'
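The four modes can be sketched as follows. This is only an illustration: it assumes the tokens are already sorted from most to least important, choose_tokens is a hypothetical name, and the rounding used for “linspace” is one plausible way to pick evenly spaced positions. Special values of N are not handled here.

```python
import math


def choose_tokens(ranked_tokens, n, mode="top"):
    """Pick n tokens from a list sorted from most to least important."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    if n >= len(ranked_tokens):
        return list(ranked_tokens)
    if mode == "top":
        return list(ranked_tokens[:n])
    if mode == "bottom":
        return list(ranked_tokens[-n:])
    if mode == "top-bottom":
        n_top = math.floor(n / 2)
        n_bottom = math.ceil(n / 2)
        bottom_start = len(ranked_tokens) - n_bottom
        return list(ranked_tokens[:n_top]) + list(ranked_tokens[bottom_start:])
    if mode == "linspace":
        if n == 1:
            return [ranked_tokens[0]]
        last = len(ranked_tokens) - 1
        # Evenly spaced positions from the first to the last token.
        positions = [round(i * last / (n - 1)) for i in range(n)]
        return [ranked_tokens[p] for p in positions]
    raise ValueError(f"unknown mode: {mode}")


tokens = list("abcdefghij")  # stand-in for ten ranked tokens
print(choose_tokens(tokens, 3, "top"))
print(choose_tokens(tokens, 3, "bottom"))
print(choose_tokens(tokens, 5, "top-bottom"))
print(choose_tokens(tokens, 4, "linspace"))
```

Note that for an odd N, “top-bottom” gives the extra token to the bottom group, as follows from the floor and ceil in the description.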
mli_nlp_tokenizer_max_features
The number of top tokens to be used as features when building token based feature importance. (Number)
Default value -1
mli_nlp_loco_max_features
The number of top tokens to be used as features when computing text LOCO. (Number)
Default value -1
mli_nlp_surrogate_tokenizer
Tokenizer for surrogate models. Only applies to NLP models. (String)
Default value 'Linear Model + TF-IDF'
The tokenizer method to use when tokenizing a dataset for surrogate models. Choose either ‘TF-IDF’ or ‘Linear Model + TF-IDF’. The latter first runs TF-IDF to get tokens, then fits a linear model between the tokens and the target; token importances are based on the coefficients of that linear model. Default is ‘Linear Model + TF-IDF’. Only applies to NLP models.
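The two-step idea behind ‘Linear Model + TF-IDF’ can be sketched in plain Python. This is a toy illustration under several assumptions: a smoothed IDF formula, a least-squares linear model fitted by gradient descent, and ranking by absolute coefficient. None of these details are confirmed for the product, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return the vocabulary and one TF-IDF row per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency (one common variant, an assumption).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def fit_linear_model(rows, targets, lr=0.05, epochs=2000):
    """Fit weights and a bias by batch gradient descent on squared error."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, targets):
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


docs = ["good movie", "good plot", "bad movie", "bad plot"]
targets = [1.0, 1.0, 0.0, 0.0]

vocab, rows = tfidf_matrix(docs)
weights, bias = fit_linear_model(rows, targets)
coef = dict(zip(vocab, weights))
# Assumption: importance is the absolute value of the coefficient.
ranked = sorted(vocab, key=lambda t: abs(coef[t]), reverse=True)
print(ranked)
```

Plain ‘TF-IDF’ would stop after the first step. The linear model adds information about the target, so a token that is frequent but unrelated to the outcome can rank lower than a rarer token that separates the classes.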
mli_nlp_surrogate_tokens
The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. (Number)
Default value 100
mli_nlp_use_stop_words
Ignore stop words for MLI NLP. (Boolean)
Default value True
mli_nlp_stop_words
List of words to filter out before generating the text tokens that are passed to MLI NLP LOCO and surrogate models (if enabled). Default is ‘english’. Pass custom stop words as a list, e.g., ['great', 'good']. (String)
Default value 'english'
mli_nlp_append_to_english_stop_words
Append the passed-in list of custom stop words to the default ‘english’ stop words. (Boolean)
Default value False
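How the three stop-word settings might combine is sketched below. The logic is an assumption inferred from the descriptions above, the function names are hypothetical, and ENGLISH_STOP_WORDS is a tiny stand-in for the real ‘english’ list.

```python
# Illustrative subset only; the real 'english' list is assumed to be much longer.
ENGLISH_STOP_WORDS = {"the", "a", "is", "was", "and"}


def resolve_stop_words(use_stop_words=True, stop_words="english",
                       append_to_english=False):
    """Return the set of words to drop, given the three settings."""
    if not use_stop_words:
        return set()
    if stop_words == "english":
        return set(ENGLISH_STOP_WORDS)
    custom = set(stop_words)
    if append_to_english:
        # Custom words extend the default list instead of replacing it.
        return ENGLISH_STOP_WORDS | custom
    return custom


def remove_stop_words(text, stop_words):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in stop_words]


text = "The food was great and cheap"
default_result = remove_stop_words(text, resolve_stop_words())
custom_only = remove_stop_words(
    text, resolve_stop_words(stop_words=["great", "good"]))
custom_appended = remove_stop_words(
    text, resolve_stop_words(stop_words=["great", "good"], append_to_english=True))
print(default_result)
print(custom_only)
print(custom_appended)
```

Under these assumptions, a custom list on its own replaces the default list, so common words such as "the" are kept unless mli_nlp_append_to_english_stop_words is enabled.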