MLI NLP configuration
mli_nlp_top_n
Number of tokens used for MLI NLP explanations. Set to -1 to use all tokens. (Number)
Default value 20
mli_nlp_sample_limit
Sample size for MLI NLP explainers. (Number)
Default value 10000
Maximum number of records used by MLI NLP explainers.
mli_nlp_min_df
Minimum number of documents in which a token must appear. An integer is an absolute document count; a float is a proportion of the documents (e.g., 0.1 means 10%). (Number)
Default value 3
mli_nlp_max_df
Maximum number of documents in which a token may appear. An integer is an absolute document count; a float is a proportion of the documents (e.g., 0.9 means 90%). (Float)
Default value 0.9
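Taken together, mli_nlp_min_df and mli_nlp_max_df act as a document-frequency filter. The sketch below assumes they behave like the `min_df`/`max_df` parameters of scikit-learn's `TfidfVectorizer` (integers are document counts, floats are proportions of the corpus, both bounds inclusive); the function `filter_by_document_frequency` is hypothetical and not part of the product.

```python
from collections import Counter


def filter_by_document_frequency(tokenized_docs, min_df=3, max_df=0.9):
    """Keep tokens whose document frequency lies within [min_df, max_df].

    Hypothetical helper (assumes scikit-learn-like semantics). Integers are
    absolute document counts; floats are proportions of the corpus.
    """
    n_docs = len(tokenized_docs)
    # Count each token at most once per document.
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    return sorted(tok for tok, df in doc_freq.items() if low <= df <= high)


# 10 documents: "the" appears in all 10, "model" in 5, "rare" in 1.
docs = [["the", "model"]] * 5 + [["the"]] * 4 + [["the", "rare"]]
print(filter_by_document_frequency(docs))  # ['model']
```

With the defaults, "the" is dropped because it appears in more than 90% of the documents, and "rare" because it appears in fewer than 3.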
mli_nlp_min_ngram
The minimum value of the n-gram range. The tokenizer generates all n-grams whose length lies in the inclusive range (mli_nlp_min_ngram, mli_nlp_max_ngram). (Number)
Default value 1
mli_nlp_max_ngram
The maximum value of the n-gram range. The tokenizer generates all n-grams whose length lies in the inclusive range (mli_nlp_min_ngram, mli_nlp_max_ngram). (Number)
Default value 1
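To illustrate the n-gram range, the hypothetical helper below enumerates word n-grams for every length from `min_n` to `max_n` inclusive. Whitespace tokenization is an assumption made for brevity; the product's tokenizer may split text differently.

```python
def ngrams(tokens, min_n=1, max_n=1):
    """Return all word n-grams with min_n <= n <= max_n, joined by spaces."""
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(ngrams("not very good".split(), min_n=1, max_n=2))
# ['not', 'very', 'good', 'not very', 'very good']
```

With the defaults (1, 1) only single words are produced; raising mli_nlp_max_ngram to 2 adds bigrams such as "not very".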
mli_nlp_min_token_mode
Mode used to choose N tokens for MLI NLP. 'top' chooses the N top tokens, 'bottom' chooses the N bottom tokens, 'top-bottom' chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens, and 'linspace' chooses N evenly spaced tokens. (String)
Default value 'top'
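The selection modes can be sketched as follows, assuming `ranked` is a list of tokens already sorted from most to least important and that -1 means "all tokens", as for mli_nlp_top_n. The helper `select_tokens` is hypothetical, and rounding the evenly spaced positions to the nearest index for 'linspace' is an assumption.

```python
import math


def select_tokens(ranked, n, mode="top"):
    """Pick n tokens from a list sorted by descending importance (hypothetical)."""
    if n < 0 or n >= len(ranked):
        return list(ranked)  # -1 (or an n larger than the list) keeps every token
    if n == 0:
        return []
    if mode == "top":
        return ranked[:n]
    if mode == "bottom":
        return ranked[-n:]
    if mode == "top-bottom":
        n_top, n_bottom = math.floor(n / 2), math.ceil(n / 2)
        return ranked[:n_top] + ranked[len(ranked) - n_bottom:]
    if mode == "linspace":
        if n == 1:
            return ranked[:1]
        # Assumption: evenly spaced positions are rounded to the nearest index.
        step = (len(ranked) - 1) / (n - 1)
        return [ranked[round(i * step)] for i in range(n)]
    raise ValueError(f"unknown mode: {mode!r}")


print(select_tokens(list("abcdefghij"), 5, "top-bottom"))  # ['a', 'b', 'h', 'i', 'j']
```

For N = 5, 'top-bottom' takes floor(5/2) = 2 tokens from the top and ceil(5/2) = 3 from the bottom.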
mli_nlp_tokenizer_max_features
The number of top tokens to be used as features when building token-based feature importance. (Number)
Default value -1
mli_nlp_loco_max_features
The number of top tokens to be used as features when computing text LOCO (Leave-One-Covariate-Out). (Number)
Default value -1
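A text LOCO score can be sketched as the change in a model's prediction when one token is removed from the text. The sketch assumes that definition; `text_loco` and `toy_predict` are hypothetical stand-ins rather than the product's implementation, and in practice the list of tokens scored would be limited by mli_nlp_loco_max_features.

```python
def text_loco(text, predict, tokens):
    """For each token: prediction on the full text minus prediction without it."""
    words = text.split()
    base = predict(text)
    scores = {}
    for tok in tokens:
        # Drop every occurrence of the token, then re-score the reduced text.
        reduced = " ".join(w for w in words if w != tok)
        scores[tok] = base - predict(reduced)
    return scores


def toy_predict(text):
    """Hypothetical sentiment model: +1 per 'great', -1 per 'bad'."""
    words = text.split()
    return words.count("great") - words.count("bad")


print(text_loco("great plot bad ending", toy_predict, ["great", "bad", "plot"]))
# {'great': 1, 'bad': -1, 'plot': 0}
```

A positive score means the token pushed the prediction up; a token the model ignores, such as "plot" here, scores 0.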
mli_nlp_surrogate_tokenizer
Tokenizer for surrogate models. Only applies to NLP models. (String)
Default value 'Linear Model + TF-IDF'
The tokenizer method to use when tokenizing a dataset for surrogate models. Choose either 'TF-IDF' or 'Linear Model + TF-IDF'. 'Linear Model + TF-IDF' first runs TF-IDF to get tokens and then fits a linear model between the tokens and the target; token importances are based on the coefficients of that linear model. Default is 'Linear Model + TF-IDF'. Only applies to NLP models.
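The 'Linear Model + TF-IDF' option can be sketched in plain Python. Assumptions: scikit-learn-style smoothed IDF with L2-normalized rows, whitespace tokenization, and a least-squares linear model fitted by gradient descent with the intercept fixed at the target mean; token importance is taken as the absolute coefficient. The function names are hypothetical, and the product's actual linear model may differ.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Smoothed TF-IDF with L2-normalized rows (scikit-learn-style defaults)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def linear_token_importance(docs, target, lr=0.5, n_iter=2000):
    """Fit target ~ TF-IDF tokens by least squares; rank tokens by |coefficient|."""
    vocab, X = tfidf_matrix(docs)
    n, p = len(X), len(vocab)
    intercept = sum(target) / n  # assumption: intercept fixed at the target mean
    centered = [y - intercept for y in target]
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit on the centered target.
        errors = [sum(x * wj for x, wj in zip(row, w)) - t
                  for row, t in zip(X, centered)]
        # Gradient of (1 / 2n) * sum(errors ** 2) with respect to w.
        grad = [sum(errors[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    coef = dict(zip(vocab, w))
    ranked = sorted(vocab, key=lambda t: abs(coef[t]), reverse=True)
    return coef, ranked


docs = ["good great movie", "great film", "bad movie", "terrible bad film"]
coef, ranked = linear_token_importance(docs, [1, 1, 0, 0])
print(ranked[:2])
```

On this toy data "great" and "bad" receive the largest absolute coefficients (positive and negative, respectively), while "movie" and "film", which occur in both classes, get small ones.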
mli_nlp_surrogate_tokens
The number of top tokens to be used as features when building surrogate models. Only applies to NLP models. (Number)
Default value 100
mli_nlp_use_stop_words
Ignore stop words for MLI NLP. (Boolean)
Default value True
mli_nlp_stop_words
List of words to filter out before text tokens are generated; the resulting tokens are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. To use custom stop words, pass them as a list, e.g., ['great', 'good']. (String)
Default value 'english'
mli_nlp_append_to_english_stop_words
Append the passed-in list of custom stop words to the default 'english' stop words. (Boolean)
Default value False
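The three stop-word settings (mli_nlp_use_stop_words, mli_nlp_stop_words, mli_nlp_append_to_english_stop_words) might combine as sketched below. The resolution order is an assumption based on the descriptions above, `resolve_stop_words` and `tokenize` are hypothetical helpers, and `ENGLISH_STOP_WORDS` here is a tiny stand-in for the built-in 'english' list, which is much longer.

```python
# Hypothetical stand-in for the built-in 'english' stop-word list.
ENGLISH_STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "of"})


def resolve_stop_words(use_stop_words=True, stop_words="english",
                       append_to_english=False):
    """Return the set of words to drop before tokenization (assumed logic)."""
    if not use_stop_words:
        return frozenset()
    if stop_words == "english":
        return ENGLISH_STOP_WORDS
    custom = frozenset(stop_words)
    # Either extend the english list or replace it with the custom words.
    return ENGLISH_STOP_WORDS | custom if append_to_english else custom


def tokenize(text, stop_set):
    """Lower-case, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_set]


stops = resolve_stop_words(stop_words=["great", "good"], append_to_english=True)
print(tokenize("The movie is great and the cast is good", stops))
# ['movie', 'cast']
```

Without the append option, only 'great' and 'good' would be removed and words such as 'the' and 'is' would survive as tokens.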