Recipes configuration
included_transformers
Include specific transformers (List) (Expert Setting)
Default value []
Transformer display names that indicate which transformers to use in the experiment. More information on these transformers is available here: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html
This setting allows including or excluding these transformations, and may be useful when simpler (more interpretable) models are sought at the expense of accuracy.
For multi-class: ['NumCatTETransformer', 'TextLinModelTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'ClusterDistTransformer', 'WeightOfEvidenceTransformer', 'TruncSVDNumTransformer', 'CVCatNumEncodeTransformer', 'DatesTransformer', 'TextTransformer', 'OriginalTransformer', 'NumToCatWoETransformer', 'NumToCatTETransformer', 'ClusterTETransformer', 'InteractionsTransformer']
For regression/binary: ['TextTransformer', 'ClusterDistTransformer', 'OriginalTransformer', 'TextLinModelTransformer', 'NumToCatTETransformer', 'DatesTransformer', 'WeightOfEvidenceTransformer', 'InteractionsTransformer', 'FrequentTransformer', 'CVTargetEncodeTransformer', 'NumCatTETransformer', 'NumToCatWoETransformer', 'TruncSVDNumTransformer', 'ClusterTETransformer', 'CVCatNumEncodeTransformer']
This list appears in the experiment logs (search for 'Transformers used').
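As a rough illustration of the log search mentioned above, the following sketch filters log text for lines containing the phrase Transformers used. The log excerpt and its layout are invented for the example; real experiment logs may look different.

```python
def find_transformers_used(log_text):
    """Return every log line that mentions 'Transformers used'."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if "Transformers used" in line
    ]


# Hypothetical log excerpt, for illustration only (not a real DAI log).
sample_log = """\
INFO Starting experiment
INFO Transformers used: ['OriginalTransformer', 'DatesTransformer']
INFO Done
"""

matches = find_transformers_used(sample_log)
for line in matches:
    print(line)
```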
included_models
Include specific models (List) (Expert Setting)
Default value []
“Include specific models” lets you choose a set of models that will be considered during experiment training. Each individual model setting takes one of AUTO / ON / OFF, which mean the following: AUTO lets the internal decision mechanisms determine whether the model should be used during training; ON tries to force the use of the model; OFF turns the model off during training (equivalent to deselecting the model in the “Include specific models” picker).
included_scorers
Include specific scorers (List) (Expert Setting)
Default value []
included_pretransformers
Include specific preprocessing transformers (List) (Expert Setting)
Default value []
Select transformers to be used for preprocessing before other transformers operate. Preprocessing transformers can potentially take any original features and output arbitrary features, which are then used by the normal layer of transformers, whose selection is controlled by the toml included_transformers or via the GUI “Include specific transformers”.
Notes:
1) Preprocessing transformers (and all other layers of transformers) are part of the python and (if applicable) mojo scoring packages.
2) Any BYOR transformer recipe or native DAI transformer can be used as a preprocessing transformer. So, e.g., a preprocessing transformer can do interactions, string concatenations, or date extractions as a preprocessing step, and the next layer of Date and DateTime transformers will use that as input data.
Caveats:
1) One cannot currently do a time-series experiment on a time_column that hasn’t yet been made (setup of the experiment only knows about the original data, not the transformed data). However, one can use a run-time data recipe to (e.g.) convert a float date-time into a string date-time, and this will be used by DAI’s Date and DateTime transformers as well as by auto-detection of time series.
2) In order to do a time-series experiment with the GUI/client auto-selecting groups, periods, etc., the dataset must have the time column and groups prepared ahead of the experiment by the user or via a one-time data recipe.
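The layering idea can be pictured with a small stand-alone sketch: a preprocessing step builds a new feature by string concatenation, and a later step consumes that feature. The function names, column names, and the row-as-dict representation are all hypothetical; this is not how DAI transformers are implemented.

```python
def concat_strings(row):
    """Preprocessing layer: concatenate two original string columns."""
    out = dict(row)
    out["city_state"] = f"{row['city']}_{row['state']}"
    return out


def string_length(row):
    """Next layer: uses the feature produced by the preprocessing layer."""
    out = dict(row)
    out["city_state_len"] = len(out["city_state"])
    return out


def run_layers(row, layers):
    """Feed the output of each layer into the next one."""
    for layer in layers:
        row = layer(row)
    return row


result = run_layers({"city": "Austin", "state": "TX"},
                    [concat_strings, string_length])
print(result)
```

The second layer only works because the first one ran before it, which mirrors the ordering constraint described above.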
num_pipeline_layers
Number of pipeline layers (Number) (Expert Setting)
Default value 1
Number of full pipeline layers (not including the preprocessing layer when included_pretransformers is not empty).
included_datas
Include specific data recipes during experiment (List) (Expert Setting)
Default value []
There are 2 kinds of data recipes:
1) one that adds a new dataset or modifies a dataset outside the experiment by file/url (pre-experiment data recipe);
2) one that modifies the dataset during the experiment and python scoring (run-time data recipe).
This list applies to the 2nd case. One can use the same data recipe code for either case, but note:
A) the 1st case can make any new data, but is not part of the scoring package;
B) the 2nd case modifies data during the experiment, so it needs some original dataset. The recipe can still create all new features, as long as it keeps the same names for: target, weight_column, fold_column, time_column, time group columns.
included_individuals
Include specific individuals (List) (Expert Setting)
Default value []
Custom individuals to use in the experiment. DAI contains most information about model type, model hyperparameters, data science types for input features, transformers used, and transformer parameters in an Individual Recipe (an object that is evolved by mutation within the context of DAI’s genetic algorithm).
Every completed experiment auto-generates python code for the experiment that corresponds to the individual(s) used to build the final model. This auto-generated python code can be edited offline and uploaded as a recipe, or it can be edited within the custom recipe management editor and saved. This gives one code-first access to a significant portion of DAI’s internal transformer and model generation.
Choices are:
- Empty means all individuals are freshly generated and treated by DAI’s AutoML as a container of model and transformer choices.
- Recipe display names of custom individuals, usually chosen via the UI. If the number of included custom individuals is less than DAI would need, then the remaining individuals are freshly generated.
The expert experiment-level option fixed_num_individuals can be used to enforce how many individuals to use in the evolution stage. The expert experiment-level option fixed_ensemble_level can be used to enforce how many individuals (each with one base model) will be used in the final model.
These individuals act in a similar way as the feature brain acts for restart and retrain/refit, and one can retrain/refit custom individuals (i.e. skip the tuning and evolution stages) to use them in building a final model.
See toml make_python_code for more details.
make_python_code
Generate python code for individual (String) (Expert Setting)
Default value 'auto'
Whether to generate python code for the best individuals for the experiment. This python code contains a CustomIndividual class that is a recipe that can be edited and customized. The CustomIndividual class itself can also be customized for expert use.
By default, ‘auto’ means on.
At the end of an experiment, the summary zip contains auto-generated python code for the individuals used in the experiment, including the last best population (best_population_indivXX.py where XX iterates the population), last best individual (best_individual.py), final base models (final_indivYY.py where YY iterates the final base models). The summary zip also contains an example_indiv.py file that generates other transformers that may be useful that did not happen to be used in the experiment. In addition, the GUI and python client allow one to generate custom individuals from an aborted or finished experiment. For finished experiments, this will provide a zip file containing the final_indivYY.py files, and for aborted experiments this will contain the best population and best individual files.
See included_individuals for more details.
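To make the file-naming scheme concrete, here is a small sketch that groups file names by the kind of auto-generated individual they hold. The patterns come from the description above, with the assumption that XX and YY are plain digits; the listing itself is made up and would in practice come from something like zipfile.ZipFile(path).namelist().

```python
import re

# Patterns follow the file names given above; XX / YY are assumed to be digits.
PATTERNS = {
    "best_population": re.compile(r"best_population_indiv\d+\.py$"),
    "best_individual": re.compile(r"best_individual\.py$"),
    "final_base_model": re.compile(r"final_indiv\d+\.py$"),
    "example": re.compile(r"example_indiv\.py$"),
}


def classify_individual_files(names):
    """Group file names by the kind of auto-generated individual they hold."""
    groups = {kind: [] for kind in PATTERNS}
    for name in names:
        for kind, pattern in PATTERNS.items():
            if pattern.search(name):
                groups[kind].append(name)
                break  # each file belongs to at most one group
    return groups


# Hypothetical listing of a summary zip, for illustration only.
names = [
    "best_individual.py",
    "final_indiv0.py",
    "final_indiv1.py",
    "best_population_indiv3.py",
    "example_indiv.py",
    "report.txt",
]
groups = classify_individual_files(names)
print(groups)
```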
make_json_code
Generate json code for individual (String) (Expert Setting)
Default value 'auto'
Whether to generate json code for the best individuals for the experiment. This json code contains the essential attributes from the internal DAI individual class. Reading the json code as a recipe is not supported. By default, ‘auto’ means off.
python_code_ngenes_max
Max. Num. genes for example auto-generated individual (Number) (Expert Setting)
Default value 100
Maximum number of genes to make for example auto-generated custom individual, called example_indiv.py in the summary zip file.
python_code_ngenes_min
Min. Num. genes for example auto-generated individual (Number) (Expert Setting)
Default value 100
Minimum number of genes to make for example auto-generated custom individual, called example_indiv.py in the summary zip file.
threshold_scorer
For binary classification only: Scorer to optimize threshold to be used in confusion-matrix based scorers that are trivial to optimize and for label creation in MOJO/Python scorers. (String) (Expert Setting)
Default value 'AUTO'
Select the scorer to optimize the binary probability threshold that is being used in related Confusion Matrix based scorers that are trivial to optimize otherwise: Precision, Recall, FalsePositiveRate, FalseDiscoveryRate, FalseOmissionRate, TrueNegativeRate, FalseNegativeRate, NegativePredictiveValue. Use F1 if the target class matters more, and MCC if all classes are equally important. AUTO will try to sync the threshold scorer with the scorer used for the experiment, otherwise falls back to F1. The optimized threshold is also used for creating labels in addition to probabilities in MOJO/Python scorers.
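Scorers such as Recall are trivial to optimize on their own because predicting every row as positive already gives perfect recall, which is why a balanced scorer is used to pick the threshold. The sketch below shows the general idea with F1 in plain Python. It is an illustration of threshold optimization under simple assumptions (candidate thresholds are the distinct predicted probabilities, a row is positive when its probability is at or above the threshold), not DAI's actual procedure.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the positive class when predicting 1 for prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob):
    """Try each distinct predicted probability and keep the best F1."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6]

threshold = best_threshold(y_true, y_prob)
# The chosen threshold turns probabilities into labels.
labels = [int(p >= threshold) for p in y_prob]
print(threshold, labels)
```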
prob_add_genes
Probability to add transformers (Float) (Expert Setting)
Default value 0.5
Unnormalized probability to add genes or instances of transformers with specific attributes. If no genes can be added, other mutations (mutating model hyperparameters, pruning genes, pruning features, etc.) are attempted.
prob_addbest_genes
Probability to add best shared transformers (Float) (Expert Setting)
Default value 0.5
Unnormalized probability, conditioned on prob_add_genes, to add genes or instances of transformers with specific attributes that have shown to be beneficial to other individuals within the population.
prob_prune_genes
Probability to prune transformers (Float) (Expert Setting)
Default value 0.5
Unnormalized probability to prune genes or instances of transformers with specific attributes. If a variety of transformers with many attributes exists, the default value is reasonable. However, if one has a fixed set of transformers that should not change, or no new transformer attributes can be added, then setting this to 0.0 is reasonable to avoid undesired loss of transformations.
prob_perturb_xgb
Probability to mutate model parameters (Float) (Expert Setting)
Default value 0.25
Unnormalized probability to change model hyperparameters.
prob_prune_by_features
Probability to prune weak features (Float) (Expert Setting)
Default value 0.25
Unnormalized probability to prune features that have low variable importance, as opposed to pruning entire instances of genes/transformers when prob_prune_genes is used. If prob_prune_genes=0.0 and prob_prune_by_features=0.0 and prob_prune_by_top_features=0.0, then genes/transformers and transformed features are only pruned if they: 1) are inconsistent with the genome 2) are inconsistent with the column data types 3) had no signal (for interactions and cv_in_cv for target encoding) 4) had a failed transformation. E.g. these toml settings are then ignored: 1) ngenes_max 2) limit_features_by_interpretability 3) varimp_threshold_at_interpretability_10 4) features_allowed_by_interpretability 5) remove_scored_0gain_genes_in_postprocessing_above_interpretability 6) nfeatures_max_threshold 7) features_cost_per_interp. So this acts similarly to no_drop_features, except that no_drop_features also applies to shift and leak detection, and constant columns and ID columns are not dropped.
prob_prune_by_top_features
Probability to prune strong features (Float) (Expert Setting)
Default value 0.25
Unnormalized probability to prune features that have high variable importance, in case they have high gain but negative performance on validation and would otherwise maintain poor validation scores. Similar to prob_prune_by_features but for high gain features.
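"Unnormalized" means the values act as relative weights and need not sum to 1. The sketch below normalizes the default values of five of these settings and samples one mutation type. Treating exactly these five as one jointly normalized group is an assumption made for illustration (prob_addbest_genes is left out because it is conditioned on prob_add_genes); the text does not spell out how DAI combines them.

```python
import random

# Default values of the settings described above, used as relative weights.
weights = {
    "prob_add_genes": 0.5,
    "prob_prune_genes": 0.5,
    "prob_perturb_xgb": 0.25,
    "prob_prune_by_features": 0.25,
    "prob_prune_by_top_features": 0.25,
}


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_mutation(probs, rng):
    """Sample one mutation type according to the normalized probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


probs = normalize(weights)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
choice = pick_mutation(probs, rng)
print(probs)
print(choice)
```

Setting one weight to 0.0 removes that mutation from the draw, which matches the advice above about setting prob_prune_genes to 0.0 to avoid losing transformations.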
max_num_prune_by_top_features
Number of high gain features to prune each mutation (Number) (Expert Setting)
Default value 1
Maximum number of high gain features to prune for each mutation call, to control behavior of prob_prune_by_top_features.
prob_prune_pretransformer_genes
Probability to prune pretransformers (Float) (Expert Setting)
Default value 0.5
Like prob_prune_genes but only for pretransformers, i.e. those transformers in layers except last layer that connects to model.
prob_prune_pretransformer_by_features
Probability to prune weak pretransformer features (Float) (Expert Setting)
Default value 0.25
Like prob_prune_by_features but only for pretransformers, i.e. those transformers in layers except last layer that connects to model.
prob_prune_pretransformer_by_top_features
Probability to prune strong pretransformer features (Float) (Expert Setting)
Default value 0.25
Like prob_prune_by_top_features but only for pretransformers, i.e. those transformers in layers except last layer that connects to model.
override_individual_from_toml_list
config.toml items stored in individual to overwrite (List) (Expert Setting)
Default value ['prob_perturb_xgb', 'prob_add_genes', 'prob_addbest_genes', 'prob_prune_genes', 'prob_prune_by_features', 'prob_prune_by_top_features', 'prob_prune_pretransformer_genes', 'prob_prune_pretransformer_by_features', 'prob_prune_pretransformer_by_top_features']
When doing restart, retrain, refit, reset these individual parameters to new toml values.
skip_transformer_failures
Whether to skip failures of transformers (Boolean) (Expert Setting)
Default value True
Skipping just avoids the failed transformer. Sometimes python multiprocessing swallows exceptions, so skipping and logging exceptions is also a more reliable way to handle them. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it. Features that fail are pruned from the individual. If that leaves no features in the individual, then backend tuning, feature/model tuning, final model building, etc. will still fail, since DAI should not continue if all features are from a failed state.
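The behavior described above can be sketched as a loop that drops failing transformers, stays silent for ignored errors, and still fails when nothing survives. The IgnoreError class below is a local stand-in so the example runs outside DAI (the real one lives in h2oaicore.systemutils), and apply_transformers is a hypothetical helper, not a DAI function.

```python
import logging

logger = logging.getLogger("skip_failures_sketch")


class IgnoreError(Exception):
    """Local stand-in for h2oaicore.systemutils.IgnoreError (not the real class)."""


def apply_transformers(transformers, data, skip_failures=True):
    """Run each transformer; drop the ones that fail instead of aborting."""
    features = {}
    for name, func in transformers.items():
        try:
            features[name] = func(data)
        except IgnoreError:
            continue  # ignored on purpose: skipped without a log entry
        except Exception:
            if not skip_failures:
                raise
            logger.exception("Transformer %s failed and was skipped", name)
    if not features:
        # Nothing usable is left, so continuing would make no sense.
        raise RuntimeError("All transformers failed")
    return features


def broken(data):
    raise ValueError("bad input")


def quiet(data):
    raise IgnoreError("not applicable to this data")


features = apply_transformers(
    {"double": lambda xs: [2 * x for x in xs], "broken": broken, "quiet": quiet},
    [1, 2, 3],
)
print(features)
```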
skip_model_failures
Whether to skip failures of models (Boolean) (Expert Setting)
Default value True
Skipping just avoids the failed model. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it.
skip_scorer_failures
Whether to skip failures of scorers (Boolean) (Expert Setting)
Default value True
Skipping just avoids the failed scorer if it is one among many scorers. Failures are logged depending upon detailed_skip_failure_messages_level. A recipe can raise h2oaicore.systemutils.IgnoreError to ignore the error and avoid logging it. Default is True to avoid failing in, e.g., final model building due to a single scorer.
skip_data_recipe_failures
Whether to skip runtime data recipe failures (Boolean) (Expert Setting)
Default value False
Skipping avoids the failed recipe. Failures are logged depending upon detailed_skip_failure_messages_level. Default is False because runtime data recipes are one-time at the start of the experiment and are expected to work by default.
detailed_skip_failure_messages_level
Level to log (0=simple message 1=code line plus message 2=detailed stack traces) for skipped failures. (Number) (Expert Setting)
Default value 1
How verbosely to log failure messages for failed and then skipped transformers or models. Full failures always go to disk as *.stack files, which upon completion of the experiment go into the details folder within the experiment log zip file.
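The three levels (0 = simple message, 1 = code line plus message, 2 = detailed stack trace) can be pictured with the standard traceback module. The function below is a hypothetical formatter written for this illustration; the exact text DAI writes to its logs is not shown in this section.

```python
import traceback


def format_failure(exc, level):
    """Render a caught exception at verbosity level 0, 1, or 2."""
    message = f"{type(exc).__name__}: {exc}"
    if level <= 0:
        return message  # simple message only
    if level == 1:
        frames = traceback.extract_tb(exc.__traceback__)
        if not frames:
            return message
        last = frames[-1]  # frame where the exception was raised
        return f"{last.filename}:{last.lineno} {last.line!r} -> {message}"
    # Level 2 and above: the full stack trace.
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


try:
    int("not a number")
except ValueError as err:
    caught = err  # keep a reference; 'err' is cleared after the except block

simple = format_failure(caught, 0)
with_line = format_failure(caught, 1)
detailed = format_failure(caught, 2)
print(simple)
print(with_line)
print(detailed)
```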
notify_failures
Whether to notify about failures of transformers or models or other recipe failures (Boolean) (Expert Setting)
Default value True
Whether to not just log errors of recipes (models and transformers) but also show high-level notification in GUI.
enable_custom_recipes
enable_custom_recipes (Boolean)
Default value True
Enable custom recipes.
enable_custom_recipes_upload
enable_custom_recipes_upload (Boolean)
Default value True
Enable uploading of custom recipes from local file system.
must_have_custom_transformers
must_have_custom_transformers (Boolean)
Default value False
must_have_custom_transformers_2
must_have_custom_transformers_2 (Boolean)
Default value False
must_have_custom_transformers_3
must_have_custom_transformers_3 (Boolean)
Default value False
must_have_custom_models
must_have_custom_models (Boolean)
Default value False
must_have_custom_scorers
must_have_custom_scorers (Boolean)
Default value False
extra_migration_custom_recipes_missing_modules
Whether to enable extra attempt to migrate custom modules during preview to show preview. Can lead to slow preview loading. (Boolean)
Default value False
force_include_custom_recipes_by_default
force_include_custom_recipes_by_default (Boolean)
Default value False
h2o_recipes_port
h2o_recipes_port (Number)
Default value 50361
Port of the H2O instance for use by transformers, models, or scorers. No other instance may be on that port or on the next port.
h2o_recipes_name
h2o_recipes_name (String)
Default value 'None'
Name of H2O instance for use by transformers, models, or scorers.
recipe_dict
recipe_dict (Dict) (Expert Setting)
Default value {}
Dictionary to control recipes for each experiment and particular custom recipes.
E.g. if inserting into the GUI as any toml string, can use: ""recipe_dict="{'key1': 2, 'key2': 'value2'}"""
E.g. if putting into config.toml as a dict, can use: recipe_dict="{'key1': 2, 'key2': 'value2'}"
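In the config.toml form, the value is a quoted string whose content is written like a Python dict literal. One way to sanity-check such a string before pasting it is ast.literal_eval from the standard library, as sketched below. How DAI itself parses the value is not stated here, so this is only a local check.

```python
import ast

# The text between the outer double quotes of the config.toml example.
raw_value = "{'key1': 2, 'key2': 'value2'}"

# literal_eval only accepts literals, so arbitrary code is rejected.
recipe_dict = ast.literal_eval(raw_value)
if not isinstance(recipe_dict, dict):
    raise TypeError("recipe_dict must be written as a dict literal")

print(recipe_dict["key1"], recipe_dict["key2"])
```

The same check applies to any other setting that uses this string-wrapped dict form.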
mutation_dict
mutation_dict (Dict) (Expert Setting)
Default value {}
Dictionary to control some mutation parameters.
E.g. if inserting into the GUI as any toml string, can use: ""mutation_dict="{'key1': 2, 'key2': 'value2'}"""
E.g. if putting into config.toml as a dict, can use: mutation_dict="{'key1': 2, 'key2': 'value2'}"
raise_on_invalid_included_list
Whether to validate recipe names (Boolean) (Expert Setting)
Default value False
Whether to validate recipe names provided in included lists, like included_models, or (if False) whether to just log a warning to the server logs and ignore any invalid recipe names.
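The two behaviors can be sketched as follows: either raise on an unknown name, or warn and drop it. The helper function and the model names are hypothetical and only illustrate the choice this setting makes.

```python
import logging

logger = logging.getLogger("included_list_sketch")


def filter_included(requested, available, raise_on_invalid=False):
    """Validate requested recipe names against the known ones."""
    invalid = [name for name in requested if name not in available]
    if invalid and raise_on_invalid:
        raise ValueError(f"Unknown recipe names: {invalid}")
    for name in invalid:
        logger.warning("Ignoring unknown recipe name %r", name)
    return [name for name in requested if name in available]


# Hypothetical recipe names, for illustration only.
available = {"ModelA", "ModelB"}
kept = filter_included(["ModelA", "Typo"], available)
print(kept)
```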
ignore_package_version
ignore_package_version (List)
Default value []
List of package versions to ignore. Useful for a small version change where things are likely to still function with the old package version.
clobber_package_version
clobber_package_version (List)
Default value ['catboost', 'h2o_featurestore']
List of package versions to remove if a conflict is encountered. Useful when a new version of a package is wanted and old recipes are likely to still function.
swap_package_version
swap_package_version (Dict)
Default value {'catboost==0.26.1': 'catboost==1.0.5', 'catboost==0.25.1': 'catboost==1.0.5', 'catboost==0.24.1': 'catboost==1.0.5', 'catboost==1.0.4': 'catboost==1.0.5', 'catboost==1.0.6': 'catboost==1.0.5', 'catboost': 'catboost==1.0.5'}
List of package versions to remove if a conflict is encountered. Useful when a new version of a package is wanted and old recipes are likely to still function. Also useful when old versions of recipes are no longer needed, even if they would no longer function.
allow_version_change_user_packages
allow_version_change_user_packages (Boolean)
Default value False
If a user uploads a recipe with changes to package versions, allow upgrade of package versions. If changes to DAI-protected packages are attempted, can try using the pip_install_options toml with ['--no-deps']. Or, to entirely ignore DAI versions of packages, can try using the pip_install_options toml with ['--ignore-installed']. Any other experiments relying on recipes with such packages will be affected; use with caution.
acceptance_tests_use_weather_data
acceptance_tests_use_weather_data (Boolean)
Default value False
acceptance_tests_mojo_benchmark
acceptance_tests_mojo_benchmark (Boolean)
Default value False
acceptance_test_timeout
Timeout in minutes for testing acceptance of each recipe (Float) (Expert Setting)
Default value 20.0
Minutes to wait until a recipe’s acceptance testing is aborted. A recipe is rejected if acceptance testing is enabled and times out. One may also set timeout for a specific recipe by setting the class’s staticmethod function called acceptance_test_timeout to return number of minutes to wait until timeout doing acceptance testing. This timeout does not include the time to install required packages.
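A per-recipe timeout, as described above, is a staticmethod named acceptance_test_timeout that returns the number of minutes. The class below is a bare placeholder so the example runs on its own; a real recipe would derive from the appropriate DAI recipe base class, which is not shown here, and the value 45.0 is arbitrary.

```python
class SlowRecipeSketch:
    """Placeholder class standing in for a custom recipe (hypothetical)."""

    @staticmethod
    def acceptance_test_timeout():
        # Minutes to wait before this recipe's acceptance test is aborted.
        return 45.0


print(SlowRecipeSketch.acceptance_test_timeout())
```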
data_recipe_isolate
Whether to isolate (in fork) data recipe in case imports change needs across. (Boolean)
Default value True
server_recipe_url
server_recipe_url (String)
Default value ''
Space-separated string list of URLs for recipes that are loaded at user login time.
recipe_activation
Recipe Activation List (Dict) (Expert Setting)
Default value {'transformers': [], 'models': [], 'scorers': [], 'data': [], 'individuals': []}
List of recipes (per dict key by type) that are applicable for given experiment. This is especially relevant for situations such as new experiment with same params where the user should be able to use the same recipe versions as the parent experiment if he/she wishes to.