Extra Settings
The EXTRA tab in EXPERT SETTINGS provides advanced configuration options for your experimental environment. These settings are designed for experienced users and specific use cases, often used in collaboration with H2O.ai support teams.
Extra Tab Sub-Categories
The EXTRA tab contains one sub-category, Advanced, which groups specialized settings for machine learning experiments, along with controls for filtering, saving, and canceling changes.
To access these settings, navigate to EXPERT SETTINGS > EXTRA tab from the Experiment Setup page.
The following table describes the available options on the EXTRA page:
| Sub-Category | Description |
|---|---|
| [1] Advanced | Configure specialized settings for advanced scenarios and support collaboration |
| [2] Filter by Tags | Filter and organize extra settings using custom tags and labels |
| [3] Save | Save all extra configuration changes |
| [4] Cancel | Cancel changes and revert to previous configuration settings |
Advanced
The Advanced sub-tab contains specialized configuration options for advanced users and specific use cases. These settings provide fine-tuned control over system behavior and are typically used in collaboration with H2O.ai support teams.
Advanced Settings:
- Time string format for time_abort
What it does: Defines the date and time format string for experiment abortion scheduling
Purpose: Enables scheduling automatic experiment termination at specific times
Format: Python strftime format string for date/time parsing
Default format: %Y-%m-%d %H:%M:%S (YYYY-MM-DD HH:MM:SS)
Example: "2024-12-31 23:59:59" for New Year's Eve at 11:59 PM
Requires: String value (default: "%Y-%m-%d %H:%M:%S")
- Time zone for time_abort
What it does: Specifies the timezone for time_abort scheduling
Purpose: Ensures accurate time-based experiment termination across different timezones
Format: IANA timezone identifier (e.g., UTC, America/New_York)
Default timezone: UTC (Coordinated Universal Time)
Impact: Changes take effect immediately for new experiments
Use case: Set this when running experiments across different geographic locations
Requires: String value (default: "UTC")
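For intuition, the format string and timezone settings combine the way standard Python date parsing does. The sketch below parses an abort time with the default format and attaches a timezone using the standard-library `zoneinfo` module; the helper name is illustrative and the exact mechanism Driverless AI uses internally may differ:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

TIME_ABORT_FORMAT = "%Y-%m-%d %H:%M:%S"  # the default format string

def parse_abort_time(value, fmt=TIME_ABORT_FORMAT, tz="UTC"):
    """Parse a time_abort string and attach the configured timezone."""
    naive = datetime.strptime(value, fmt)
    return naive.replace(tzinfo=ZoneInfo(tz))

abort_at = parse_abort_time("2024-12-31 23:59:59")
print(abort_at.isoformat())  # 2024-12-31T23:59:59+00:00
```

Changing the format setting changes what strings parse successfully, so the two settings should be kept consistent with how abort times are entered.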
- inject_mojo_for_predictions
What it does: Controls whether to use MOJO (Model Object, Optimized) for prediction operations
Purpose: Enables optimized prediction pipelines using MOJO artifacts
Performance: MOJO provides faster prediction inference than standard methods
Impact: Significantly improves prediction speed for production deployments
Use case: Enable for production systems requiring fast inference
- Available options:
Enabled: Uses MOJO for faster predictions
Disabled: Uses standard prediction methods
Requires: Boolean toggle (default: Enabled)
- Relative tolerance for mini MOJO acceptance test
What it does: Sets the relative tolerance threshold for MOJO acceptance testing
Purpose: Validates MOJO accuracy compared to original model predictions
Tolerance: Relative error threshold for acceptance (0 = exact match)
Use case: Ensures MOJO predictions are within acceptable accuracy bounds
Note: Relative tolerance compares percentage difference between predictions
Requires: Float value (default: 0)
- Absolute tolerance for mini MOJO acceptance test
What it does: Sets the absolute tolerance threshold for MOJO acceptance testing
Purpose: Validates MOJO accuracy using absolute error thresholds
Tolerance: Absolute error threshold for acceptance (0 = exact match)
Use case: Ensures MOJO predictions are within acceptable accuracy bounds
Note: Absolute tolerance compares direct difference between prediction values
Requires: Float value (default: 0)
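The two tolerance settings above correspond to the usual absolute/relative error test: a prediction pair passes if it is within either tolerance, and both defaulting to 0 demands an exact match. A minimal sketch of such an acceptance check (a hypothetical helper, not DAI's internal implementation):

```python
def mojo_predictions_match(original, mojo, rtol=0.0, atol=0.0):
    """Accept the MOJO if every prediction pair is within the absolute
    OR the relative tolerance; rtol=atol=0 requires an exact match."""
    for a, b in zip(original, mojo):
        diff = abs(a - b)
        if diff <= atol:              # absolute tolerance: direct difference
            continue
        denom = max(abs(a), abs(b))
        if denom > 0 and diff / denom <= rtol:  # relative tolerance: fractional difference
            continue
        return False
    return True

print(mojo_predictions_match([0.50], [0.5004], rtol=1e-3))  # True
```

Absolute tolerance is useful near zero, where relative error blows up; relative tolerance scales with the magnitude of the predictions.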
- Number of columns beyond which will not automatically build autoreport at end of experiment
What it does: Sets the maximum column threshold for automatic autoreport generation
Purpose: Prevents automatic autoreport generation for datasets with too many columns
Performance: Large column counts can significantly slow down autoreport generation
Threshold: Experiments with more columns will skip automatic autoreport
Note: Manual autoreport generation remains available regardless of column count
Requires: Integer value (default: 1000)
- Number of columns beyond which will not automatically build pipeline visualization at end of experiment
What it does: Sets the maximum column threshold for automatic pipeline visualization
Purpose: Prevents automatic pipeline visualization for datasets with too many columns
Performance: Large column counts can significantly slow down visualization generation
Threshold: Experiments with more columns will skip automatic pipeline visualization
Note: Manual pipeline visualization remains available regardless of column count
Requires: Integer value (default: 5000)
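Taken together, the two column thresholds above gate which end-of-experiment artifacts are built automatically. A minimal sketch of that gating logic (the helper name and return shape are illustrative, not DAI's API):

```python
def auto_build_plan(n_cols, autoreport_limit=1000, viz_limit=5000):
    """Decide which end-of-experiment artifacts to build automatically
    for a dataset with n_cols columns."""
    return {
        "autoreport": n_cols <= autoreport_limit,        # default limit: 1000 columns
        "pipeline_visualization": n_cols <= viz_limit,   # default limit: 5000 columns
    }

print(auto_build_plan(3000))  # autoreport skipped, visualization still built
```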
- Pass environment variables to deprecated python scoring package
What it does: Controls whether to pass environment variables to legacy Python scoring packages
Purpose: Maintains compatibility with deprecated scoring functionality
Legacy support: Ensures backward compatibility with older scoring implementations
- Available options:
Enabled: Passes environment variables to legacy scoring packages
Disabled: Uses standard environment variable handling
Requires: Boolean toggle (default: Enabled)
- Line length for autoreport descriptions of transformers. -1 means use autodoc_keras_summary_line_length
What it does: Sets the maximum line length for transformer descriptions in autoreports
Purpose: Controls formatting and readability of transformer documentation
Auto mode (-1): Uses the autodoc_keras_summary_line_length setting
Custom value: Sets specific line length for transformer descriptions
Note: Longer lines improve readability but may affect layout on narrow displays
Requires: Integer value (default: -1)
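The -1 sentinel pattern here is a simple fallback: a negative value defers to another setting. A hedged sketch using the standard-library `textwrap` module (the fallback constant stands in for `autodoc_keras_summary_line_length`; this is not DAI's autoreport code):

```python
import textwrap

AUTODOC_KERAS_SUMMARY_LINE_LENGTH = 80  # stand-in for the fallback setting

def wrap_description(text, line_length=-1):
    """Wrap a transformer description; -1 falls back to the Keras summary width."""
    if line_length == -1:
        line_length = AUTODOC_KERAS_SUMMARY_LINE_LENGTH
    return textwrap.wrap(text, width=line_length)

for line in wrap_description("Target-encodes categorical columns using out-of-fold means.", 30):
    print(line)
```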
- Max size of pipeline.mojo file (in MB) for when benchmark_mojo_latency is set to 'auto'
What it does: Sets the maximum allowed size for pipeline.mojo files in automatic benchmarking
Purpose: Prevents latency benchmarking of excessively large MOJO files
Performance: Large MOJO files can cause benchmarking timeouts and resource issues
Size limit: MOJO files larger than this threshold skip automatic latency benchmarking
Note: Manual latency benchmarking is still available for large files
Requires: Integer value (default: 2048)
- Size of base models to allow mojo_building_parallelism
What it does: Sets the minimum model size threshold for enabling parallel MOJO building
Purpose: Optimizes MOJO building performance for appropriately sized models
Performance: Parallel building is only beneficial for models above this size threshold
Threshold: Models smaller than this size use sequential MOJO building
Note: Parallel building overhead is not justified for small models
Requires: Integer value (default: 100000000)
- Small data work
What it does: Controls optimization settings for small datasets
Purpose: Applies specialized optimizations for datasets with limited data
Optimization: Uses faster algorithms and reduced complexity for small datasets
Impact: Improves training speed for datasets with fewer than 10,000 rows
Use case: Enable when working with small datasets to avoid overfitting
- Available options:
auto: Automatically determines small data optimizations
on: Forces small data optimizations
off: Disables small data optimizations
Requires: String selection (default: auto)
- min_dt_threads_munging
What it does: Sets the minimum number of datatable threads for data munging operations
Purpose: Ensures minimum threading performance for data preprocessing
Performance: Guarantees minimum parallel processing for munging operations
Threading: Sets floor value for datatable threading during munging
Note: datatable is a high-performance data manipulation library
Requires: Integer value (default: 1)
- min_dt_threads_final_munging
What it does: Sets the minimum number of datatable threads for final munging operations
Purpose: Ensures minimum threading performance for final data preprocessing
Performance: Guarantees minimum parallel processing for final munging operations
Threading: Sets floor value for datatable threading during final munging
Note: datatable is a high-performance data manipulation library
Requires: Integer value (default: 1)
- max_dt_threads_do_timeseries_split_suggestion
What it does: Sets the maximum number of datatable threads for time series split suggestion operations
Purpose: Controls threading for time series analysis and split recommendation
Performance: Limits threading to prevent resource contention during time series operations
Threading: Sets ceiling value for datatable threading during time series split suggestions
Note: datatable is a high-performance data manipulation library
Requires: Integer value (default: 1)
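The min/max thread settings above act as a floor and a ceiling on datatable's thread count for each phase of work. A small sketch of how such clamping behaves (`clamp_threads` is an illustrative helper, not a DAI or datatable function):

```python
def clamp_threads(requested, floor=1, ceiling=None):
    """Apply a minimum (and optionally a maximum) to a requested thread count."""
    n = max(requested, floor)
    if ceiling is not None:
        n = min(n, ceiling)
    return n

print(clamp_threads(0, floor=1))             # 1 -> munging always gets at least one thread
print(clamp_threads(8, floor=1, ceiling=1))  # 1 -> time series split suggestion capped at one
```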
- Whether to keep Kaggle submission file in experiment directory
What it does: Controls whether to retain Kaggle submission files in the experiment output directory
Purpose: Manages storage of Kaggle competition submission artifacts
Storage: Controls disk space usage for Kaggle-related files
- Available options:
Enabled: Keeps Kaggle submission files in experiment directory
Disabled: Removes Kaggle submission files after processing
Requires: Boolean toggle (default: Enabled)
- Custom Kaggle competitions to make automatic test set submissions for
What it does: Specifies custom Kaggle competitions for automatic test set submissions
Purpose: Enables automated submission to specific Kaggle competitions
Format: List of competition identifiers or names
Use case: Streamlines Kaggle competition workflow for specific contests
Requires: List of strings (default: [])
- ping_period
What it does: Sets the interval in seconds for system status ping during experiments
Purpose: Controls system monitoring frequency during experiment execution
Monitoring: Enables periodic system health checks and status updates
Performance: More frequent pings provide better monitoring but use more resources
Requires: Integer value (default: 60)
- Whether to enable ping of system status during DAI experiments
What it does: Controls whether to enable system status monitoring during experiments
Purpose: Provides real-time system health monitoring during experiment execution
Monitoring: Tracks system performance, resource usage, and experiment progress
- Available options:
Enabled: Enables system status monitoring
Disabled: Disables system status monitoring
Requires: Boolean toggle (default: Enabled)
- stall_disk_limit_gb
What it does: Sets the disk space limit in GB for stalling operations
Purpose: Prevents disk space issues by limiting stalling operation disk usage
Storage: Controls disk space allocation for temporary stalling operations
Performance: Helps prevent system crashes due to disk space exhaustion
Requires: Float value (default: 1)
- min_rows_per_class
What it does: Sets the minimum number of rows required per class for classification problems
Purpose: Ensures sufficient data for each class in classification tasks
Quality: Prevents training on classes with insufficient data
Threshold: Classes with fewer rows may be excluded or handled specially
Requires: Integer value (default: 5)
- min_rows_per_split
What it does: Sets the minimum number of rows required per data split
Purpose: Ensures sufficient data for each split in cross-validation or train/test splits
Quality: Prevents splits with insufficient data that could lead to poor model performance
Threshold: Splits with fewer rows may be adjusted or excluded
Requires: Integer value (default: 5)
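A per-class row minimum like the one above amounts to counting label frequencies and flagging classes under the threshold; an analogous count applies per split. A minimal sketch (hypothetical helper, not DAI code):

```python
from collections import Counter

def classes_below_threshold(labels, min_rows_per_class=5):
    """Return classes that have fewer rows than the minimum."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < min_rows_per_class)

labels = ["cat"] * 12 + ["dog"] * 7 + ["fox"] * 2
print(classes_below_threshold(labels))  # ['fox']
```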
- tf_nan_impute_value
What it does: Sets the value used to impute NaN (Not a Number) values in TensorFlow models
Purpose: Handles missing values in TensorFlow model inputs
Imputation: Replaces NaN values with the specified value during model processing
Default value: -5 (negative value to distinguish from real data)
Requires: Float value (default: -5)
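Sentinel-based NaN imputation simply substitutes the configured value wherever a NaN appears. A sketch of the idea in plain Python (illustrative only; DAI applies this inside its TensorFlow input pipeline):

```python
import math

def impute_nans(values, impute_value=-5.0):
    """Replace NaN entries with the configured sentinel (default -5)."""
    return [impute_value if math.isnan(v) else v for v in values]

print(impute_nans([1.0, float("nan"), 3.0]))  # [1.0, -5.0, 3.0]
```

A negative sentinel keeps imputed entries distinguishable when the real features are non-negative.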
- statistical_threshold_data_size_small
What it does: Sets the threshold for considering a dataset as small for statistical operations
Purpose: Determines when to apply small dataset optimizations for statistical calculations
Performance: Smaller datasets may use different algorithms or parameters
Threshold: Datasets below this size are considered small for statistical operations
Requires: Integer value (default: 100000)
- statistical_threshold_data_size_large
What it does: Sets the threshold for considering a dataset as large for statistical operations
Purpose: Determines when to apply large dataset optimizations for statistical calculations
Performance: Larger datasets may use different algorithms or sampling strategies
Threshold: Datasets above this size are considered large for statistical operations
Requires: Integer value (default: 500000000)
- aux_threshold_data_size_large
What it does: Sets the threshold for auxiliary operations on large datasets
Purpose: Determines when to apply large dataset optimizations for auxiliary operations
Performance: Controls memory and processing optimizations for auxiliary tasks
Threshold: Datasets above this size trigger large dataset auxiliary optimizations
Requires: Integer value (default: 10000000)
- set_method_sampling_row_limit
What it does: Sets the maximum number of rows for method sampling operations
Purpose: Limits the number of rows used in sampling-based method evaluations
Performance: Prevents excessive memory usage in sampling operations
Sampling: Controls the scope of sampling for method performance evaluation
Requires: Integer value (default: 5000000)
- performance_threshold_data_size_small
What it does: Sets the threshold for considering a dataset as small for performance optimizations
Purpose: Determines when to apply small dataset performance optimizations
Performance: Smaller datasets may use different performance tuning strategies
Threshold: Datasets below this size are considered small for performance operations
Requires: Integer value (default: 100000)
- performance_threshold_data_size_large
What it does: Sets the threshold for considering a dataset as large for performance optimizations
Purpose: Determines when to apply large dataset performance optimizations
Performance: Larger datasets may use different performance tuning strategies
Threshold: Datasets above this size are considered large for performance operations
Requires: Integer value (default: 100000000)
- gpu_default_threshold_data_size_large
What it does: Sets the threshold for default GPU usage on large datasets
Purpose: Determines when to automatically enable GPU acceleration for large datasets
Performance: GPU acceleration is most beneficial for datasets above this threshold
Threshold: Datasets above this size automatically trigger GPU usage
Requires: Integer value (default: 1000000)
- max_relative_cols_mismatch_allowed
What it does: Sets the maximum allowed relative column mismatch between datasets
Purpose: Controls tolerance for column differences between training and validation data
Validation: Ensures data consistency across different dataset splits
Tolerance: Maximum fraction of mismatched columns allowed (0.5 = 50%)
Requires: Float value (default: 0.5)
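A relative column-mismatch check can be read as: what fraction of one dataset's columns is missing from the other? A sketch under that interpretation (the helper and the exact mismatch formula are assumptions for illustration):

```python
def relative_col_mismatch(train_cols, other_cols):
    """Fraction of training columns missing from the other dataset."""
    train, other = set(train_cols), set(other_cols)
    if not train:
        return 0.0
    return len(train - other) / len(train)

mismatch = relative_col_mismatch(["a", "b", "c", "d"], ["a", "b", "x"])
print(mismatch, mismatch <= 0.5)  # 0.5 True -> within the default tolerance
```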
- max_rows_final_blender
What it does: Sets the maximum number of rows for final model blending operations
Purpose: Limits the number of rows used in final ensemble blending
Performance: Prevents excessive memory usage in final blending operations
Blending: Controls the scope of final model ensemble blending
Requires: Integer value (default: 1000000)
- min_rows_final_blender
What it does: Sets the minimum number of rows required for final model blending operations
Purpose: Ensures sufficient data for reliable final model blending
Quality: Prevents blending on datasets too small for reliable ensemble creation
Threshold: Datasets below this size may skip final blending
Requires: Integer value (default: 10000)
- max_rows_final_train_score
What it does: Sets the maximum number of rows for final training score calculations
Purpose: Limits the number of rows used in final training score evaluation
Performance: Prevents excessive computation time for training score calculation
Evaluation: Controls the scope of final training score assessment
Requires: Integer value (default: 5000000)
- max_rows_final_roccmconf
What it does: Sets the maximum number of rows for final ROC confusion matrix calculations
Purpose: Limits the number of rows used in ROC and confusion matrix evaluation
Performance: Prevents excessive computation time for ROC calculations
Evaluation: Controls the scope of final ROC and confusion matrix assessment
Requires: Integer value (default: 1000000)
- max_rows_final_holdout_score
What it does: Sets the maximum number of rows for final holdout score calculations
Purpose: Limits the number of rows used in final holdout score evaluation
Performance: Prevents excessive computation time for holdout score calculation
Evaluation: Controls the scope of final holdout score assessment
Requires: Integer value (default: 5000000)
- max_rows_final_holdout_bootstrap_score
What it does: Sets the maximum number of rows for final holdout bootstrap score calculations
Purpose: Limits the number of rows used in final holdout bootstrap score evaluation
Performance: Prevents excessive computation time for bootstrap score calculation
Evaluation: Controls the scope of final holdout bootstrap score assessment
Requires: Integer value (default: 1000000)
- Max. rows for leakage detection if wide rules used on wide data
What it does: Sets the maximum number of rows for leakage detection when using wide rules on wide datasets
Purpose: Limits the scope of leakage detection to prevent excessive computation time
Performance: Wide rules on wide data can be computationally expensive
Threshold: Datasets exceeding this limit may use sampling for leakage detection
Requires: Integer value (default: 100000)
- Num. simultaneous predictions for feature selection (0 = auto)
What it does: Sets the number of simultaneous predictions during feature selection operations
Purpose: Controls parallel processing for feature selection prediction tasks
Auto mode (0): Uses automatic parallel processing based on system resources
Custom value: Limits simultaneous predictions to specified number
Performance: More simultaneous predictions can speed up feature selection
Requires: Integer value (default: 0)
- Num. simultaneous fits for shift and leak checks if using LightGBM on CPU (0 = auto)
What it does: Sets the number of simultaneous LightGBM fits for shift and leakage checks on CPU
Purpose: Controls parallel processing for shift and leakage detection using LightGBM
Auto mode (0): Uses automatic parallel processing based on CPU resources
Custom value: Limits simultaneous fits to specified number
Performance: More simultaneous fits can speed up shift and leakage detection
Requires: Integer value (default: 0)
- max_orig_nonnumeric_cols_selected_default
What it does: Sets the maximum number of original non-numeric columns selected by default
Purpose: Controls the default selection of non-numeric columns for feature engineering
Selection: Limits the number of non-numeric columns automatically included
Performance: Helps manage computational complexity for non-numeric features
Requires: Integer value (default: 300)
- max_orig_cols_selected_simple_factor
What it does: Sets the factor for maximum original columns selected in simple scenarios
Purpose: Controls column selection scaling factor for simple feature engineering
Scaling: Multiplies base column selection by this factor for simple cases
Performance: Helps balance feature richness with computational efficiency
Requires: Integer value (default: 2)
- fs_orig_cols_selected_simple_factor
What it does: Sets the factor for original columns selected in feature selection simple scenarios
Purpose: Controls column selection scaling factor for simple feature selection
Scaling: Multiplies base column selection by this factor for simple feature selection
Performance: Helps balance feature selection scope with computational efficiency
Requires: Integer value (default: 2)
- Allow supported models to do feature selection by permutation importance within model itself
What it does: Enables models to perform feature selection using permutation importance internally
Purpose: Allows models to automatically select features based on their own importance calculations
Efficiency: Reduces need for separate feature selection steps
- Available options:
Enabled: Models perform internal feature selection
Disabled: Uses external feature selection methods only
Requires: Boolean toggle (default: Enabled)
- Whether to use native categorical handling (CPU only) for LightGBM when doing feature selection by permutation
What it does: Controls whether to use LightGBM’s native categorical handling for permutation-based feature selection
Purpose: Optimizes categorical feature handling during permutation importance calculations
Performance: Native handling can be faster but is CPU-only
- Available options:
Enabled: Uses LightGBM native categorical handling
Disabled: Uses standard categorical handling
Requires: Boolean toggle (default: Enabled)
- Maximum number of original columns up to which will compute standard deviation of original feature importance. Can be expensive if many features.
What it does: Sets the maximum number of original columns for computing feature importance standard deviation
Purpose: Limits computation of feature importance statistics to prevent performance issues
Performance: Computing standard deviation can be expensive with many features
Threshold: Only computes standard deviation for datasets with columns below this limit
Requires: Integer value (default: 1000)
- num_folds
What it does: Sets the number of cross-validation folds for model evaluation
Purpose: Controls the number of folds used in cross-validation for model assessment
Validation: More folds provide more robust evaluation but increase computation time
Balance: Typical values range from 3-10 folds depending on dataset size
Requires: Integer value (default: 3)
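For intuition, k-fold cross-validation divides the rows into k roughly equal parts, training on k-1 and validating on the remainder. A small sketch of the resulting fold sizes (illustrative helper, not DAI code):

```python
def fold_sizes(n_rows, num_folds=3):
    """Row counts per cross-validation fold for a plain split."""
    base, extra = divmod(n_rows, num_folds)
    return [base + (1 if i < extra else 0) for i in range(num_folds)]

print(fold_sizes(100, 3))  # [34, 33, 33]
```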
- full_cv_accuracy_switch
What it does: Sets the Accuracy dial value at which experiments switch to full cross-validation
Purpose: Determines when to use full cross-validation based on the experiment's accuracy setting
Optimization: Uses faster validation schemes below this accuracy setting
Threshold: Experiments with an accuracy setting at or above this value use full cross-validation
Requires: Integer value (default: 9)
- ensemble_accuracy_switch
What it does: Sets the Accuracy dial value at which experiments switch to ensemble methods
Purpose: Determines when to build ensembles based on the experiment's accuracy setting
Optimization: Uses single models below this accuracy setting
Threshold: Experiments with an accuracy setting at or above this value may build ensembles
Requires: Integer value (default: 5)
- num_ensemble_folds
What it does: Sets the number of folds used for ensemble model evaluation
Purpose: Controls cross-validation folds specifically for ensemble model assessment
Ensemble: Determines the robustness of ensemble model evaluation
Performance: More folds provide better ensemble evaluation but increase computation
Requires: Integer value (default: 4)
- fold_reps
What it does: Sets the number of repetitions for each fold in cross-validation
Purpose: Controls fold repetition for more robust cross-validation results
Robustness: Multiple repetitions help reduce variance in cross-validation scores
Performance: More repetitions increase computation time but improve reliability
Requires: Integer value (default: 1)
- max_num_classes_hard_limit
What it does: Sets the hard limit for the maximum number of classes in classification problems
Purpose: Prevents excessive computation for classification problems with too many classes
Performance: Large numbers of classes can significantly slow down training and prediction
Limit: Classification problems with more classes may be handled differently
Requires: Integer value (default: 10000)
- min_roc_sample_size
What it does: Sets the minimum sample size for ROC (Receiver Operating Characteristic) calculations
Purpose: Ensures sufficient data for reliable ROC curve computation
Quality: Prevents ROC calculations on datasets too small for reliable results
Threshold: Datasets below this size may skip ROC calculations or use simplified methods
Requires: Integer value (default: 1)
- enable_strict_confict_key_check_for_brain
What it does: Enables strict conflict key checking for the Feature Brain system
Purpose: Provides more rigorous validation of configuration keys in Feature Brain
Validation: Helps prevent configuration conflicts and inconsistencies
- Available options:
Enabled: Uses strict conflict key checking
Disabled: Uses standard key checking
Requires: Boolean toggle (default: Enabled)
- For feature brain or restart/refit, whether to allow brain ingest to use different feature engineering layer count
What it does: Controls whether Feature Brain can use different feature engineering layer counts during ingest
Purpose: Provides flexibility in feature engineering layer configuration for brain operations
Flexibility: Allows adaptation to different feature engineering requirements
- Available options:
Enabled: Allows different layer counts
Disabled: Requires consistent layer counts
Requires: Boolean toggle (default: Disabled)
- brain_maximum_diff_score
What it does: Sets the maximum allowed score difference for Feature Brain operations
Purpose: Controls the tolerance for score differences in brain-based feature selection
Tolerance: Allows small score differences while maintaining brain efficiency
Threshold: Score differences above this value may trigger brain adjustments
Requires: Float value (default: 0.1)
- brain_max_size_GB
What it does: Sets the maximum size in GB for Feature Brain memory usage
Purpose: Controls memory allocation for Feature Brain operations
Memory: Prevents excessive memory usage by Feature Brain system
Limit: Brain operations exceeding this size may use memory optimization strategies
Requires: Float value (default: 20)
- early_stopping
What it does: Controls whether to enable early stopping for model training
Purpose: Prevents overfitting by stopping training when validation performance stops improving
Optimization: Reduces training time and improves generalization
- Available options:
Enabled: Uses early stopping during training
Disabled: Trains for full specified duration
Requires: Boolean toggle (default: Enabled)
- early_stopping_per_individual
What it does: Controls whether to enable early stopping for individual models in genetic algorithm
Purpose: Applies early stopping to individual models during genetic algorithm evolution
Optimization: Improves efficiency of genetic algorithm by stopping poor performers early
- Available options:
Enabled: Uses early stopping for individuals
Disabled: Trains all individuals for full duration
Requires: Boolean toggle (default: Enabled)
- text_dominated_limit_tuning
What it does: Controls tuning limits for text-dominated datasets
Purpose: Applies specialized tuning limits when text features dominate the dataset
Optimization: Adjusts tuning parameters for optimal text processing performance
- Available options:
Enabled: Applies text-dominated tuning limits
Disabled: Uses standard tuning limits
Requires: Boolean toggle (default: Enabled)
- image_dominated_limit_tuning
What it does: Controls tuning limits for image-dominated datasets
Purpose: Applies specialized tuning limits when image features dominate the dataset
Optimization: Adjusts tuning parameters for optimal image processing performance
- Available options:
Enabled: Applies image-dominated tuning limits
Disabled: Uses standard tuning limits
Requires: Boolean toggle (default: Enabled)
- supported_image_types
What it does: Specifies the supported image file types for image processing
Purpose: Defines which image formats can be processed by the system
Compatibility: Ensures only supported image types are processed
Format: List of supported image file extensions
Requires: List of strings (default: ["jpg", "jpeg", "png", "bmp", "tiff"])
- image_paths_absolute
What it does: Controls whether image paths are treated as absolute paths
Purpose: Determines how image file paths are interpreted and resolved
Path handling: Absolute paths are resolved from root, relative paths from current directory
- Available options:
Enabled: Treats image paths as absolute
Disabled: Treats image paths as relative
Requires: Boolean toggle (default: Disabled)
- text_dl_token_pad_percentile
What it does: Sets the percentile for token padding in deep learning text processing
Purpose: Controls token sequence padding length based on dataset percentile
Padding: Determines how much padding to add to text sequences for consistent length
Percentile: Uses specified percentile of sequence lengths for padding calculation
Requires: Integer value (default: 99)
- text_dl_token_pad_max
What it does: Sets the maximum token padding length for deep learning text processing
Purpose: Limits the maximum length of padded text sequences
Padding: Prevents excessive padding that could waste memory or computation
Limit: Text sequences are padded up to this maximum length
Requires: Integer value (default: 512)
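The percentile and maximum settings above interact: the padding length follows the chosen percentile of observed sequence lengths, but is never allowed to exceed the hard cap. A sketch of that interaction (the helper and its simple percentile calculation are illustrative, not DAI's implementation):

```python
def pad_length(token_counts, percentile=99, pad_max=512):
    """Pick a padding length at the given percentile of sequence lengths,
    capped at pad_max."""
    ordered = sorted(token_counts)
    idx = min(len(ordered) - 1, int(len(ordered) * percentile / 100))
    return min(ordered[idx], pad_max)

lengths = [20, 35, 50, 64, 700]
print(pad_length(lengths))  # 512 -> the 99th-percentile length (700) is capped at 512
```

Using a high percentile rather than the maximum keeps a single very long outlier document from inflating the padded length for the whole dataset.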
- tune_parameters_accuracy_switch
What it does: Sets the Accuracy dial value at which parameter tuning is enabled
Purpose: Determines when to tune model parameters based on the experiment's accuracy setting
Optimization: Uses default parameters below this accuracy setting
Threshold: Experiments with an accuracy setting at or above this value perform parameter tuning
Requires: Integer value (default: 3)
- tune_target_transform_accuracy_switch
What it does: Sets the Accuracy dial value at which target transformation tuning is enabled
Purpose: Determines when to tune target transformations based on the experiment's accuracy setting
Optimization: Uses the standard target transform below this accuracy setting
Threshold: Experiments with an accuracy setting at or above this value tune target transformations
Requires: Integer value (default: 5)
- tournament_uniform_style_interpretability_switch
What it does: Sets the interpretability threshold for uniform style tournament selection
Purpose: Determines when to use uniform style tournament based on interpretability setting
Tournament: Controls tournament selection strategy based on interpretability requirements
Threshold: Interpretability settings above this value trigger uniform style tournament
Requires: Integer value (default: 8)
- tournament_uniform_style_accuracy_switch
What it does: Sets the Accuracy dial value at which uniform style tournament selection is used
Purpose: Determines when to use the uniform tournament style based on the experiment's accuracy setting
Tournament: Controls tournament selection strategy based on the accuracy setting
Threshold: Experiments with an accuracy setting at or above this value use the uniform tournament style
Requires: Integer value (default: 6)
- tournament_model_style_accuracy_switch
What it does: Sets the Accuracy dial value at which model style tournament selection is used
Purpose: Determines when to use the model tournament style based on the experiment's accuracy setting
Tournament: Controls tournament selection strategy focusing on model characteristics
Threshold: Experiments with an accuracy setting at or above this value use the model tournament style
Requires: Integer value (default: 6)
- tournament_feature_style_accuracy_switch
What it does: Sets the Accuracy dial value at which feature style tournament selection is used
Purpose: Determines when to use the feature tournament style based on the experiment's accuracy setting
Tournament: Controls tournament selection strategy focusing on feature characteristics
Threshold: Experiments with an accuracy setting at or above this value use the feature tournament style
Requires: Integer value (default: 13)
- tournament_fullstack_style_accuracy_switch
What it does: Sets the Accuracy dial value at which fullstack style tournament selection is used
Purpose: Determines when to use the fullstack tournament style based on the experiment's accuracy setting
Tournament: Controls tournament selection strategy using full pipeline evaluation
Threshold: Experiments with an accuracy setting at or above this value use the fullstack tournament style
Requires: Integer value (default: 13)
- tournament_use_feature_penalized_score
What it does: Controls whether to use feature-penalized scoring in tournament selection
Purpose: Applies penalty to scores based on feature complexity in tournament evaluation
Scoring: Adjusts model scores to account for feature engineering complexity
- Available options:
Enabled: Uses feature-penalized scoring
Disabled: Uses standard scoring without feature penalties
Requires: Boolean toggle (default: Enabled)
- tournament_keep_poor_scores_for_small_data
What it does: Controls whether to keep poor scoring models for small datasets
Purpose: Retains models with poor scores when working with limited data
Small data: Helps maintain diversity in model selection for small datasets
- Available options:
Enabled: Keeps poor scoring models for small data
Disabled: Removes poor scoring models regardless of dataset size
Requires: Boolean toggle (default: Enabled)
- tournament_remove_poor_scores_before_evolution_model_factor
What it does: Sets the model factor for removing poor scores before evolution phase
Purpose: Controls which models to remove based on score thresholds before evolution
Evolution: Filters out poor performers to focus evolution on better models
Factor: Multiplier applied to score thresholds for model removal decisions
Requires: Float value (default: 0.7)
- tournament_remove_worse_than_constant_before_evolution
What it does: Controls whether to remove models worse than constant models before evolution
Purpose: Removes models that perform worse than simple constant models
Evolution: Ensures evolution focuses on models better than baseline constant models
- Available options:
Enabled: Removes models worse than constants
Disabled: Keeps all models regardless of constant model performance
Requires: Boolean toggle (default: Enabled)
- tournament_keep_absolute_ok_scores_before_evolution_model_factor
What it does: Sets the model factor for keeping absolutely OK scores before evolution
Purpose: Controls retention of models with acceptable absolute scores before evolution
Evolution: Ensures models with good absolute performance are retained
Factor: Multiplier applied to absolute score thresholds for model retention
Requires: Float value (default: 0.2)
- tournament_remove_poor_scores_before_final_model_factor
What it does: Sets the model factor for removing poor scores before final model selection
Purpose: Controls which models to remove based on score thresholds before final selection
Final selection: Filters out poor performers to focus final selection on better models
Factor: Multiplier applied to score thresholds for final model removal decisions
Requires: Float value (default: 0.3)
- tournament_remove_worse_than_constant_before_final_model
What it does: Controls whether to remove models worse than constant models before final selection
Purpose: Removes models that perform worse than simple constant models before final selection
Final selection: Ensures final selection focuses on models better than baseline constants
- Available options:
Enabled: Removes models worse than constants
Disabled: Keeps all models regardless of constant model performance
Requires: Boolean toggle (default: Enabled)
- num_individuals
What it does: Sets the number of individuals in the genetic algorithm population
Purpose: Controls the population size for genetic algorithm evolution
Evolution: Larger populations provide more diversity but increase computation time
Balance: Typical values range from 2-10 individuals depending on problem complexity
Requires: Integer value (default: 2)
- cv_in_cv_overconfidence_protection_factor
What it does: Sets the protection factor against overconfidence in nested cross-validation (CV within CV)
Purpose: Provides protection against overconfident predictions in nested cross-validation
Overconfidence: Reduces overconfidence in model predictions through nested validation
Factor: Multiplier applied to overconfidence protection mechanisms
Requires: Integer value (default: 3)
- Exclude specific transformers
What it does: Specifies which transformers to exclude from the experiment
Purpose: Allows exclusion of specific transformers that may not be suitable for the dataset
Exclusion: Removes specified transformers from the available transformer pool
Format: List of transformer names or identifiers to exclude
Use case: Useful for excluding transformers known to cause issues with specific data types
Requires: List of strings (default: [])
- Exclude specific genes
What it does: Specifies which genes (transformer instances) to exclude from the experiment
Purpose: Allows exclusion of specific gene configurations from genetic algorithm
Exclusion: Removes specified genes from the available gene pool
Format: List of gene identifiers or configurations to exclude
Use case: Useful for excluding problematic gene configurations
Requires: List of strings (default: [])
- Exclude specific models
What it does: Specifies which models to exclude from the experiment
Purpose: Allows exclusion of specific models that may not be suitable for the dataset
Exclusion: Removes specified models from the available model pool
Format: List of model names or types to exclude
Use case: Useful for excluding models that don’t work well with specific data characteristics
Requires: List of strings (default: [])
- Exclude specific pretransformers
What it does: Specifies which pretransformers to exclude from the experiment
Purpose: Allows exclusion of specific pretransformers that may not be suitable
Exclusion: Removes specified pretransformers from the available pretransformer pool
Format: List of pretransformer names or types to exclude
Use case: Useful for excluding pretransformers that cause issues with specific data
Requires: List of strings (default: [])
- Exclude specific data recipes
What it does: Specifies which data recipes to exclude from the experiment
Purpose: Allows exclusion of specific data recipes that may not be suitable
Exclusion: Removes specified data recipes from the available recipe pool
Format: List of data recipe names or types to exclude
Use case: Useful for excluding data recipes that don’t work well with specific datasets
Requires: List of strings (default: [])
- Exclude specific individual recipes
What it does: Specifies which individual recipes to exclude from the experiment
Purpose: Allows exclusion of specific individual recipe configurations
Exclusion: Removes specified individual recipes from the available recipe pool
Format: List of individual recipe identifiers to exclude
Use case: Useful for excluding problematic individual recipe configurations
Requires: List of strings (default: [])
- Exclude specific scorers
What it does: Specifies which scorers to exclude from the experiment
Purpose: Allows exclusion of specific scorers that may not be suitable for the problem type
Exclusion: Removes specified scorers from the available scorer pool
Format: List of scorer names or types to exclude
Use case: Useful for excluding scorers that don’t work well with specific problem types
Requires: List of strings (default: [])
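All of the exclusion lists above follow the same pattern: names in the list are removed from the corresponding pool before the experiment runs. A generic sketch, with hypothetical placeholder entry names:

```python
# Generic sketch of how an exclusion list prunes its pool; the entry names
# below are hypothetical placeholders, not guaranteed identifiers.
def apply_exclusions(pool, excluded):
    excluded = set(excluded)
    return [name for name in pool if name not in excluded]
```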
- use_dask_for_1_gpu
What it does: Controls whether to use Dask distributed computing for single GPU scenarios
Purpose: Enables Dask distributed processing even when only one GPU is available
Distributed: Provides distributed computing benefits even in single GPU setups
- Available options:
Enabled: Uses Dask for single GPU scenarios
Disabled: Uses standard processing for single GPU
Requires: Boolean toggle (default: Disabled)
- Set Optuna pruner constructor args
What it does: Sets the constructor arguments for Optuna pruner configuration
Purpose: Configures Optuna hyperparameter optimization pruner behavior
Pruning: Controls how Optuna prunes unpromising trials during optimization
Configuration: JSON object with pruner-specific parameters
Default configuration: Includes startup trials, warmup steps, interval steps, and reduction factor
Requires: JSON object (default: {"n_startup_trials": 5, "n_warmup_steps": 20, "interval_steps": 20, "percentile": 25, "min_resource": "auto", "max_resource": "auto", "reduction_factor": 4, "min_early_stopping_rate": 0, "n_brackets": 4, "min_early_stopping_rate_low": 0, "upper": 1, "lower": 0})
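The default blob mixes constructor arguments for several Optuna pruners, so a consumer would keep only the keys a given pruner's constructor accepts. The stub below approximately mirrors the signature of `optuna.pruners.PercentilePruner` for illustration; in real use the filtered dict would be passed to the actual Optuna class.

```python
import inspect

# Default args blob from the setting above, mixing parameters for several pruners.
pruner_args = {"n_startup_trials": 5, "n_warmup_steps": 20, "interval_steps": 20,
               "percentile": 25, "min_resource": "auto", "max_resource": "auto",
               "reduction_factor": 4, "min_early_stopping_rate": 0, "n_brackets": 4,
               "min_early_stopping_rate_low": 0, "upper": 1, "lower": 0}

def args_for(ctor, blob):
    """Keep only the keys that appear in the constructor's signature."""
    accepted = set(inspect.signature(ctor).parameters)
    return {k: v for k, v in blob.items() if k in accepted}

# Stub approximating optuna.pruners.PercentilePruner's signature (illustration only).
def percentile_pruner_stub(percentile, n_startup_trials=5,
                           n_warmup_steps=0, interval_steps=1):
    return "PercentilePruner"
```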
- Set Optuna sampler constructor args
What it does: Sets the constructor arguments for Optuna sampler configuration
Purpose: Configures Optuna hyperparameter optimization sampler behavior
Sampling: Controls how Optuna samples hyperparameter values during optimization
Configuration: JSON object with sampler-specific parameters
Default configuration: Empty object uses default sampling behavior
Requires: JSON object (default: {})
- drop_constant_model_final_ensemble
What it does: Controls whether to drop constant models from the final ensemble
Purpose: Removes constant (baseline) models from the final ensemble selection
Ensemble: Ensures final ensemble focuses on non-constant models only
- Available options:
Enabled: Drops constant models from final ensemble
Disabled: Includes constant models in final ensemble
Requires: Boolean toggle (default: Enabled)
- xgboost_rf_exact_threshold_num_rows_x_cols
What it does: Sets the threshold for XGBoost Random Forest exact mode based on rows × columns
Purpose: Determines when to use exact mode for XGBoost Random Forest based on data size
Performance: Exact mode is more accurate but slower for large datasets
Threshold: Datasets with rows × columns below this value use exact mode
Requires: Integer value (default: 10000)
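A minimal sketch of the size gate, assuming the threshold works as described: exact mode is chosen only when rows × columns stays under the threshold, with the faster histogram method used otherwise.

```python
# Hypothetical sketch of the size gate for XGBoost Random Forest: use the
# accurate but slow "exact" tree method only for small data.
def xgb_rf_tree_method(n_rows, n_cols, threshold=10_000):
    return "exact" if n_rows * n_cols < threshold else "hist"
```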
- Factor by which to drop max_leaves from effective max_depth value when doing loss_guide
What it does: Sets the factor for reducing max_leaves from max_depth in loss-guided training
Purpose: Controls the relationship between max_depth and max_leaves in loss-guided mode
Training: Adjusts leaf count to optimize loss-guided training performance
Factor: Divides the leaf count implied by max_depth by this factor to determine the effective max_leaves
Requires: Integer value (default: 4)
- Factor by which to extend max_depth mutations when doing loss_guide
What it does: Sets the factor for extending max_depth mutations in loss-guided training
Purpose: Controls how max_depth is extended during mutations in loss-guided mode
Mutations: Adjusts depth mutations to optimize loss-guided evolution
Factor: Multiplies max_depth by this factor during mutation operations
Requires: Integer value (default: 8)
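One way to read these two factors, under the assumption (not stated in the text above) that a depth-d tree can hold up to 2**d leaves: the drop factor caps leaves below that bound, while the extend factor widens depth during mutations.

```python
# Assumed relationship, not confirmed by the settings text: a depth-d tree
# can hold up to 2**d leaves, so the drop factor caps leaves below that
# bound, while the extend factor widens depth during mutations.
def effective_max_leaves(max_depth, drop_factor=4):
    return max(2, 2 ** max_depth // drop_factor)

def mutated_max_depth(max_depth, extend_factor=8):
    return max_depth * extend_factor
```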
- params_tune_grow_policy_simple_trees
What it does: Controls whether to force max_leaves=0 when grow_policy="depthwise" and max_depth=0 when grow_policy="lossguide" during simple tree model tuning.
Purpose: Ensures that the tree parameters are properly zeroed according to the chosen grow policy type for simple tree tuning.
- Available options:
Enabled: Forces max_leaves or max_depth to 0 according to the grow_policy setting.
Disabled: Does not force max_leaves or max_depth to 0.
Requires: Boolean toggle (default: Enabled)
- max_epochs_tf_big_data
What it does: Sets the maximum number of epochs for TensorFlow models on big datasets
Purpose: Limits training epochs for TensorFlow models when working with large datasets
Performance: Prevents excessive training time on large datasets
Limit: TensorFlow models stop training after this many epochs on big data
Requires: Integer value (default: 5)
- default_max_bin
What it does: Sets the default maximum number of bins for feature binning
Purpose: Controls the default binning resolution for numerical features
Binning: Higher values provide finer granularity but increase computation
Default: Used when no specific binning configuration is provided
Requires: Integer value (default: 256)
- default_lightgbm_max_bin
What it does: Sets the default maximum number of bins for LightGBM models
Purpose: Controls the default binning resolution specifically for LightGBM
LightGBM: Optimized binning parameter for LightGBM model performance
Default: Used when no specific LightGBM binning configuration is provided
Requires: Integer value (default: 249)
- min_max_bin
What it does: Sets the lower bound on the maximum number of bins (max_bin)
Purpose: Ensures a minimum level of binning granularity
Binning: Prevents max_bin from being reduced below this resolution
Minimum: Guarantees at least this many bins for numerical features
Requires: Integer value (default: 32)
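A sketch of how the three bin settings could interact; the clamp direction is inferred from the setting names rather than stated in the text.

```python
# Sketch (assumed interaction): pick the model-specific default unless a
# value is requested, then clamp from below by min_max_bin.
def resolve_max_bin(requested=None, lightgbm=False,
                    default_max_bin=256, default_lightgbm_max_bin=249,
                    min_max_bin=32):
    value = requested if requested is not None else (
        default_lightgbm_max_bin if lightgbm else default_max_bin)
    return max(value, min_max_bin)
```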
- tensorflow_use_all_cores
What it does: Controls whether TensorFlow should use all available CPU cores
Purpose: Enables TensorFlow to utilize all CPU cores for parallel processing
Performance: Can significantly improve TensorFlow training and inference speed
- Available options:
Enabled: Uses all available CPU cores
Disabled: Uses default TensorFlow core allocation
Requires: Boolean toggle (default: Enabled)
- tensorflow_use_all_cores_even_if_reproducible_true
What it does: Controls whether TensorFlow uses all cores even when reproducibility is enabled
Purpose: Allows full core utilization even in reproducible mode
Reproducibility: May slightly affect reproducibility but improves performance
- Available options:
Enabled: Uses all cores regardless of reproducibility setting
Disabled: Respects reproducibility settings for core allocation
Requires: Boolean toggle (default: Disabled)
- tensorflow_disable_memory_optimization
What it does: Controls whether to disable TensorFlow memory optimization
Purpose: Allows disabling TensorFlow’s automatic memory optimization features
Memory: May use more memory but can improve performance in some cases
- Available options:
Enabled: Disables TensorFlow memory optimization
Disabled: Uses TensorFlow default memory optimization
Requires: Boolean toggle (default: Enabled)
- tensorflow_cores
What it does: Sets the number of CPU cores to use for TensorFlow operations
Purpose: Controls TensorFlow CPU core allocation for parallel processing
Performance: More cores can improve TensorFlow performance for large models
Auto mode (0): Uses automatic core allocation
Custom value: Limits TensorFlow to specified number of cores
Requires: Integer value (default: 0)
- tensorflow_model_max_cores
What it does: Sets the maximum number of cores per TensorFlow model
Purpose: Controls maximum core allocation for individual TensorFlow models
Performance: Limits per-model core usage to prevent resource contention
Limit: Each TensorFlow model uses at most this many cores
Requires: Integer value (default: 4)
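A minimal sketch of how the core settings could resolve to a per-model budget, assuming 0 means "use all detected cores" and the per-model maximum acts as a cap:

```python
import os

# Sketch: resolve the per-model TensorFlow core budget. 0 means automatic
# (all detected cores); tensorflow_model_max_cores caps each model.
def resolve_tf_cores(tensorflow_cores=0, tensorflow_model_max_cores=4):
    detected = os.cpu_count() or 1
    budget = detected if tensorflow_cores == 0 else tensorflow_cores
    return min(budget, tensorflow_model_max_cores)

# Applying the budget would use TensorFlow's threading API, e.g.:
# tf.config.threading.set_intra_op_parallelism_threads(resolve_tf_cores())
```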
- bert_cores
What it does: Sets the number of CPU cores to use for BERT model operations
Purpose: Controls BERT model CPU core allocation for parallel processing
Performance: More cores can improve BERT model performance
Auto mode (0): Uses automatic core allocation
Custom value: Limits BERT models to specified number of cores
Requires: Integer value (default: 0)
- bert_use_all_cores
What it does: Controls whether BERT models should use all available CPU cores
Purpose: Enables BERT models to utilize all CPU cores for parallel processing
Performance: Can significantly improve BERT model training and inference speed
- Available options:
Enabled: Uses all available CPU cores
Disabled: Uses default BERT core allocation
Requires: Boolean toggle (default: Enabled)
- bert_model_max_cores
What it does: Sets the maximum number of cores per BERT model
Purpose: Controls maximum core allocation for individual BERT models
Performance: Limits per-model core usage to prevent resource contention
Limit: Each BERT model uses at most this many cores
Requires: Integer value (default: 8)
- rulefit_max_tree_depth
What it does: Sets the maximum tree depth for RuleFit models
Purpose: Controls the maximum depth of trees used in RuleFit ensemble
RuleFit: Deeper trees can capture more complex patterns but increase overfitting risk
Limit: RuleFit trees are limited to this maximum depth
Requires: Integer value (default: 6)
- rulefit_max_num_trees
What it does: Sets the maximum number of trees for RuleFit models
Purpose: Controls the maximum number of trees in the RuleFit ensemble
RuleFit: More trees can improve performance but increase computation time
Limit: RuleFit ensemble is limited to this maximum number of trees
Requires: Integer value (default: 500)
- Whether to show real levels in One Hot Encoding feature names
What it does: Controls whether to include actual level values in One Hot Encoding feature names
Purpose: Determines feature name format for One Hot Encoding transformations
Feature names: Real levels can make feature names longer but more descriptive
Aggregation: Can cause feature aggregation problems when switching between binning modes
- Available options:
Enabled: Shows real levels in feature names
Disabled: Uses generic feature names without real levels
Requires: Boolean toggle (default: Disabled)
- Enable basic logging and notifications for ensemble meta learner
What it does: Enables basic logging and notifications for ensemble meta learner operations
Purpose: Provides logging information about ensemble meta learner performance
Monitoring: Helps track ensemble meta learner behavior and performance
- Available options:
Enabled: Enables basic ensemble meta learner logging
Disabled: Disables ensemble meta learner logging
Requires: Boolean toggle (default: Enabled)
- Enable extra logging for ensemble meta learner
What it does: Enables additional detailed logging for ensemble meta learner operations
Purpose: Provides comprehensive logging information about ensemble meta learner
Monitoring: Includes detailed performance metrics and behavior tracking
- Available options:
Enabled: Enables extra ensemble meta learner logging
Disabled: Uses only basic ensemble meta learner logging
Requires: Boolean toggle (default: Disabled)
- Maximum number of fold IDs to show in logs
What it does: Sets the maximum number of fold IDs to display in log messages
Purpose: Limits the verbosity of fold-related log information
Logging: Prevents log messages from becoming too long with many folds
Limit: Only shows fold IDs up to this maximum number in logs
Requires: Integer value (default: 10)
- Declare positive fold scores as unstable if stddev / mean is larger than this value
What it does: Sets the threshold for declaring fold scores as unstable based on coefficient of variation
Purpose: Identifies unstable fold scores that may indicate overfitting or data issues
Stability: Higher values indicate more variable fold scores
Threshold: Fold scores with stddev/mean above this value are marked as unstable
Requires: Float value (default: 0.25)
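The stability check described above can be sketched directly as a coefficient-of-variation test:

```python
import statistics

# Sketch of the stability check: positive fold scores are flagged unstable
# when their coefficient of variation (stddev / mean) exceeds the threshold.
def folds_unstable(scores, threshold=0.25):
    mean = statistics.mean(scores)
    if mean <= 0:
        return False  # the check applies to positive scores only
    return statistics.stdev(scores) / mean > threshold
```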
- Perform stratified sampling for binary classification if the dataset has fewer rows than this
What it does: Sets the dataset size threshold for stratified sampling in binary classification
Purpose: Ensures stratified sampling is used for smaller binary classification datasets
Sampling: Stratified sampling helps maintain class balance in smaller datasets
Threshold: Datasets with fewer rows than this value use stratified sampling
Requires: Integer value (default: 1000000)
- Ratio of most frequent to least frequent class for imbalanced multiclass classification problems
What it does: Sets the class imbalance ratio threshold for triggering special multiclass handling
Purpose: Identifies severely imbalanced multiclass problems requiring special treatment
Imbalance: Higher ratios indicate more severe class imbalance
Threshold: Problems with class ratios above this value trigger special handling
Requires: Float value (default: 5)
- Ratio of most frequent to least frequent class for heavily imbalanced multiclass classification problems
What it does: Sets the class imbalance ratio threshold for triggering heavy imbalance handling
Purpose: Identifies extremely imbalanced multiclass problems requiring special treatment
Heavy imbalance: Very high ratios indicate extreme class imbalance
Threshold: Problems with class ratios above this value trigger heavy imbalance handling
Requires: Float value (default: 25)
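Both thresholds compare against the same quantity: the count of the most frequent class divided by the count of the least frequent class. A sketch:

```python
from collections import Counter

# Sketch of the ratio the two imbalance thresholds are compared against.
def class_imbalance_ratio(labels):
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def imbalance_level(ratio, imbalanced_threshold=5.0, heavy_threshold=25.0):
    if ratio > heavy_threshold:
        return "heavily imbalanced"
    if ratio > imbalanced_threshold:
        return "imbalanced"
    return "balanced"
```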
- Whether to do rank averaging of bagged models inside of imbalanced models, instead of probability averaging
What it does: Controls whether to use rank averaging, instead of probability averaging, when bagging models inside of imbalanced models.
Purpose: Rank averaging can be helpful when ensembling diverse models when ranking metrics like AUC/Gini are optimized.
Averaging: Rank averaging may provide improved performance for imbalanced datasets focused on ranking metrics.
Note: No MOJO support yet for rank averaging in this context.
- Available options:
auto: Automatically decide if rank averaging should be applied
on: Always use rank averaging for bagged models in imbalanced settings
off: Never use rank averaging (always use probability averaging)
Requires: String selection (default: auto)
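The difference between the two modes can be sketched for bagged binary models, where each inner list is one model's positive-class probabilities. This is an illustration of the general technique, not the product's implementation; ties are not handled. Ranks preserve only ordering, which is the quantity AUC/Gini measure.

```python
# Rank vs probability averaging across bagged models (illustrative sketch).
def _ranks(values):
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = float(rank)
    return ranks

def rank_average(model_preds):
    rank_lists = [_ranks(p) for p in model_preds]
    n = len(rank_lists)
    return [sum(r[i] for r in rank_lists) / n for i in range(len(rank_lists[0]))]

def probability_average(model_preds):
    n = len(model_preds)
    return [sum(p[i] for p in model_preds) / n for i in range(len(model_preds[0]))]
```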
- imbalance_ratio_notification_threshold
What it does: Sets the class imbalance ratio threshold for sending notifications
Purpose: Triggers notifications when class imbalance exceeds this threshold
Monitoring: Alerts users to potential class imbalance issues
Threshold: Problems with class ratios above this value trigger notifications
Requires: Float value (default: 2)
- nbins_ftrl_list
What it does: Sets the list of binning values for FTRL (Follow The Regularized Leader) models
Purpose: Defines multiple binning options for FTRL model hyperparameter tuning
Tuning: Provides different binning granularities for FTRL model optimization
Values: List of binning values to test during FTRL model training
Requires: List of integers (default: [1000000,10000000,100000000])
- te_bin_list
What it does: Sets the list of binning values for Target Encoding (TE) transformations
Purpose: Defines multiple binning options for Target Encoding hyperparameter tuning
Tuning: Provides different binning granularities for Target Encoding optimization
Values: List of binning values to test during Target Encoding training
Requires: List of integers (default: [25,10,100,250])
- woe_bin_list
What it does: Sets the list of binning values for Weight of Evidence (WoE) transformations
Purpose: Defines multiple binning options for WoE hyperparameter tuning
Tuning: Provides different binning granularities for WoE optimization
Values: List of binning values to test during WoE training
Requires: List of integers (default: [25,10,100,250])
- ohe_bin_list
What it does: Sets the list of binning values for One Hot Encoding (OHE) transformations
Purpose: Defines multiple binning options for OHE hyperparameter tuning
Tuning: Provides different binning granularities for OHE optimization
Values: List of binning values to test during OHE training
Requires: List of integers (default: [10,25,50,75,100])
- binner_bin_list
What it does: Sets the list of binning values for Binner Transformer
Purpose: Defines multiple binning options for Binner Transformer hyperparameter tuning
Tuning: Provides different binning granularities for Binner Transformer optimization
Values: List of binning values to test during Binner Transformer training
Requires: List of integers (default: [5,10,20])
- Timeout in seconds for dropping duplicate rows in training data
What it does: Sets the timeout for duplicate row detection and removal in training data
Purpose: Prevents duplicate row operations from running indefinitely
Performance: Timeout increases proportionally with rows × columns growth
Timeout: Operation stops if time limit is exceeded
Requires: Integer value (default: 60)
- shift_check_text
What it does: Controls whether to perform shift detection on text features
Purpose: Enables distribution shift detection specifically for text columns
Text shift: Detects changes in text feature distributions between datasets
- Available options:
Enabled: Performs text shift detection
Disabled: Skips text shift detection
Requires: Boolean toggle (default: Disabled)
- use_rf_for_shift_if_have_lgbm
What it does: Controls whether to use Random Forest for shift detection when LightGBM is available
Purpose: Optimizes shift detection algorithm selection based on available models
Algorithm: Random Forest may be more robust for shift detection in some cases
- Available options:
Enabled: Uses Random Forest for shift detection when LightGBM is available
Disabled: Uses LightGBM for shift detection
Requires: Boolean toggle (default: Enabled)
- shift_key_features_varimp
What it does: Sets the variable importance threshold for key features in shift detection
Purpose: Controls which features are considered key for shift detection
Feature selection: Only features above this importance threshold are used for shift detection
Threshold: Features with variable importance below this value are excluded
Requires: Float value (default: 0.01)
- shift_check_reduced_features
What it does: Controls whether to perform shift detection on reduced feature sets
Purpose: Enables shift detection using dimensionality-reduced features
Reduction: Can improve shift detection performance and reduce computation
- Available options:
Enabled: Performs shift detection on reduced features
Disabled: Uses full feature set for shift detection
Requires: Boolean toggle (default: Enabled)
- shift_trees
What it does: Sets the number of trees to use for shift detection models
Purpose: Controls the complexity of models used for distribution shift detection
Detection: More trees can improve shift detection accuracy but increase computation
Balance: Typical values range from 50-200 trees depending on dataset size
Requires: Integer value (default: 100)
- shift_max_bin
What it does: Sets the maximum number of bins for shift detection models
Purpose: Controls the binning granularity for shift detection feature processing
Binning: Higher values provide finer granularity but increase computation
Detection: Affects the sensitivity of shift detection algorithms
Requires: Integer value (default: 256)
- shift_min_max_depth
What it does: Sets the lower bound on the maximum depth (max_depth) for shift detection trees
Purpose: Controls the minimum complexity of trees used for shift detection
Depth: Ensures trees have sufficient depth to capture shift patterns
Minimum: Trees are at least this deep for shift detection
Requires: Integer value (default: 4)
- shift_max_max_depth
What it does: Sets the upper bound on the maximum depth (max_depth) for shift detection trees
Purpose: Controls the maximum complexity of trees used for shift detection
Depth: Prevents trees from becoming too complex and overfitting
Maximum: Trees are at most this deep for shift detection
Requires: Integer value (default: 8)
- detect_features_distribution_shift_threshold_auc
What it does: Sets the AUC threshold for detecting feature distribution shift
Purpose: Determines the sensitivity of feature distribution shift detection
Detection: Features with AUC above this threshold are flagged as having distribution shift
Threshold: Higher values require stronger evidence of shift to trigger detection
Requires: Float value (default: 0.55)
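A common way to realize this check is adversarial validation: label rows 0 for training data and 1 for test data, score each row by the feature's value, and compute the AUC of that score against the label. The sketch below uses this technique with a self-contained pairwise AUC; whether the product computes AUC exactly this way is an assumption.

```python
# Sketch of per-feature shift detection via adversarial validation.
def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def feature_shifted(train_col, test_col, threshold=0.55):
    scores = list(train_col) + list(test_col)
    labels = [0] * len(train_col) + [1] * len(test_col)
    a = auc(scores, labels)
    return max(a, 1.0 - a) > threshold  # shift in either direction counts
```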
- leakage_check_text
What it does: Controls whether to perform leakage detection on text features
Purpose: Enables data leakage detection specifically for text columns
Text leakage: Detects potential data leakage in text features between training and test sets
- Available options:
Enabled: Performs text leakage detection
Disabled: Skips text leakage detection
Requires: Boolean toggle (default: Enabled)
- leakage_key_features_varimp
What it does: Sets the variable importance threshold for key features in leakage detection
Purpose: Controls which features are considered key for leakage detection
Feature selection: Only features above this importance threshold are used for leakage detection
Threshold: Features with variable importance below this value are excluded
Requires: Float value (default: 0.001)
- leakage_check_reduced_features
What it does: Controls whether to perform leakage detection on reduced feature sets
Purpose: Enables leakage detection using dimensionality-reduced features
Reduction: Can improve leakage detection performance and reduce computation
- Available options:
Enabled: Performs leakage detection on reduced features
Disabled: Uses full feature set for leakage detection
Requires: Boolean toggle (default: Enabled)
- use_rf_for_leakage_if_have_lgbm
What it does: Controls whether to use Random Forest for leakage detection when LightGBM is available
Purpose: Optimizes leakage detection algorithm selection based on available models
Algorithm: Random Forest may be more robust for leakage detection in some cases
- Available options:
Enabled: Uses Random Forest for leakage detection when LightGBM is available
Disabled: Uses LightGBM for leakage detection
Requires: Boolean toggle (default: Enabled)
- leakage_trees
What it does: Sets the number of trees to use for leakage detection models
Purpose: Controls the complexity of models used for data leakage detection
Detection: More trees can improve leakage detection accuracy but increase computation
Balance: Typical values range from 50-200 trees depending on dataset size
Requires: Integer value (default: 100)
- leakage_max_bin
What it does: Sets the maximum number of bins for leakage detection models
Purpose: Controls the binning granularity for leakage detection feature processing
Binning: Higher values provide finer granularity but increase computation
Detection: Affects the sensitivity of leakage detection algorithms
Requires: Integer value (default: 256)
- leakage_min_max_depth
What it does: Sets the lower bound on the maximum depth (max_depth) for leakage detection trees
Purpose: Controls the minimum complexity of trees used for leakage detection
Depth: Ensures trees have sufficient depth to capture leakage patterns
Minimum: Trees are at least this deep for leakage detection
Requires: Integer value (default: 6)
- leakage_max_max_depth
What it does: Sets the upper bound on the maximum depth (max_depth) for leakage detection trees
Purpose: Controls the maximum complexity of trees used for leakage detection
Depth: Prevents trees from becoming too complex and overfitting
Maximum: Trees are at most this deep for leakage detection
Requires: Integer value (default: 8)
- leakage_train_test_split
What it does: Sets the train/test split ratio for leakage detection
Purpose: Controls how data is split for leakage detection model training
Split: Determines the proportion of data used for training vs. testing
Ratio: Fraction of data used for training (0.25 = 25% training, 75% testing)
Requires: Float value (default: 0.25)
- Whether to report basic system information on server startup
What it does: Controls whether to display basic system information when the server starts
Purpose: Provides system overview information during server initialization
Startup: Shows system configuration and resource information at startup
- Available options:
Enabled: Reports basic system information on startup
Disabled: Skips system information reporting on startup
Requires: Boolean toggle (default: Enabled)
- abs_tol_for_perfect_score
What it does: Sets the absolute tolerance threshold for considering a score as perfect
Purpose: Defines the numerical tolerance for perfect score detection
Perfect score: Scores within this tolerance of theoretical maximum are considered perfect
Tolerance: Very small value to account for numerical precision issues
Requires: Float value (default: 0.0001)
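The tolerance check itself is a one-liner; a sketch, assuming the metric's best achievable value is known:

```python
# Sketch: a score counts as perfect when it lies within the absolute
# tolerance of the metric's best achievable value.
def is_perfect_score(score, best_possible, abs_tol=1e-4):
    return abs(best_possible - score) <= abs_tol
```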
- data_ingest_timeout
What it does: Sets the timeout in seconds for data ingestion operations
Purpose: Prevents data ingestion from running indefinitely
Timeout: Data ingestion operations stop if time limit is exceeded
Default: 86400 seconds (24 hours) for large dataset ingestion
Requires: Integer value (default: 86400)
- debug_daimodel_level
What it does: Sets the debug level for DAI model operations
Purpose: Controls the verbosity of debug information for model operations
Debug levels: Higher values provide more detailed debug information
Levels: 0 = minimal, 1 = standard, 2 = detailed, 3 = comprehensive
Requires: Integer value (default: 0)
- Whether to show detailed predict information in logs
What it does: Controls whether to display detailed prediction information in log messages
Purpose: Provides comprehensive logging of prediction operations and results
Logging: Includes detailed information about prediction processes and outputs
- Available options:
Enabled: Shows detailed prediction information in logs
Disabled: Uses standard prediction logging
Requires: Boolean toggle (default: Enabled)
- Whether to show detailed fit information in logs
What it does: Controls whether to display detailed model fitting information in log messages
Purpose: Provides comprehensive logging of model training operations and results
Logging: Includes detailed information about model fitting processes and performance
- Available options:
Enabled: Shows detailed fit information in logs
Disabled: Uses standard fit logging
Requires: Boolean toggle (default: Enabled)
- show_inapplicable_models_preview
What it does: Controls whether to show inapplicable models in the preview interface
Purpose: Displays models that are not applicable to the current dataset or configuration
Preview: Helps users understand which models are excluded and why
- Available options:
Enabled: Shows inapplicable models in preview
Disabled: Hides inapplicable models from preview
Requires: Boolean toggle (default: Disabled)
- show_inapplicable_transformers_preview
What it does: Controls whether to show inapplicable transformers in the preview interface
Purpose: Displays transformers that are not applicable to the current dataset or configuration
Preview: Helps users understand which transformers are excluded and why
- Available options:
Enabled: Shows inapplicable transformers in preview
Disabled: Hides inapplicable transformers from preview
Requires: Boolean toggle (default: Disabled)
- show_warnings_preview
What it does: Controls whether to show warning messages in the preview interface
Purpose: Displays warning information about potential issues or recommendations
Preview: Helps users identify potential problems before starting experiments
- Available options:
Enabled: Shows warnings in preview
Disabled: Hides warnings from preview
Requires: Boolean toggle (default: Disabled)
- show_warnings_preview_unused_map_features
What it does: Controls whether to show warnings about unused map features in preview
Purpose: Displays warnings when map features are defined but not used
Map features: Helps users identify unused feature mapping configurations
- Available options:
Enabled: Shows unused map feature warnings
Disabled: Hides unused map feature warnings
Requires: Boolean toggle (default: Enabled)
- max_cols_show_unused_features
What it does: Sets the maximum number of columns to show for unused features warnings
Purpose: Limits the verbosity of unused features warning messages
Warnings: Prevents warning messages from becoming too long with many unused features
Limit: Only shows unused features up to this maximum number in warnings
Requires: Integer value (default: 1000)
- max_cols_show_feature_transformer_mapping
What it does: Sets the maximum number of columns to show for feature transformer mapping
Purpose: Limits the verbosity of feature transformer mapping display
Mapping: Prevents mapping displays from becoming too long with many features
Limit: Only shows feature mappings up to this maximum number
Requires: Integer value (default: 1000)
- warning_unused_feature_show_max
What it does: Sets the maximum number of unused features to show in warning messages
Purpose: Limits the number of unused features displayed in warning messages
Warnings: Prevents warning messages from becoming too long with many unused features
Limit: Only shows up to this many unused features in warnings
Requires: Integer value (default: 3)
- interaction_finder_max_rows_x_cols
What it does: Sets the maximum rows × columns threshold for interaction finder operations
Purpose: Limits the scope of interaction detection to prevent excessive computation
Performance: Interaction finding can be computationally expensive on large datasets
Threshold: Datasets with rows × columns above this value may skip interaction finding
Requires: Integer value (default: 200000)
- interaction_finder_corr_threshold
What it does: Sets the correlation threshold for interaction finder detection
Purpose: Controls the sensitivity of interaction detection based on feature correlations
Detection: Higher thresholds require stronger correlations to detect interactions
Threshold: Features with correlations above this value may be considered for interactions
Requires: Float value (default: 0.95)
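Taken together, the two interaction finder settings gate the search roughly as follows. This is an illustrative sketch of the gating logic, not the actual implementation:

```python
def should_search_interactions(n_rows, n_cols, max_rows_x_cols=200_000):
    """Skip interaction finding entirely when rows x columns exceeds
    the configured size threshold."""
    return n_rows * n_cols <= max_rows_x_cols

def is_interaction_candidate(correlation, corr_threshold=0.95):
    """Only feature pairs whose correlation exceeds the threshold are
    considered for interaction features."""
    return abs(correlation) > corr_threshold

print(should_search_interactions(10_000, 10))   # True:  100,000 <= 200,000
print(should_search_interactions(100_000, 50))  # False: 5,000,000 too large
print(is_interaction_candidate(0.97))           # True:  0.97 > 0.95
```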
- Minimum number of bootstrap samples
What it does: Sets the minimum number of bootstrap samples for statistical operations
Purpose: Ensures sufficient bootstrap samples for reliable statistical estimates
Bootstrap: More samples provide more robust statistical estimates
Minimum: Guarantees at least this many bootstrap samples are used
Requires: Integer value (default: 1)
- Maximum number of bootstrap samples
What it does: Sets the maximum number of bootstrap samples for statistical operations
Purpose: Limits the number of bootstrap samples to prevent excessive computation
Bootstrap: Prevents bootstrap operations from running too long
Maximum: Bootstrap operations use at most this many samples
Requires: Integer value (default: 100)
- Minimum fraction of rows to use for bootstrap samples
What it does: Sets the minimum fraction of rows to include in bootstrap samples
Purpose: Ensures bootstrap samples contain sufficient data for reliable estimates
Sampling: Higher fractions provide more robust bootstrap estimates
Minimum: Bootstrap samples contain at least this fraction of original rows
Requires: Float value (default: 1.0)
- Maximum fraction of rows to use for bootstrap samples
What it does: Sets the maximum fraction of rows to include in bootstrap samples
Purpose: Limits the size of bootstrap samples to control computation time
Sampling: Prevents bootstrap samples from becoming too large
Maximum: Bootstrap samples contain at most this fraction of original rows
Requires: Float value (default: 10.0)
- Seed to use for final model bootstrap sampling
What it does: Sets the random seed for final model bootstrap sampling operations
Purpose: Ensures reproducible bootstrap sampling for final model evaluation
Reproducibility: Same seed produces identical bootstrap samples across runs
Auto mode (-1): Uses random seed for each bootstrap operation
Custom value: Uses specified seed for reproducible bootstrap sampling
Requires: Integer value (default: -1)
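The five bootstrap settings above can be read as bounds on a single sampling routine. The sketch below makes the clamping and seeding behavior concrete; the function and the exact way the bounds combine are assumptions for illustration (note that the default fraction bounds of 1.0–10.0 suggest sampling with replacement at or above the full dataset size):

```python
import random

def bootstrap_samples(n_rows, requested_samples, requested_frac,
                      min_samples=1, max_samples=100,
                      min_frac=1.0, max_frac=10.0, seed=-1):
    """Clamp the requested sample count and row fraction into the
    configured bounds, then draw row indices with replacement.
    seed == -1 (auto) uses fresh randomness on every call."""
    n_samples = min(max(requested_samples, min_samples), max_samples)
    frac = min(max(requested_frac, min_frac), max_frac)
    rng = random.Random(None if seed == -1 else seed)
    size = int(round(frac * n_rows))
    return [[rng.randrange(n_rows) for _ in range(size)]
            for _ in range(n_samples)]

samples = bootstrap_samples(n_rows=100, requested_samples=500,
                            requested_frac=0.2, seed=1234)
print(len(samples))     # 100: requested 500 clamped to max_samples
print(len(samples[0]))  # 100: fraction 0.2 raised to min_frac = 1.0
```

With a fixed seed the same indices are drawn on every run, which is what makes final-model bootstrap evaluation reproducible.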
- benford_mad_threshold_int
What it does: Sets the Mean Absolute Deviation (MAD) threshold for Benford’s Law validation on integer data
Purpose: Controls the sensitivity of Benford’s Law compliance detection for integer features
Benford’s Law: Validates whether integer data follows expected digit distribution patterns
Threshold: Integer features with MAD above this value may violate Benford’s Law
Requires: Float value (default: 0.03)
- benford_mad_threshold_real
What it does: Sets the Mean Absolute Deviation (MAD) threshold for Benford’s Law validation on real number data
Purpose: Controls the sensitivity of Benford’s Law compliance detection for real number features
Benford’s Law: Validates whether real number data follows expected digit distribution patterns
Threshold: Real number features with MAD above this value may violate Benford’s Law
Requires: Float value (default: 0.1)
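Both thresholds compare a mean absolute deviation (MAD) statistic against Benford's expected leading-digit distribution P(d) = log10(1 + 1/d). A minimal sketch of that statistic follows; the digit-extraction method and the exact comparison are illustrative assumptions:

```python
import math
from collections import Counter

def benford_mad(values):
    """Mean absolute deviation of observed leading-digit frequencies
    from Benford's expectation P(d) = log10(1 + 1/d), d = 1..9."""
    # Scientific-notation formatting yields an exact leading digit
    # for any nonzero value, including values below 1.
    digits = [int(format(abs(v), "e")[0]) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - math.log10(1 + 1 / d))
               for d in range(1, 10)) / 9

# Powers of 2 famously conform to Benford's Law:
print(benford_mad([2.0 ** k for k in range(1, 200)]) < 0.03)  # True

# A uniform leading-digit distribution clearly violates it:
print(benford_mad(list(range(1, 10)) * 50) > 0.03)            # True
```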
- Use tuning-evolution search result for final model transformer
What it does: Controls whether to use tuning-evolution search results for final model transformer selection
Purpose: Applies evolutionary search results to final model transformer configuration
Evolution: Uses genetic algorithm results to optimize final model transformer settings
- Available options:
Enabled: Uses tuning-evolution results for final model
Disabled: Uses standard transformer selection for final model
Requires: Boolean toggle (default: Enabled)
- Factor of standard deviation of bootstrap scores by which to accept new model in genetic algorithm
What it does: Sets the factor for accepting new models in genetic algorithm based on bootstrap score variation
Purpose: Controls model acceptance threshold considering bootstrap score uncertainty
GA Selection: Helps balance exploration vs. exploitation in genetic algorithm
Factor: New models are accepted if score improvement exceeds this factor times bootstrap stddev
Requires: Float value (default: 0.01)
- Minimum number of bootstrap samples that are required to limit accepting new model
What it does: Sets the minimum bootstrap samples required before applying acceptance limitations
Purpose: Ensures sufficient bootstrap samples before using score-based acceptance criteria
Bootstrap: Provides reliable score estimates before applying acceptance thresholds
Minimum: At least this many bootstrap samples are required for acceptance limitations
Requires: Integer value (default: 10)
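Combined with the acceptance factor above, these two settings can be read as the following acceptance rule. This is a hypothetical sketch; the names and the exact comparison are assumptions:

```python
import statistics

def accept_new_model(new_score, best_score, bootstrap_scores,
                     factor=0.01, min_samples=10):
    """Accept a candidate (higher score = better) only if its improvement
    exceeds factor * stddev of the bootstrap scores; with fewer than
    min_samples bootstrap scores, the uncertainty margin is not applied."""
    if len(bootstrap_scores) < min_samples:
        return new_score > best_score
    margin = factor * statistics.stdev(bootstrap_scores)
    return new_score > best_score + margin

scores = [0.70, 0.72, 0.71, 0.73, 0.69, 0.74, 0.70, 0.72, 0.71, 0.73]
print(accept_new_model(0.7301, 0.73, scores))      # False: inside the margin
print(accept_new_model(0.7305, 0.73, scores))      # True:  clears the margin
print(accept_new_model(0.7301, 0.73, scores[:5]))  # True:  too few samples
```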
- features_allowed_by_interpretability
What it does: Sets the maximum number of features allowed for each interpretability setting
Purpose: Controls feature complexity limits based on interpretability requirements
Interpretability: Higher interpretability settings allow fewer features for simplicity
Configuration: Dictionary mapping interpretability levels to maximum feature counts
Default: {1: 10000000, 2: 10000, 3: 1000, 4: 500, 5: 300, 6: 200, 7: 150, 8: 100, 9: 80, 10: 50, 11: 50, 12: 50, 13: 50}
Requires: Dictionary value
- nfeatures_max_threshold
What it does: Sets the maximum threshold for the number of features in models
Purpose: Limits the maximum number of features to prevent overfitting and improve interpretability
Features: Prevents models from using too many features
Threshold: Models are limited to at most this many features
Requires: Integer value (default: 200)
- rdelta_percent_score_penalty_per_feature_by_interpretability
What it does: Sets the score penalty per feature based on interpretability setting
Purpose: Applies penalties to model scores based on feature count and interpretability
Penalty: Higher interpretability settings impose larger penalties for additional features
Configuration: Dictionary mapping interpretability levels to penalty percentages
Default: {1: 0.0, 2: 0.1, 3: 1.0, 4: 2.0, 5: 5.0, 6: 10.0, 7: 20.0, 8: 30.0, 9: 50.0, 10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}
Requires: Dictionary value
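The two interpretability dictionaries above, together with nfeatures_max_threshold, can be pictured as a single feature-budget lookup. The min-combination below is an illustrative assumption, not the documented combination rule:

```python
# Defaults copied from the settings above.
FEATURES_ALLOWED = {1: 10000000, 2: 10000, 3: 1000, 4: 500, 5: 300, 6: 200,
                    7: 150, 8: 100, 9: 80, 10: 50, 11: 50, 12: 50, 13: 50}
PENALTY_PERCENT = {1: 0.0, 2: 0.1, 3: 1.0, 4: 2.0, 5: 5.0, 6: 10.0, 7: 20.0,
                   8: 30.0, 9: 50.0, 10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}

def feature_budget(interpretability, nfeatures_max_threshold=200):
    """Effective cap: the interpretability-specific allowance, further
    limited by the global nfeatures_max_threshold."""
    return min(FEATURES_ALLOWED[interpretability], nfeatures_max_threshold)

print(feature_budget(1))   # 200: the global threshold dominates
print(feature_budget(10))  # 50:  the interpretability cap dominates
print(PENALTY_PERCENT[8])  # 30.0: per-feature score penalty at level 8
```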
- drop_low_meta_weights
What it does: Controls whether to drop meta learner weights that are too low
Purpose: Removes meta learner weights below threshold to improve ensemble quality
Meta learning: Low weights may indicate poor performing base models
- Available options:
Enabled: Drops low meta weights
Disabled: Keeps all meta weights regardless of value
Requires: Boolean toggle (default: Enabled)
- meta_weight_allowed_by_interpretability
What it does: Sets the minimum allowed meta learner weights based on interpretability setting
Purpose: Controls meta learner weight thresholds based on interpretability requirements
Interpretability: Higher interpretability settings require higher minimum meta weights
Configuration: Dictionary mapping interpretability levels to minimum weight thresholds
Default: {1: 1E-7, 2: 1E-5, 3: 1E-4, 4: 1E-3, 5: 1E-2, 6: 0.03, 7: 0.05, 8: 0.08, 9: 0.10, 10: 0.15, 11: 0.15, 12: 0.15, 13: 0.15}
Requires: Dictionary value
- Min. weight of meta learner for reference models during ensembling
What it does: Sets the minimum weight required for reference models in ensemble creation
Purpose: Ensures reference models have sufficient weight to be included in ensembles
Ensemble: Reference models must exceed this weight threshold to be kept
Weight: If set to 1.0, reference model must be clear winner to be kept; 0.0 never drops reference models
Requires: Float value (default: 1.0)
- feature_cost_mean_interp_for_penalty
What it does: Sets the mean interpretability value for feature cost penalty calculation
Purpose: Provides baseline interpretability level for feature cost penalty computation
Penalty: Feature costs are calculated relative to this mean interpretability value
Baseline: Used as reference point for feature complexity penalty calculations
Requires: Float value (default: 5.0)
- features_cost_per_interp
What it does: Sets the cost per interpretability unit for feature complexity penalties
Purpose: Defines the penalty rate for feature complexity based on interpretability
Penalty: Higher values impose larger penalties for complex features
Rate: Cost increases by this amount for each interpretability unit
Requires: Float value (default: 0.25)
- varimp_threshold_shift_report
What it does: Sets the variable importance threshold for shift detection reporting
Purpose: Controls which features are reported in shift detection results
Reporting: Only features with importance above this threshold are included in shift reports
Threshold: Features with variable importance below this value are excluded from reports
Requires: Float value (default: 0.3)
- apply_featuregene_limits_after_tuning
What it does: Controls whether to apply feature and gene limits after hyperparameter tuning
Purpose: Applies feature and gene complexity limits after tuning is complete
Tuning: Ensures final models respect complexity limits regardless of tuning results
- Available options:
Enabled: Applies limits after tuning
Disabled: Does not apply limits after tuning
Requires: Boolean toggle (default: Enabled)
- remove_scored_0gain_genes_in_postprocessing_above_interpretability
What it does: Sets the interpretability level above which genes with zero gain scores are removed during postprocessing
Purpose: Removes ineffective genes from models above specified interpretability settings
Postprocessing: Cleans up models by removing genes that provide no benefit
Threshold: Genes with zero gain are removed above this interpretability level
Requires: Integer value (default: 13)
- remove_scored_0gain_genes_in_postprocessing_above_interpretability_final_population
What it does: Sets the interpretability level above which zero-gain genes are removed from the final model population
Purpose: Removes ineffective genes from final model population above specified interpretability
Final population: Ensures final models don’t contain genes that provide no benefit
Threshold: Zero gain genes are removed above this interpretability level in final population
Requires: Integer value (default: 2)
- remove_scored_by_threshold_genes_in_postprocessing_above_interpretability_final_population
What it does: Sets the interpretability level above which genes scoring below the threshold are removed from the final model population
Purpose: Removes low-performing genes from final model population above specified interpretability
Final population: Ensures final models contain only high-performing genes
Threshold: Genes below score threshold are removed above this interpretability level
Requires: Integer value (default: 7)
- Whether to show full pipeline details
What it does: Controls whether to display comprehensive pipeline details in logs and reports
Purpose: Provides detailed information about the complete machine learning pipeline
Details: Includes information about all pipeline components and their configurations
- Available options:
Enabled: Shows full pipeline details
Disabled: Shows simplified pipeline information
Requires: Boolean toggle (default: Disabled)
- Number of features to show when logging size of fitted transformers
What it does: Sets the maximum number of features to display when logging transformer sizes
Purpose: Limits the verbosity of transformer size logging information
Logging: Prevents log messages from becoming too long with many features
Limit: Only shows up to this many features in transformer size logs
Requires: Integer value (default: 10)
- fs_data_vary_for_interpretability
What it does: Sets the interpretability threshold for varying data in feature selection
Purpose: Controls when to vary data samples for feature selection based on interpretability
Feature selection: Higher interpretability settings may use different data sampling strategies
Threshold: Data variation is applied for feature selection above this interpretability level
Requires: Integer value (default: 7)
- Fraction of data to use for another data slice for FS
What it does: Sets the fraction of data to use for additional data slices in feature selection
Purpose: Controls the amount of data used for additional feature selection validation
Data slice: Provides additional validation data for feature selection decisions
Fraction: Proportion of data used for additional feature selection slice
Requires: Float value (default: 0.5)
- Whether to round-up individuals to ensure all GPUs used
What it does: Controls whether to round up individual count to fully utilize available GPUs
Purpose: Optimizes GPU utilization by adjusting individual count to match GPU availability
GPU utilization: Ensures all available GPUs are used efficiently
Note: May not always be optimal if many GPUs are available in multi-user environments
- Available options:
Enabled: Rounds up individuals to use all GPUs
Disabled: Uses specified individual count regardless of GPU availability
Requires: Boolean toggle (default: Enabled)
- Whether to require Graphviz package at startup
What it does: Controls whether to require Graphviz package installation at system startup
Purpose: Ensures Graphviz is available for pipeline visualization and graph generation
Visualization: Graphviz is required for generating pipeline visualizations and decision trees
Startup: System checks for Graphviz availability during initialization
- Available options:
Enabled: Requires Graphviz at startup
Disabled: Does not require Graphviz at startup
Requires: Boolean toggle (default: Enabled)
- fast_approx_max_num_trees_ever
What it does: Sets the maximum number of trees for fast approximation algorithms
Purpose: Limits the number of trees used in fast approximation methods to prevent excessive computation
Performance: Fast approximation methods use fewer trees for quicker results
Auto mode (-1): Uses automatic tree count determination
Custom value: Limits fast approximation to specified number of trees
Requires: Integer value (default: -1)
- max_absolute_feature_expansion
What it does: Sets the maximum absolute number of features that can be created through feature expansion
Purpose: Prevents excessive feature creation that could lead to memory or performance issues
Feature expansion: Controls the scope of automatic feature generation
Limit: Feature expansion stops when this threshold is reached
Requires: Integer value (default: 1000)
- model_class_name_for_shift
What it does: Specifies the model class to use for shift detection operations
Purpose: Controls which model type is used for distribution shift detection
Auto mode: Automatically selects the most appropriate model class
Custom value: Forces use of specified model class for shift detection
Requires: String selection (default: auto)
- model_class_name_for_leakage
What it does: Specifies the model class to use for leakage detection operations
Purpose: Controls which model type is used for data leakage detection
Auto mode: Automatically selects the most appropriate model class
Custom value: Forces use of specified model class for leakage detection
Requires: String selection (default: auto)
- tensorflow_num_classes_switch_but_keep_lightgbm
What it does: Sets the class count threshold for switching to TensorFlow while keeping LightGBM
Purpose: Determines when to use TensorFlow for multi-class problems while maintaining LightGBM
Class count: Problems with more classes than this threshold use TensorFlow
Hybrid approach: Uses both TensorFlow and LightGBM for optimal performance
Requires: Integer value (default: 15)
- Class count above which do not use TextLin Transformer
What it does: Sets the maximum class count for using TextLin Transformer
Purpose: Limits TextLin Transformer usage to problems with manageable class counts
Performance: TextLin Transformer may not be efficient for high-class-count problems
Threshold: TextLin Transformer is disabled for problems with more than this many classes
Requires: Integer value (default: 5)
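The two class-count switches above can be sketched as simple threshold checks; the function name and return shape are illustrative assumptions:

```python
def pick_text_strategy(n_classes, tf_switch=15, textlin_max=5):
    """TextLin is used only up to textlin_max classes; above tf_switch
    classes, TensorFlow is brought in alongside LightGBM."""
    use_textlin = n_classes <= textlin_max
    use_tensorflow_with_lightgbm = n_classes > tf_switch
    return use_textlin, use_tensorflow_with_lightgbm

print(pick_text_strategy(3))   # (True, False): few classes, TextLin allowed
print(pick_text_strategy(20))  # (False, True): TensorFlow joins LightGBM
```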
- text_gene_dim_reduction_choices
What it does: Sets the dimensionality reduction options for text gene processing
Purpose: Defines available dimensionality reduction methods for text features
Reduction: Controls how text features are reduced for efficient processing
Options: List of dimensionality reduction values to choose from
Requires: List of integers (default: [50])
- text_gene_max_ngram
What it does: Sets the maximum n-gram size for text gene processing
Purpose: Controls the maximum n-gram length used in text feature extraction
N-grams: Higher values capture longer text patterns but increase computation
Options: List of maximum n-gram sizes to test
Requires: List of integers (default: [1,2,3])
- number_of_texts_to_cache_in_bert_transformer
What it does: Sets the number of text samples to cache in BERT transformer
Purpose: Controls BERT transformer caching for improved performance
Caching: More cached texts improve performance but use more memory
Auto mode (-1): Uses automatic caching based on available memory
Custom value: Limits BERT transformer caching to specified number of texts
Requires: Integer value (default: -1)
- gbm_early_stopping_rounds_min
What it does: Sets the minimum number of early stopping rounds for GBM models
Purpose: Ensures minimum training rounds before early stopping can occur
Early stopping: Prevents premature stopping that could hurt model performance
Minimum: GBM models train for at least this many rounds before early stopping
Requires: Integer value (default: 1)
- gbm_early_stopping_rounds_max
What it does: Sets the maximum number of early stopping rounds for GBM models
Purpose: Limits the maximum training rounds for GBM early stopping
Early stopping: Prevents excessive training rounds in early stopping scenarios
Maximum: GBM models stop training after this many rounds maximum
Requires: Integer value (default: 10000000000)
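The min/max pair simply bounds the early-stopping patience; a one-line clamp illustrates the effect (the clamp itself is an assumption about how the bounds combine):

```python
def clamp_early_stopping_rounds(requested, rounds_min=1,
                                rounds_max=10_000_000_000):
    """Keep the GBM early-stopping patience inside the configured bounds."""
    return min(max(requested, rounds_min), rounds_max)

print(clamp_early_stopping_rounds(0))   # 1:  raised to the minimum
print(clamp_early_stopping_rounds(50))  # 50: already within bounds
```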
- max_num_varimp_to_log
What it does: Sets the maximum number of variable importance values to log
Purpose: Limits the verbosity of variable importance logging information
Logging: Prevents log messages from becoming too long with many variables
Limit: Only logs variable importance for up to this many variables
Requires: Integer value (default: 10)
- max_num_varimp_shift_to_log
What it does: Sets the maximum number of variable importance values to log for shift detection
Purpose: Limits the verbosity of shift detection variable importance logging
Logging: Prevents log messages from becoming too long with many shift variables
Limit: Only logs shift variable importance for up to this many variables
Requires: Integer value (default: 10)
- can_skip_final_upper_layer_failures
What it does: Controls whether to skip final upper layer failures in model training
Purpose: Allows models to continue training even if upper layers fail
Resilience: Improves model training robustness by handling layer failures
- Available options:
Enabled: Skips final upper layer failures
Disabled: Stops training on upper layer failures
Requires: Boolean toggle (default: Enabled)
- dump_modelparams_every_scored_indiv_feature_count
What it does: Sets the frequency for dumping model parameters based on scored individual feature count
Purpose: Controls how often model parameters are saved during feature scoring
Dumping: More frequent dumping provides better checkpointing but uses more disk space
Frequency: Model parameters are dumped every N scored individual features
Requires: Integer value (default: 3)
- dump_modelparams_every_scored_indiv_mutation_count
What it does: Sets the frequency for dumping model parameters based on scored individual mutation count
Purpose: Controls how often model parameters are saved during mutation operations
Dumping: More frequent dumping provides better checkpointing but uses more disk space
Frequency: Model parameters are dumped every N scored individual mutations
Requires: Integer value (default: 3)
- dump_modelparams_separate_files
What it does: Controls whether to dump model parameters to separate files
Purpose: Organizes model parameter dumps into individual files for better management
File organization: Separate files make it easier to track parameter changes over time
- Available options:
Enabled: Dumps model parameters to separate files
Disabled: Dumps all model parameters to a single file
Requires: Boolean toggle (default: Disabled)
- oauth2_client_tokens_enabled
What it does: Controls whether OAuth2 client tokens are enabled for authentication
Purpose: Enables OAuth2 client token-based authentication for secure access
Security: Provides secure authentication mechanism for client applications
- Available options:
Enabled: Enables OAuth2 client token authentication
Disabled: Uses standard authentication methods
Requires: Boolean toggle (default: Disabled)
- Maximum number of threads/forks for autoreport PDP. -1 means auto.
What it does: Sets the maximum number of threads/forks for autoreport Partial Dependence Plot generation
Purpose: Controls parallel processing for autoreport PDP generation
Performance: More threads can speed up PDP generation but use more resources
Auto mode (-1): Uses automatic thread allocation based on system resources
Custom value: Limits PDP generation to specified number of threads
Requires: Integer value (default: -1)
- Maximum number of columns for Autoviz
What it does: Sets the maximum number of columns to include in Autoviz visualizations
Purpose: Limits the scope of automatic visualization generation to prevent performance issues
Visualization: Large column counts can significantly slow down Autoviz generation
Threshold: Autoviz includes at most this many columns in visualizations
Requires: Integer value (default: 50)
- Maximum number of rows in aggregated frame
What it does: Sets the maximum number of rows in aggregated data frames
Purpose: Limits the size of aggregated frames to prevent memory and performance issues
Aggregation: Large row counts can cause memory problems during aggregation operations
Threshold: Aggregated frames are limited to this maximum number of rows
Requires: Integer value (default: 500)
- Autoviz Use Recommended Transformations
What it does: Controls whether Autoviz uses recommended transformations for data visualization
Purpose: Applies recommended data transformations to improve visualization quality
Transformations: Recommended transformations can enhance data visualization effectiveness
- Available options:
Enabled: Uses recommended transformations in Autoviz
Disabled: Uses raw data without recommended transformations
Requires: Boolean toggle (default: Enabled)
- enable_custom_recipes_from_url
What it does: Enables loading custom recipes from URL sources
Purpose: Allows users to load custom recipes from remote URL locations
Custom recipes: Extends functionality by allowing external recipe sources
- Available options:
Enabled: Allows loading custom recipes from URLs
Disabled: Disables URL-based custom recipe loading
Requires: Boolean toggle (default: Enabled)
- enable_custom_recipes_from_zip
What it does: Enables loading custom recipes from ZIP file sources
Purpose: Allows users to load custom recipes from ZIP archives
Custom recipes: Extends functionality by allowing ZIP-based recipe sources
- Available options:
Enabled: Allows loading custom recipes from ZIP files
Disabled: Disables ZIP-based custom recipe loading
Requires: Boolean toggle (default: Enabled)
- enable_recreate_custom_recipes_env
What it does: Enables recreation of custom recipe environments
Purpose: Allows recreation of custom recipe execution environments for consistency
Environment: Ensures custom recipes run in clean, consistent environments
- Available options:
Enabled: Recreates custom recipe environments
Disabled: Reuses existing custom recipe environments
Requires: Boolean toggle (default: Enabled)
- include_custom_recipes_by_default
What it does: Controls whether custom recipes are included by default in experiments
Purpose: Determines if custom recipes are automatically included in new experiments
Default inclusion: Custom recipes are included automatically if enabled
- Available options:
Enabled: Includes custom recipes by default
Disabled: Requires manual inclusion of custom recipes
Requires: Boolean toggle (default: Disabled)
- h2o_recipes_url
What it does: Specifies the URL for H2O recipes repository
Purpose: Defines the source URL for downloading H2O recipes
Repository: URL pointing to H2O recipes repository or custom recipe source
Default: None (uses built-in recipes)
Requires: String URL (default: None)
- h2o_recipes_ip
What it does: Specifies the IP address for H2O recipes server
Purpose: Defines the server IP address for H2O recipes access
Server: IP address of the server hosting H2O recipes
Default: None (uses default server)
Requires: String IP address (default: None)
- h2o_recipes_nthreads
What it does: Sets the number of threads for H2O recipes processing
Purpose: Controls parallel processing for H2O recipe operations
Performance: More threads can improve recipe processing speed
Threading: Number of threads allocated for H2O recipe operations
Requires: Integer value (default: 8)
- h2o_recipes_log_level
What it does: Sets the log level for H2O recipes operations
Purpose: Controls the verbosity of logging for H2O recipe processing
Logging: Higher levels provide more detailed logging information
Default: None (uses system default log level)
Requires: String log level (default: None)
- h2o_recipes_max_mem_size
What it does: Sets the maximum memory size for H2O recipes operations
Purpose: Limits memory usage for H2O recipe processing to prevent system overload
Memory: Maximum memory allocation for H2O recipe operations
Default: None (uses system default memory limits)
Requires: String memory size (default: None)
- h2o_recipes_min_mem_size
What it does: Sets the minimum memory size for H2O recipes operations
Purpose: Ensures minimum memory allocation for H2O recipe processing
Memory: Minimum memory allocation for H2O recipe operations
Default: None (uses system default memory limits)
Requires: String memory size (default: None)
- h2o_recipes_kwargs
What it does: Sets additional keyword arguments for H2O recipes configuration
Purpose: Provides additional configuration parameters for H2O recipe operations
Configuration: Dictionary of additional parameters for H2O recipes
Extensibility: Allows custom configuration beyond standard parameters
Requires: Dictionary (default: {})
- h2o_recipes_start_trials
What it does: Sets the number of start trials for H2O recipes initialization
Purpose: Controls the number of initialization trials for H2O recipe startup
Initialization: More trials can improve startup reliability but take longer
Trials: Number of attempts to initialize H2O recipes
Requires: Integer value (default: 5)
- h2o_recipes_start_sleep0
What it does: Sets the initial sleep duration for H2O recipes startup
Purpose: Controls the initial delay before starting H2O recipe initialization
Startup: Initial sleep duration in seconds before first startup attempt
Delay: Helps ensure system readiness before recipe initialization
Requires: Integer value (default: 1)
- h2o_recipes_start_sleep
What it does: Sets the sleep duration between H2O recipes startup attempts
Purpose: Controls the delay between consecutive startup attempts for H2O recipes
Retry: Sleep duration in seconds between startup retry attempts
Delay: Helps prevent rapid retry attempts that could overwhelm the system
Requires: Integer value (default: 5)
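The three startup settings above (start trials, initial sleep, sleep between attempts) combine into a standard retry loop. A minimal sketch, assuming a generic start function; the product's actual internals may differ:

```python
import time

def start_with_retries(start_fn, trials=5, sleep0=1, sleep=5):
    """Retry loop mirroring h2o_recipes_start_trials / _start_sleep0 /
    _start_sleep (names and defaults taken from this page; illustrative
    only, not the product's implementation)."""
    time.sleep(sleep0)  # initial delay before the first attempt
    last_err = None
    for attempt in range(trials):
        try:
            return start_fn()
        except Exception as err:  # a failed startup attempt
            last_err = err
            if attempt < trials - 1:
                time.sleep(sleep)  # back off between attempts
    raise RuntimeError("startup failed after %d trials" % trials) from last_err
```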
- custom_recipes_lock_to_git_repo
What it does: Controls whether custom recipes are locked to a specific Git repository
Purpose: Ensures custom recipes are loaded only from the specified Git repository
Security: Prevents loading custom recipes from unauthorized sources
- Available options:
Enabled: Locks custom recipes to specified Git repository
Disabled: Allows custom recipes from any source
Requires: Boolean toggle (default: Disabled)
- custom_recipes_git_repo
What it does: Specifies the Git repository URL for custom recipes
Purpose: Defines the source Git repository for custom recipe downloads
Repository: URL of the Git repository containing custom recipes
Default: Official H2O.ai driverlessai-recipes repository
Requires: String URL (default: Driverless AI recipes)
- custom_recipes_git_branch
What it does: Specifies the Git branch for custom recipes
Purpose: Defines which branch to use when downloading custom recipes from Git
Branch: Git branch name for custom recipe source
Default: None (uses default branch)
Requires: String branch name (default: None)
- basenames of files to exclude from repo download
What it does: Specifies the basenames of files to exclude from repository downloads
Purpose: Allows exclusion of specific files during custom recipe repository downloads
Exclusion: List of file basenames to skip during download operations
Filtering: Helps exclude unnecessary or problematic files from downloads
Requires: List of strings (default: [])
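As a sketch of the exclusion behavior, filtering a download list by basename might look like the following (the function name and path shapes are illustrative):

```python
import os

def filter_download_list(paths, excluded_basenames):
    """Skip files whose basename appears on the exclusion list, as
    described for the repository-download setting above (a sketch)."""
    excluded = set(excluded_basenames)
    return [p for p in paths if os.path.basename(p) not in excluded]
```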
- Allow use of deprecated get_global_directory() method from custom recipes for backward compatibility of recipes created before 1.9.0. Disable to force separation of custom recipes per user (in which case user_dir() should be used instead).
What it does: Controls backward compatibility for deprecated directory methods in custom recipes
Purpose: Maintains compatibility with custom recipes created before version 1.9.0
Compatibility: Allows use of deprecated get_global_directory() method
Separation: When disabled, forces per-user recipe separation using user_dir()
- Available options:
Enabled: Allows deprecated directory methods for backward compatibility
Disabled: Forces modern per-user recipe separation
Requires: Boolean toggle (default: Enabled)
- enable_custom_transformers
What it does: Enables custom transformer functionality
Purpose: Allows users to create and use custom data transformers
Custom transformers: Extends functionality by allowing user-defined transformers
- Available options:
Enabled: Enables custom transformer functionality
Disabled: Disables custom transformer functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_pretransformers
What it does: Enables custom pretransformer functionality
Purpose: Allows users to create and use custom pretransformers
Custom pretransformers: Extends functionality by allowing user-defined pretransformers
- Available options:
Enabled: Enables custom pretransformer functionality
Disabled: Disables custom pretransformer functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_models
What it does: Enables custom model functionality
Purpose: Allows users to create and use custom machine learning models
Custom models: Extends functionality by allowing user-defined models
- Available options:
Enabled: Enables custom model functionality
Disabled: Disables custom model functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_scorers
What it does: Enables custom scorer functionality
Purpose: Allows users to create and use custom scoring metrics
Custom scorers: Extends functionality by allowing user-defined scoring metrics
- Available options:
Enabled: Enables custom scorer functionality
Disabled: Disables custom scorer functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_datas
What it does: Enables custom data source functionality
Purpose: Allows users to create and use custom data sources
Custom data recipes: Extends functionality by allowing user-defined data sources
- Available options:
Enabled: Enables custom data source functionality
Disabled: Disables custom data source functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_explainers
What it does: Enables custom explainer functionality
Purpose: Allows users to create and use custom model explainers
Custom explainers: Extends functionality by allowing user-defined explainers
- Available options:
Enabled: Enables custom explainer functionality
Disabled: Disables custom explainer functionality
Requires: Boolean toggle (default: Enabled)
- enable_custom_individuals
What it does: Enables custom individual functionality
Purpose: Allows users to create and use custom individual configurations
Custom individuals: Extends functionality by allowing user-defined individual configurations
- Available options:
Enabled: Enables custom individual functionality
Disabled: Disables custom individual functionality
Requires: Boolean toggle (default: Enabled)
- enable_connectors_recipes
What it does: Enables connector recipe functionality
Purpose: Allows users to create and use connector-based recipes
Connector recipes: Extends functionality by allowing connector-based recipe configurations
- Available options:
Enabled: Enables connector recipe functionality
Disabled: Disables connector recipe functionality
Requires: Boolean toggle (default: Enabled)
- Base directory for recipes within data directory.
What it does: Sets the base directory for recipes within the data directory
Purpose: Defines the location where recipes are stored within the data directory structure
Directory structure: Organizes recipes within the data directory hierarchy
Base path: Root directory for recipe storage within data directory
Requires: String path (default: contrib)
- contrib_env_relative_directory
What it does: Sets the relative directory path for contribution environment
Purpose: Defines the relative path for contribution environment within the base directory
Environment: Location for contribution environment configuration and files
Relative path: Path relative to the base contribution directory
Requires: String path (default: contrib/env)
- pip_install_overall_retries
What it does: Sets the overall number of retries for pip install operations
Purpose: Controls the total number of retry attempts for pip package installation
Reliability: More retries can improve installation success rate
Retries: Total number of retry attempts for pip install operations
Requires: Integer value (default: 2)
- pip_install_verbosity
What it does: Sets the verbosity level for pip install operations
Purpose: Controls the amount of output information during pip package installation
Logging: Higher verbosity provides more detailed installation information
Level: Verbosity level for pip install operation output
Requires: Integer value (default: 2)
- pip_install_timeout
What it does: Sets the timeout duration for pip install operations
Purpose: Prevents pip install operations from running indefinitely
Timeout: Maximum time in seconds allowed for pip install operations
Prevention: Helps prevent hanging installation processes
Requires: Integer value (default: 15)
- pip_install_retries
What it does: Sets the number of retries for individual pip install operations
Purpose: Controls retry attempts for individual pip package installation failures
Retry: Number of retry attempts for each individual pip install operation
Reliability: More retries can improve individual package installation success
Requires: Integer value (default: 5)
- pip_install_use_constraint
What it does: Controls whether to use constraint files for pip install operations
Purpose: Enables use of constraint files to ensure consistent package versions
Constraints: Constraint files help maintain consistent dependency versions
- Available options:
Enabled: Uses constraint files for pip install
Disabled: Does not use constraint files for pip install
Requires: Boolean toggle (default: Enabled)
- pip_install_options
What it does: Sets additional options for pip install operations
Purpose: Provides additional command-line options for pip package installation
Options: List of additional pip install command-line options
Customization: Allows custom pip install behavior beyond default settings
Requires: List of strings (default: [])
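Taken together, the pip_install_* settings above roughly correspond to standard pip command-line flags. A hedged sketch of that mapping (the constraints-file path is a placeholder, and the product's actual invocation may differ):

```python
def build_pip_command(package, retries=5, timeout=15, verbosity=2,
                      use_constraint=True, extra_options=None):
    """Map the pip_install_* settings onto standard `pip install` flags
    (illustrative; `constraints.txt` is a placeholder path)."""
    cmd = ["pip", "install"]
    cmd += ["-v"] * verbosity                 # pip_install_verbosity
    cmd += ["--retries", str(retries)]        # pip_install_retries
    cmd += ["--timeout", str(timeout)]        # pip_install_timeout
    if use_constraint:                        # pip_install_use_constraint
        cmd += ["-c", "constraints.txt"]      # placeholder constraint file
    cmd += list(extra_options or [])          # pip_install_options
    cmd.append(package)
    return cmd
```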
- enable_basic_acceptance_tests
What it does: Enables basic acceptance tests for custom recipes
Purpose: Provides basic testing functionality for custom recipe validation
Testing: Basic acceptance tests help validate custom recipe functionality
- Available options:
Enabled: Enables basic acceptance tests
Disabled: Disables basic acceptance tests
Requires: Boolean toggle (default: Enabled)
- enable_acceptance_tests
What it does: Enables comprehensive acceptance tests for custom recipes
Purpose: Provides comprehensive testing functionality for custom recipe validation
Testing: Comprehensive acceptance tests thoroughly validate custom recipe functionality
- Available options:
Enabled: Enables comprehensive acceptance tests
Disabled: Disables comprehensive acceptance tests
Requires: Boolean toggle (default: Enabled)
- skip_disabled_recipes
What it does: Controls whether to skip disabled recipes during processing
Purpose: Allows skipping of recipes that have been disabled or marked as unavailable
Processing: Disabled recipes are excluded from processing when enabled
- Available options:
Enabled: Skips disabled recipes
Disabled: Processes all recipes including disabled ones
Requires: Boolean toggle (default: Disabled)
- contrib_reload_and_recheck_server_start
What it does: Controls whether to reload and recheck contributions during server startup
Purpose: Ensures contributions are properly loaded and validated during server initialization
Startup: Reloads and rechecks contributions to ensure they are current and valid
- Available options:
Enabled: Reloads and rechecks contributions on server start
Disabled: Uses cached contribution information on server start
Requires: Boolean toggle (default: Enabled)
- contrib_install_packages_server_start
What it does: Controls whether to install packages for contributions during server startup
Purpose: Ensures required packages for contributions are installed during server initialization
Dependencies: Installs necessary packages for contribution functionality
- Available options:
Enabled: Installs contribution packages on server start
Disabled: Skips contribution package installation on server start
Requires: Boolean toggle (default: Enabled)
- contrib_reload_and_recheck_worker_tasks
What it does: Controls whether to reload and recheck contributions during worker task execution
Purpose: Ensures contributions are properly loaded and validated during worker task processing
Worker tasks: Reloads and rechecks contributions to ensure they are current for worker tasks
- Available options:
Enabled: Reloads and rechecks contributions for worker tasks
Disabled: Uses cached contribution information for worker tasks
Requires: Boolean toggle (default: Disabled)
- num_rows_acceptance_test_custom_transformer
What it does: Sets the number of rows to use for acceptance testing of custom transformers
Purpose: Controls the dataset size used for testing custom transformer functionality
Testing: More rows provide more comprehensive testing but take longer
Rows: Number of data rows used for custom transformer acceptance testing
Requires: Integer value (default: 200)
- num_rows_acceptance_test_custom_model
What it does: Sets the number of rows to use for acceptance testing of custom models
Purpose: Controls the dataset size used for testing custom model functionality
Testing: More rows provide more comprehensive testing but take longer
Rows: Number of data rows used for custom model acceptance testing
Requires: Integer value (default: 100)
- enable_mapr_multi_user_mode
What it does: Enables MapR multi-user mode for distributed processing
Purpose: Allows multiple users to access MapR distributed file system simultaneously
Multi-user: Enables concurrent access to MapR resources by multiple users
- Available options:
Enabled: Enables MapR multi-user mode
Disabled: Uses single-user MapR mode
Requires: Boolean toggle (default: Disabled)
- mli_lime_method
What it does: Specifies which LIME method to use for the creation of surrogate models
Purpose: Choose the approach for generating local model explanations using LIME techniques
Available options: auto (automatically selects the most suitable method), k_lime (K-LIME), lime_sup (LIME-SUP)
Description: Selects the LIME variant (including classic LIME, K-LIME, or LIME-SUP) to be used for surrogate model interpretability
Requires: String value (default: auto)
- Use original features for surrogate models.
What it does: Controls whether to use original features for surrogate model training
Purpose: Determines feature set used for creating surrogate models in interpretability
Surrogate models: Simpler models used to explain complex model behavior
- Available options:
Enabled: Uses original features for surrogate models
Disabled: Uses transformed features for surrogate models
Requires: Boolean toggle (default: Enabled)
- Use original features for time series based surrogate models.
What it does: Controls whether to use original features for time series surrogate models
Purpose: Determines feature set used for time series surrogate model training
Time series: Surrogate models specifically designed for time series data
- Available options:
Enabled: Uses original features for time series surrogate models
Disabled: Uses transformed features for time series surrogate models
Requires: Boolean toggle (default: Disabled)
- Sample all explainers.
What it does: Controls whether to sample all explainers for model interpretability
Purpose: Determines if all available explainers are sampled during interpretability analysis
Explainers: Different methods for explaining model predictions and behavior
- Available options:
Enabled: Samples all available explainers
Disabled: Uses only selected explainers
Requires: Boolean toggle (default: Enabled)
- Number of features for Surrogate Partial Dependence Plot. Set to -1 to use all features.
What it does: Sets the number of features to include in Surrogate Partial Dependence Plots
Purpose: Controls the scope of Partial Dependence Plot generation for surrogate models
PDP: Partial Dependence Plots show the effect of features on model predictions
Auto mode (-1): Uses all available features for PDP generation
Custom value: Limits PDP to specified number of features
Requires: Integer value (default: 10)
- Cross-validation folds for surrogate models.
What it does: Sets the number of cross-validation folds for surrogate model training
Purpose: Controls cross-validation configuration for surrogate model validation
CV folds: More folds provide more robust surrogate model evaluation
Validation: Ensures surrogate models generalize well to unseen data
Requires: Integer value (default: 3)
- Number of columns to bin for surrogate models.
What it does: Sets the number of columns to bin for surrogate model preprocessing
Purpose: Controls binning configuration for surrogate model feature processing
Binning: Reduces feature cardinality for surrogate model efficiency
Columns: Number of columns to apply binning transformations
Requires: Integer value (default: 0)
- h2o_mli_nthreads
What it does: Sets the number of threads for H2O MLI (Machine Learning Interpretability) operations
Purpose: Controls parallel processing for H2O MLI operations
Performance: More threads can improve MLI operation speed
Threading: Number of threads allocated for H2O MLI processing
Requires: Integer value (default: 4)
- Allow use of MOJO scoring pipeline.
What it does: Controls whether to allow MOJO scoring pipeline usage
Purpose: Enables or disables MOJO-based scoring for model predictions
MOJO: Model Object, Optimized - provides fast model scoring
- Available options:
Enabled: Allows MOJO scoring pipeline usage
Disabled: Disables MOJO scoring pipeline usage
Requires: Boolean toggle (default: Enabled)
- Sample size for surrogate models.
What it does: Sets the sample size for surrogate model training
Purpose: Controls the amount of data used for training surrogate models
Sampling: Larger samples provide more robust surrogate models but increase computation
Size: Number of samples used for surrogate model training
Requires: Integer value (default: 100000)
- Number of bins for quantile binning.
What it does: Sets the number of bins for quantile binning operations
Purpose: Controls the granularity of quantile-based feature binning
Quantile binning: Divides features into equal-frequency bins
Bins: Number of bins to create for quantile binning
Requires: Integer value (default: 10)
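Quantile (equal-frequency) binning as described above can be sketched with NumPy; this illustrates the technique, not the product's implementation:

```python
import numpy as np

def quantile_bin(values, n_bins=10):
    """Equal-frequency binning: bin edges are placed at evenly spaced
    quantiles, so each bin receives roughly the same number of
    observations (skewed data with repeated values may yield fewer
    distinct bins)."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
    # interior edges only; digitize returns bin indices 0..n_bins-1
    return np.digitize(values, edges[1:-1], right=True)
```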
- Number of trees for Random Forest surrogate model.
What it does: Sets the number of trees for Random Forest surrogate models
Purpose: Controls the complexity of Random Forest surrogate models
Random Forest: Ensemble method using multiple decision trees
Trees: More trees can improve surrogate model accuracy but increase computation
Requires: Integer value (default: 100)
- Speed up predictions with a fast approximation.
What it does: Controls whether to use fast approximation for predictions
Purpose: Enables faster prediction generation at the cost of some accuracy
Approximation: Fast approximation methods trade accuracy for speed
- Available options:
Enabled: Uses fast approximation for predictions
Disabled: Uses exact prediction methods
Requires: Boolean toggle (default: Enabled)
- Max depth for Random Forest surrogate model.
What it does: Sets the maximum depth for Random Forest surrogate model trees
Purpose: Controls the complexity of trees in Random Forest surrogate models
Tree depth: Deeper trees can capture more complex patterns but increase overfitting risk
Maximum: Trees are limited to this maximum depth
Requires: Integer value (default: 20)
- Regularization strength for k-LIME GLMs.
What it does: Sets the regularization strength for k-LIME Generalized Linear Models
Purpose: Controls overfitting prevention in k-LIME GLM surrogate models
Regularization: Higher values provide stronger regularization but may underfit
Strength: List of regularization values to test during k-LIME GLM training
Requires: List of floats (default: [1e-6, 1e-8])
- Regularization distribution between L1 and L2 for k-LIME GLMs.
What it does: Sets the distribution between L1 and L2 regularization for k-LIME GLMs
Purpose: Controls the balance between L1 (Lasso) and L2 (Ridge) regularization
Elastic Net: Combination of L1 and L2 regularization methods
Distribution: 0 = pure L2, 1 = pure L1, values in between = elastic net
Requires: Float value (default: 0)
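The L1/L2 distribution above is the elastic-net mixing parameter. A small sketch of the penalty it controls (0 = pure ridge, 1 = pure lasso), with `alpha` standing in for the regularization-strength values listed in the previous setting:

```python
import numpy as np

def elastic_net_penalty(coefs, alpha=1e-6, l1_ratio=0.0):
    """Elastic-net penalty with the convention used above:
    l1_ratio = 0 is pure L2 (ridge), 1 is pure L1 (lasso).
    Illustrative formula only, not the product's GLM internals."""
    coefs = np.asarray(coefs, dtype=float)
    l1 = np.sum(np.abs(coefs))           # lasso term
    l2 = 0.5 * np.sum(coefs ** 2)        # ridge term
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)
```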
- Max cardinality for numeric variables in surrogate models to be considered categorical.
What it does: Sets the maximum cardinality threshold for treating numeric variables as categorical in surrogate models
Purpose: Controls when numeric variables are treated as categorical in surrogate models
Cardinality: Variables with unique values below this threshold are treated as categorical
Threshold: Numeric variables with cardinality above this are treated as continuous
Requires: Integer value (default: 25)
- Maximum number of features allowed for k-LIME k-means clustering.
What it does: Sets the maximum number of features for k-LIME k-means clustering
Purpose: Limits the number of features used in k-LIME k-means clustering operations
Clustering: k-means clustering groups similar data points for LIME explanations
Features: Maximum number of features to include in clustering operations
Requires: Integer value (default: 6)
- Use all columns for k-LIME k-means clustering (this will override `mli_max_number_cluster_vars` if set to `True`).
What it does: Controls whether to use all columns for k-LIME k-means clustering
Purpose: Determines if all available columns are used in k-LIME clustering
Override: When enabled, overrides the maximum feature limit for clustering
- Available options:
Enabled: Uses all columns for k-LIME clustering
Disabled: Respects maximum feature limit for clustering
Requires: Boolean toggle (default: Disabled)
- Unique feature values count driven Partial Dependence Plot binning and chart selection.
What it does: Controls whether PDP binning and chart selection is driven by unique feature values count
Purpose: Uses feature cardinality to determine optimal binning and visualization strategies
PDP: Partial Dependence Plot binning adapts based on feature uniqueness
- Available options:
Enabled: Uses unique values count for PDP binning decisions
Disabled: Uses standard PDP binning strategies
Requires: Boolean toggle (default: Disabled)
- Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric).
What it does: Sets the threshold for determining categorical vs numeric treatment in PDP
Purpose: Controls how features are categorized for Partial Dependence Plot generation
Threshold: Features with unique values <= threshold are treated as categorical
Classification: Features with unique values > threshold are treated as numeric
Requires: Integer value (default: 11)
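The threshold rule above can be expressed directly as a function of a feature's unique-value count; a sketch:

```python
def pdp_treatment(values, threshold=11):
    """PDP treatment per the rule above: unique count <= threshold is
    treated as categorical, above it as numeric (illustrative only)."""
    return "categorical" if len(set(values)) <= threshold else "numeric"
```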
- Add to config.toml via TOML string.
What it does: Allows adding configuration to config.toml file via TOML string
Purpose: Provides programmatic way to add configuration settings to TOML file
TOML: TOML (Tom’s Obvious, Minimal Language) configuration format
String: TOML-formatted string to add to configuration file
Requires: String TOML configuration (default: "")
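For illustration, a TOML string of the kind this field accepts might look like the following; the keys shown are setting names listed on this page, but verify them against your installation's config.toml before use:

```toml
# Hypothetical overrides entered as a TOML string
mli_lime_method = "k_lime"
pip_install_timeout = 30
```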
- Use Kernel Explainer to obtain Shapley values for original features
What it does: Controls whether to use Kernel Explainer for Shapley value computation on original features
Purpose: Enables Shapley value computation using Kernel Explainer for feature importance
Shapley values: Game theory-based feature importance scores
- Available options:
Enabled: Uses Kernel Explainer for Shapley values
Disabled: Uses alternative methods for Shapley value computation
Requires: Boolean toggle (default: Disabled)
- Sample input dataset for Kernel Explainer
What it does: Controls whether to sample the input dataset for Kernel Explainer
Purpose: Enables dataset sampling to improve Kernel Explainer performance
Sampling: Reduces dataset size for faster Kernel Explainer execution
- Available options:
Enabled: Samples input dataset for Kernel Explainer
Disabled: Uses full dataset for Kernel Explainer
Requires: Boolean toggle (default: Disabled)
- Sample size for input dataset passed to Kernel Explainer
What it does: Sets the sample size for input dataset passed to Kernel Explainer
Purpose: Controls the amount of data used for Kernel Explainer analysis
Sampling: Larger samples provide more accurate explanations but increase computation
Size: Number of samples to use for Kernel Explainer input
Requires: Integer value (default: 1000)
- Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally
What it does: Sets the number of model re-evaluations for each prediction in Kernel Explainer
Purpose: Controls the thoroughness of Kernel Explainer analysis per prediction
Re-evaluation: More evaluations provide more accurate explanations but increase computation
Auto mode: Uses automatic determination based on model complexity
Custom value: Forces specific number of re-evaluations per prediction
Requires: String or integer (default: auto)
- L1 regularization for Kernel Explainer
What it does: Sets the L1 regularization parameter for Kernel Explainer
Purpose: Controls L1 regularization in Kernel Explainer linear models
L1 regularization: Promotes sparsity in feature selection for explanations
Regularization: Higher values increase L1 regularization strength
Requires: String or float (default: aic)
- Max runtime for Kernel Explainer in seconds
What it does: Sets the maximum runtime for Kernel Explainer operations
Purpose: Prevents Kernel Explainer from running indefinitely
Timeout: Maximum time in seconds allowed for Kernel Explainer execution
Prevention: Helps prevent hanging Kernel Explainer operations
Requires: Integer value (default: 900)
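Kernel Explainer approximates Shapley values by sampling feature coalitions; the sample-size and re-evaluation settings above bound that sampling. For intuition, exact Shapley values can be computed by brute force over all coalitions (illustrative only, and exponential in the number of features, which is precisely what the sampling avoids):

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, baseline, instance):
    """Brute-force Shapley values. `predict` takes a dict of feature
    values; features absent from a coalition are filled from `baseline`.
    Hypothetical helper for intuition, not the product's explainer."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        x = dict(baseline)
        for f in coalition:
            x[f] = instance[f]
        return predict(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight for a coalition of size k
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(s + (f,)) - value(s))
        phi[f] = total
    return phi
```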
- Number of tokens used for MLI NLP explanations. -1 means all.
What it does: Sets the number of tokens to use for MLI NLP explanations
Purpose: Controls the scope of token-based explanations for NLP models
Tokens: More tokens provide more comprehensive explanations but increase computation
Auto mode (-1): Uses all available tokens for NLP explanations
Custom value: Limits NLP explanations to specified number of tokens
Requires: Integer value (default: 20)
- Sample size for MLI NLP explainers.
What it does: Sets the sample size for MLI NLP explainer operations
Purpose: Controls the amount of data used for NLP explainer analysis
Sampling: Larger samples provide more accurate NLP explanations but increase computation
Size: Number of samples to use for MLI NLP explainer operations
Requires: Integer value (default: 10000)
- Minimum number of documents in which a token has to appear. An integer means an absolute count; a float means a percentage.
What it does: Sets the minimum document frequency threshold for token inclusion in NLP explanations
Purpose: Filters out rare tokens that may not be meaningful for explanations
Document frequency: Tokens must appear in at least this many documents to be included
Integer: Absolute count of documents
Float: Percentage of total documents
Requires: Integer or float (default: 3)
- Maximum number of documents in which a token may appear. An integer means an absolute count; a float means a percentage.
What it does: Sets the maximum document frequency threshold for token inclusion in NLP explanations
Purpose: Filters out very common tokens that may not be discriminative
Document frequency: Tokens appearing in more than this many documents are excluded
Integer: Absolute count of documents
Float: Percentage of total documents
Requires: Integer or float (default: 0.9)
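The min/max document-frequency rules above can be sketched as a filter (illustrative, not the product's tokenizer):

```python
def df_filter(docs, min_df=3, max_df=0.9):
    """Keep tokens whose document frequency falls within the bounds.
    An int threshold is an absolute document count; a float is a
    fraction of the number of documents (per the settings above)."""
    n_docs = len(docs)
    counts = {}
    for doc in docs:
        for tok in set(doc):  # count each token once per document
            counts[tok] = counts.get(tok, 0) + 1
    lo = min_df if isinstance(min_df, int) else min_df * n_docs
    hi = max_df if isinstance(max_df, int) else max_df * n_docs
    return {t for t, c in counts.items() if lo <= c <= hi}
```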
- The minimum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range.
What it does: Sets the minimum n-gram size for MLI NLP tokenization
Purpose: Controls the minimum length of n-grams generated for NLP explanations
N-grams: Contiguous sequences of n tokens
Minimum: Smallest n-gram size to generate for NLP tokenization
Requires: Integer value (default: 1)
- The maximum value in the ngram range. The tokenizer will generate all possible tokens in the (mli_nlp_min_ngram, mli_nlp_max_ngram) range.
What it does: Sets the maximum n-gram size for MLI NLP tokenization
Purpose: Controls the maximum length of n-grams generated for NLP explanations
N-grams: Contiguous sequences of n tokens
Maximum: Largest n-gram size to generate for NLP tokenization
Requires: Integer value (default: 1)
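The (min_ngram, max_ngram) range above behaves like a standard tokenizer's ngram_range (e.g., scikit-learn's); a minimal sketch:

```python
def ngrams(tokens, min_n=1, max_n=1):
    """Generate all n-grams in the inclusive (min_n, max_n) range,
    mirroring mli_nlp_min_ngram / mli_nlp_max_ngram (illustrative)."""
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out
```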
- Mode used to choose N tokens for MLI NLP. "top" chooses N top tokens. "bottom" chooses N bottom tokens. "top-bottom" chooses math.floor(N/2) top and math.ceil(N/2) bottom tokens. "linspace" chooses N evenly spaced out tokens.
What it does: Sets the selection mode for choosing N tokens in MLI NLP explanations
Purpose: Controls how tokens are selected for NLP explanation analysis
Selection modes: Different strategies for token selection based on importance
Options: top, bottom, top-bottom, linspace
Requires: String selection (default: top)
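The four selection modes can be sketched as follows, assuming the token list is pre-sorted from most to least important (note that Python's round() uses banker's rounding, so a real implementation's linspace picks may differ at .5 boundaries):

```python
import math

def select_tokens(tokens_by_importance, n, mode="top"):
    """Token-selection modes as described above (illustrative)."""
    t = list(tokens_by_importance)
    if mode == "top":
        return t[:n]
    if mode == "bottom":
        return t[-n:]
    if mode == "top-bottom":
        # floor(N/2) from the top, ceil(N/2) from the bottom
        return t[:math.floor(n / 2)] + t[-math.ceil(n / 2):]
    if mode == "linspace":
        # n evenly spaced indices across the full list
        step = (len(t) - 1) / (n - 1) if n > 1 else 0
        return [t[round(i * step)] for i in range(n)]
    raise ValueError("unknown mode: %s" % mode)
```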
- The number of top tokens to be used as features when building token based feature importance.
What it does: Sets the number of top tokens for token-based feature importance analysis
Purpose: Controls the scope of token-based feature importance computation
Feature importance: Top tokens are used to build feature importance models
Auto mode (-1): Uses automatic token selection for feature importance
Custom value: Limits feature importance to specified number of top tokens
Requires: Integer value (default: -1)
- The number of top tokens to be used as features when computing text LOCO.
What it does: Sets the number of top tokens for text Leave-One-Covariate-Out (LOCO) analysis
Purpose: Controls the scope of text LOCO computation for feature importance
LOCO: Leave-One-Covariate-Out method for feature importance estimation
Auto mode (-1): Uses automatic token selection for LOCO analysis
Custom value: Limits LOCO analysis to specified number of top tokens
Requires: Integer value (default: -1)
- Tokenizer for surrogate models. Only applies to NLP models.
What it does: Specifies the tokenizer method to use when tokenizing a dataset for surrogate models in NLP.
Purpose: Allows selection of tokenization approach for surrogate model feature construction in NLP explanations.
- Options:
TF-IDF: Uses term frequency-inverse document frequency to generate tokens/features.
Linear Model + TF-IDF: First computes TF-IDF tokens, then fits a linear model between tokens and target; importance of tokens determined by linear model coefficients.
Default: Linear Model + TF-IDF
NLP only: This setting only applies to natural language processing models.
Requires: String selection (default: Linear Model + TF-IDF)
- The number of top tokens to be used as features when building surrogate models. Only applies to NLP models.
What it does: Sets the number of top tokens for surrogate model feature construction in NLP
Purpose: Controls the scope of token features used in NLP surrogate models
Surrogate models: Simpler models used to explain complex NLP model behavior
NLP only: This setting only applies to natural language processing models
Tokens: Number of top tokens to include as features in surrogate models
Requires: Integer value (default: 100)
- Ignore stop words for MLI NLP.
What it does: Controls whether to ignore stop words in MLI NLP processing
Purpose: Filters out common words that may not be meaningful for NLP explanations
Stop words: Common words like “the”, “and”, “is” that are often filtered out
- Available options:
Enabled: Ignores stop words in MLI NLP
Disabled: Includes stop words in MLI NLP processing
Requires: Boolean toggle (default: Disabled)
- List of words to filter out before generation of text tokens, which are passed to MLI NLP LOCO and surrogate models (if enabled). Default is 'english'. Pass in custom stop words as a list, e.g., ['great', 'good'].
What it does: Sets the list of stop words to filter out in MLI NLP processing
Purpose: Defines custom stop words to exclude from NLP analysis
Stop words: Words to filter out before token generation for LOCO and surrogate models
Default: Uses English stop words list
Custom: Allows specification of custom stop words list
Requires: String or list (default: english)
- Append passed in list of custom stop words to default ‘english’ stop words.
What it does: Controls whether to append custom stop words to default English stop words
Purpose: Allows combining custom stop words with default English stop words
Append: Custom stop words are added to the default English stop words list
- Available options:
Enabled: Appends custom stop words to default list
Disabled: Replaces default stop words with custom list
Requires: Boolean toggle (default: Disabled)
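The append toggle's effect can be sketched as follows (the three-word `english` tuple here is a stand-in for the full default English stop-word list):

```python
def effective_stop_words(custom, append=False, english=("the", "and", "is")):
    """Combine custom stop words with the default 'english' list per the
    append toggle above: append adds to the defaults, otherwise the
    custom list replaces them (illustrative sketch)."""
    custom = list(custom)
    return list(english) + custom if append else custom
```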
- Set dask CUDA/RAPIDS cluster settings for single node workers.
What it does: Configures Dask CUDA/RAPIDS cluster settings for single node worker environments
Purpose: Optimizes Dask distributed computing for CUDA/RAPIDS workloads on single nodes
CUDA/RAPIDS: GPU-accelerated computing frameworks for machine learning
Configuration: JSON object with scheduler port, dashboard address, and protocol settings
Default: {"scheduler_port": 0, "dashboard_address": ":0", "protocol": "tcp"}
Requires: JSON object (default: {"scheduler_port": 0, "dashboard_address": ":0", "protocol": "tcp"})
- Set dask cluster settings for single node workers.
What it does: Configures Dask cluster settings for single node worker environments
Purpose: Optimizes Dask distributed computing for single node setups
Cluster: Dask cluster configuration for distributed computing
Configuration: JSON object with worker count, processes, threads, and network settings
Default: {"n_workers":1,"processes":true,"threads_per_worker":1,"scheduler_port":0,"dashboard_address":":0","protocol":"tcp"}
Requires: JSON object (default: {"n_workers":1,"processes":true,"threads_per_worker":1,"scheduler_port":0,"dashboard_address":":0","protocol":"tcp"})
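Because this setting is supplied as a JSON object, a quick way to sanity-check a value before entering it is to parse it with Python's `json` module. A sketch using the documented default:

```python
import json

# Documented default Dask single-node cluster settings for this setting.
default_cluster = (
    '{"n_workers":1,"processes":true,"threads_per_worker":1,'
    '"scheduler_port":0,"dashboard_address":":0","protocol":"tcp"}'
)

# json.loads raises an error on malformed JSON, catching typos before they
# reach the expert setting (e.g. single quotes or a trailing comma).
settings = json.loads(default_cluster)
print(settings["n_workers"], settings["protocol"])
```

Note that scheduler_port 0 and dashboard_address ":0" ask Dask to pick free ports automatically.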
- Set dask scheduler env.
What it does: Sets environment variables for Dask scheduler
Purpose: Configures environment variables for Dask scheduler processes
Environment: Environment variables passed to Dask scheduler for configuration
Configuration: Dictionary of environment variable key-value pairs
Default: Empty dictionary uses default environment
Requires: Dictionary (default: {})
- Set dask worker environment variables. NCCL_SOCKET_IFNAME is automatically set, but can be overridden here.
What it does: Sets environment variables for Dask worker processes
Purpose: Configures environment variables for Dask worker processes
Environment: Environment variables passed to Dask workers for configuration
NCCL: NVIDIA Collective Communications Library settings for GPU communication
Configuration: Dictionary with NCCL settings and other worker environment variables
Default: {"NCCL_P2P_DISABLE":"1","NCCL_DEBUG":"WARN"}
Requires: Dictionary (default: {"NCCL_P2P_DISABLE":"1","NCCL_DEBUG":"WARN"})
- Set dask cuda worker environment variables.
What it does: Sets environment variables specifically for Dask CUDA worker processes
Purpose: Configures CUDA-specific environment variables for Dask workers
CUDA: NVIDIA CUDA environment variables for GPU computing
Workers: Environment variables specific to CUDA-enabled Dask workers
Configuration: Dictionary of CUDA-specific environment variable key-value pairs
Default: Empty dictionary uses default CUDA environment
Requires: Dictionary (default: {})
- Enable XSRF Webserver protection
What it does: Enables Cross-Site Request Forgery (XSRF) protection for the webserver
Purpose: Provides security protection against XSRF attacks on the web interface
Security: XSRF protection prevents malicious websites from making unauthorized requests
- Available options:
Enabled: Enables XSRF protection
Disabled: Disables XSRF protection
Requires: Boolean toggle (default: Enabled)
- SameSite Attribute for XSRF Cookie
What it does: Sets the SameSite attribute for _xsrf cookies
Purpose: Controls how XSRF cookies are handled across different sites
SameSite: Cookie attribute that controls cross-site cookie behavior
Security: Helps prevent XSRF attacks by controlling cookie sharing
Options: "Lax", "Strict", or ""
Requires: String attribute (default: Lax)
- Enable secure flag on HTTP cookies
What it does: Enables the secure flag on HTTP cookies
Purpose: Ensures cookies are only transmitted over HTTPS connections
Security: Secure flag prevents cookie transmission over unencrypted HTTP
- Available options:
Enabled: Enables secure flag on cookies
Disabled: Allows cookies over HTTP connections
Requires: Boolean toggle (default: Disabled)
- When enabled, webserver verifies session and request IP address
What it does: Controls whether the webserver verifies session and request IP addresses
Purpose: Provides additional security by verifying session consistency with IP addresses
Security: IP verification helps prevent session hijacking and unauthorized access
- Available options:
Enabled: Verifies session and IP addresses
Disabled: Does not verify session and IP addresses
Requires: Boolean toggle (default: Disabled)
- Enable concurrent session for same user
What it does: Controls whether multiple concurrent sessions are allowed for the same user
Purpose: Determines if users can have multiple active sessions simultaneously
Concurrency: Multiple sessions allow users to access the system from different devices
- Available options:
Enabled: Allows concurrent sessions for same user
Disabled: Restricts users to single active session
Requires: Boolean toggle (default: Enabled)
- Enabling imputation adds new picker to EXPT setup GUI and triggers imputation functionality in Transformers
What it does: Controls whether imputation functionality is enabled in the experiment setup GUI
Purpose: Enables or disables imputation features in the experiment setup interface
Imputation: Process of filling in missing values in datasets
GUI: Adds imputation picker to experiment setup graphical user interface
- Available options:
Enabled: Enables imputation functionality
Disabled: Disables imputation functionality
Requires: Boolean toggle (default: Disabled)
- datatable_parse_max_memory_bytes
What it does: Sets the maximum memory in bytes for datatable parsing operations
Purpose: Limits memory usage during datatable parsing to prevent system overload
Memory: Maximum memory allocation for datatable parsing operations
Auto mode (-1): Uses automatic memory allocation based on available system memory
Custom value: Limits datatable parsing to specified memory amount
Requires: Integer value (default: -1)
- datatable_separator
What it does: Sets the separator character for datatable parsing
Purpose: Defines the delimiter used to separate fields in datatable parsing
Separator: Character used to delimit fields in data files
Auto mode: Automatically detects the appropriate separator
Custom value: Forces use of specified separator character
Requires: String separator (default: auto)
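In auto mode the separator is detected for you. As a rough illustration of that idea (not DAI's actual detection logic), Python's `csv.Sniffer` can guess a delimiter from a sample of the file:

```python
import csv

# A small sample of a semicolon-delimited file.
sample = "id;name;value\n1;alpha;3.5\n2;beta;7.1\n"

# Restrict the candidate delimiters the sniffer may choose from.
dialect = csv.Sniffer().sniff(sample, delimiters=";,|\t")
print(dialect.delimiter)
```

Setting a custom value in this expert setting skips detection entirely and forces the given character.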
- Whether to enable ping of system status during DAI data ingestion.
What it does: Controls whether to enable system status monitoring during data ingestion
Purpose: Provides real-time system monitoring during data ingestion operations
Monitoring: Tracks system performance and resource usage during data loading
- Available options:
Enabled: Enables system status monitoring during ingestion
Disabled: Disables system status monitoring during ingestion
Requires: Boolean toggle (default: Disabled)
- Threshold for reporting high correlation
What it does: Sets the correlation threshold for reporting high correlation between features
Purpose: Identifies and reports features with correlation above the specified threshold
Correlation: Statistical measure of linear relationship between features
Threshold: Features with correlation above this value are flagged as highly correlated
Requires: Float value (default: 0.95)
- datatable_bom_csv
What it does: Controls whether to handle Byte Order Mark (BOM) in CSV files during datatable parsing
Purpose: Handles BOM characters that may be present in CSV files
BOM: Byte Order Mark characters that indicate text encoding
CSV parsing: Ensures proper parsing of CSV files with BOM characters
- Available options:
Enabled: Handles BOM in CSV files
Disabled: Does not handle BOM in CSV files
Requires: Boolean toggle (default: Disabled)
- check_invalid_config_toml_keys
What it does: Controls whether to check for invalid keys in TOML configuration files
Purpose: Validates TOML configuration files for invalid or unrecognized keys
Validation: Helps identify configuration errors and typos in TOML files
- Available options:
Enabled: Checks for invalid TOML keys
Disabled: Skips validation of TOML keys
Requires: Boolean toggle (default: Enabled)
- predict_safe_trials
What it does: Sets the number of safe trials for prediction operations
Purpose: Controls the number of retry attempts for prediction operations
Safety: More trials can improve prediction reliability but increase computation time
Trials: Number of attempts for prediction operations
Requires: Integer value (default: 2)
- fit_safe_trials
What it does: Sets the number of safe trials for model fitting operations
Purpose: Controls the number of retry attempts for model fitting operations
Safety: More trials can improve model fitting reliability but increase computation time
Trials: Number of attempts for model fitting operations
Requires: Integer value (default: 2)
- Whether to allow running without the --pid=host setting. Some GPU info from within Docker will not be correct without it.
What it does: Controls whether Docker containers may run without the --pid=host setting
Purpose: Determines if Docker containers can run without the host process ID namespace
Docker: --pid=host allows the container to see all host processes
GPU info: Some GPU information may be incorrect without --pid=host
- Available options:
Enabled: Allows containers without --pid=host
Disabled: Requires the --pid=host setting for containers
Requires: Boolean toggle (default: Enabled)
- terminate_experiment_if_memory_low
What it does: Controls whether to terminate experiments when available system memory runs low
Purpose: Stops experiments before free memory becomes critically low
Memory management: Helps prevent system crashes due to memory exhaustion
- Available options:
Enabled: Terminates experiments when available memory is low
Disabled: Continues experiments regardless of available memory
Requires: Boolean toggle (default: Disabled)
- memory_limit_gb_terminate
What it does: Sets the memory limit in GB for experiment termination
Purpose: Defines the memory threshold below which experiments are terminated
Memory limit: Experiments are terminated when available memory falls below this limit
Threshold: Memory limit in gigabytes for experiment termination
Requires: Float value (default: 5)
- last_exclusive_mode
What it does: Controls the last exclusive mode setting for experiment execution
Purpose: Determines the exclusive mode behavior for the final experiment phase
Exclusive mode: Ensures experiments have exclusive access to system resources
Final phase: Applies exclusive mode settings to the last experiment execution phase
Requires: String (default: "")
- max_time_series_properties_sample_size
What it does: Sets the maximum sample size for time series properties analysis
Purpose: Limits the amount of data used for time series property calculations
Time series: Properties like seasonality, trend, and autocorrelation
Sample size: Maximum number of data points used for time series analysis
Requires: Integer value (default: 250000)
- max_lag_sizes
What it does: Sets the maximum lag sizes for time series analysis
Purpose: Limits the maximum number of lagged features created for time series
Lags: Previous time steps used as features for time series prediction
Maximum: Largest lag size allowed in time series feature engineering
Requires: Integer value (default: 30)
- min_lag_autocorrelation
What it does: Sets the minimum autocorrelation threshold for lag selection
Purpose: Controls which lags are included based on autocorrelation strength
Autocorrelation: Correlation of time series with its lagged values
Threshold: Minimum autocorrelation required for lag inclusion
Requires: Float value (default: 0.1)
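As a rough sketch of the autocorrelation check behind this threshold, the following pure-Python function computes the lag-k autocorrelation of a series (illustrative only, not DAI's internal code):

```python
def lag_autocorrelation(series, lag):
    """Pearson correlation between a series and itself shifted by `lag`."""
    n = len(series) - lag
    a, b = series[:n], series[lag:]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# A strongly periodic toy series: the lag equal to the period (4) easily
# clears the 0.1 default threshold and would be kept as a lag feature.
series = [1, 2, 3, 4] * 6
print(lag_autocorrelation(series, 4))
```

Lags whose autocorrelation falls below the threshold would be excluded from lag-feature engineering.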
- max_signal_lag_sizes
What it does: Sets the maximum lag sizes for signal processing in time series
Purpose: Limits the maximum number of lags used in signal processing operations
Signal processing: Advanced time series analysis techniques
Lags: Maximum lag size for signal-based time series features
Requires: Integer value (default: 100)
- single_model_vs_cv_score_reldiff
What it does: Sets the relative difference threshold between single model and cross-validation scores
Purpose: Controls when single model scores are considered significantly different from CV scores
Score comparison: Helps identify overfitting by comparing single model vs CV performance
Threshold: Relative difference threshold for score comparison
Requires: Float value (default: 0.05)
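The kind of relative difference being thresholded can be sketched as follows. The exact formula DAI uses internally is an assumption here; this only illustrates the idea of comparing a single-model score against the cross-validation score:

```python
def relative_difference(single_model_score, cv_score):
    """Relative gap between two scores (illustrative; DAI's exact formula may differ)."""
    return abs(single_model_score - cv_score) / max(abs(cv_score), 1e-12)

# A gap of 0.03 on a CV score of 0.77 is about 0.039, under the 0.05 default,
# so these scores would not be considered significantly different.
print(relative_difference(0.80, 0.77))
```

A large relative gap between the single-model and CV scores is one symptom of overfitting.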
- single_model_vs_cv_score_reldiff2
What it does: Sets the secondary relative difference threshold between single model and CV scores
Purpose: Provides additional threshold for single model vs CV score comparison
Score comparison: Secondary threshold for more nuanced score difference analysis
Threshold: Additional relative difference threshold for score comparison
Requires: Float value (default: 0)
- Max number of splits for the 'refit' method, to avoid OOM/slowness, both for GA and final refit. In GA, falls back to fast_tta; in the final refit, fails with an error message.
What it does: Sets the maximum number of splits for refit method to prevent out-of-memory issues
Purpose: Limits the number of splits in refit operations to prevent memory problems and slowness
Refit method: Technique for refitting models with different data splits
OOM prevention: Helps avoid out-of-memory errors and performance issues
Fallback: Falls back to fast_tta in genetic algorithm, fails with error in final refit
Requires: Integer value (default: 1000)