Timeseries configuration¶
time_series_recipe
¶
Time-series lag-based recipe (Boolean) (Expert Setting)
Default value True
Enable the time series lag-based recipe with lag transformers. If disabled, the same train-test gap and periods are used, but no lag transformers are created. Without lag transformers the set of feature transformations is quite limited, so consider setting enable_time_unaware_transformers to true in order to treat the problem more like an IID problem.
time_series_leaderboard_mode
¶
Control the automatic time-series leaderboard mode (String) (Expert Setting)
Default value 'diverse'
'diverse': explore a diverse set of models built using various expert settings. Note that it's possible to rerun another such diverse leaderboard on top of the best-performing model(s), which will effectively help you compose these expert settings.
'sliding_window': If the forecast horizon is N periods, create a separate model for each of the (gap, horizon) pairs of (0,n), (n,n), (2*n,n), …, (2*N-1, n) in units of time periods. The number of periods to predict per model, n, is controlled by the expert setting 'time_series_leaderboard_periods_per_model', which defaults to 1.
time_series_leaderboard_periods_per_model
¶
Number of periods per model if time_series_leaderboard_mode is 'sliding_window'. (Number) (Expert Setting)
Default value 1
Fine-grained control to limit the number of models built in 'sliding_window' mode. Larger values lead to fewer models.
time_series_merge_splits
¶
Larger validation splits for lag-based recipe (Boolean) (Expert Setting)
Default value True
Whether to create larger validation splits that are not bound to the length of the forecast horizon.
merge_splits_max_valid_ratio
¶
Maximum ratio of training data samples used for validation (-1 = auto) (Float) (Expert Setting)
Default value -1.0
Maximum ratio of training data samples used for validation across splits when larger validation splits are created.
fixed_size_train_timespan
¶
Fixed-size train timespan across splits (Boolean) (Expert Setting)
Default value False
Whether to keep a fixed-size train timespan across time-based splits, which leads to roughly the same number of training samples in every split.
time_series_validation_fold_split_datetime_boundaries
¶
Custom validation splits for time-series experiments (String) (Expert Setting)
Default value ''
Provide date or datetime timestamps (in the same format as the time column) for custom training and validation splits, like this: "tr_start1, tr_end1, va_start1, va_end1, …, tr_startN, tr_endN, va_startN, va_endN"
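For illustration, a boundary string with two hypothetical splits on a daily time column could be built as follows; the dates are made up, and only the comma-separated ordering tr_start, tr_end, va_start, va_end per split comes from the description above.

```python
# Hypothetical example of the boundary string format described above.
# Dates are placeholders; use the same format as your time column.
splits = [
    ("2019-01-01", "2019-06-30", "2019-07-01", "2019-07-31"),  # tr_start1, tr_end1, va_start1, va_end1
    ("2019-01-01", "2019-07-31", "2019-08-01", "2019-08-31"),  # tr_start2, tr_end2, va_start2, va_end2
]
boundary_string = ", ".join(", ".join(split) for split in splits)
print(boundary_string)
# 2019-01-01, 2019-06-30, 2019-07-01, 2019-07-31, 2019-01-01, 2019-07-31, 2019-08-01, 2019-08-31
```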
time_series_validation_splits
¶
Number of time-based splits for internal model validation (-1 = auto) (Number) (Expert Setting)
Default value -1
Set a fixed number of time-based splits for internal model validation (the actual number of splits allowed can be less and is determined at experiment run-time).
time_series_splits_max_overlap
¶
Maximum overlap between two time-based splits. (Float) (Expert Setting)
Default value 0.5
Maximum overlap between two time-based splits. Higher values increase the number of possible splits.
holiday_features
¶
Generate holiday features (Boolean) (Expert Setting)
Default value True
Automatically generate is-holiday features from date columns
holiday_countries
¶
Country code(s) for holiday features (List) (Expert Setting)
Default value ['UnitedStates', 'UnitedKingdom', 'EuropeanCentralBank', 'Germany', 'Mexico', 'Japan']
List of countries for which to look up the holiday calendar and generate is-holiday features.
sample_lag_sizes
¶
Whether to sample lag sizes (Boolean) (Expert Setting)
Default value False
If enabled, sample from a set of possible lag sizes (e.g., lags=[1, 4, 8]) for each lag-based transformer, using no more than max_sampled_lag_sizes lags. Can help reduce overall model complexity and size, especially when many columns are unavailable at prediction time.
max_sampled_lag_sizes
¶
Number of sampled lag sizes. -1 for auto. (Number) (Expert Setting)
Default value -1
If sample_lag_sizes is enabled, sample from a set of possible lag sizes (e.g., lags=[1, 4, 8]) for each lag-based transformer, to no more than max_sampled_lag_sizes lags. Can help reduce overall model complexity and size. Defaults to -1 (auto), in which case it’s the same as the feature interaction depth controlled by max_feature_interaction_depth.
override_lag_sizes
¶
Time-series lags override, e.g. [7, 14, 21] (List) (Expert Setting)
Default value []
Override the lags to be used, e.g. [7, 14, 21]. Accepted forms:
- [7, 14, 21] : use exactly this list
- 21 : produce lags from 1 to 21
- 21:3 : produce lags from 1 to 21 in steps of 3
- 5-21 : produce lags from 5 to 21
- 5-21:3 : produce lags from 5 to 21 in steps of 3
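As a purely illustrative sketch (not DAI code), the notation above can be expanded into an explicit list of lags like this; the helper name and parsing details are assumptions based on the examples given.

```python
# Illustrative helper (hypothetical, not part of DAI) that expands the lag notation above.
def expand_lags(spec):
    if isinstance(spec, list):       # [7, 14, 21] -> exactly this list
        return list(spec)
    spec, step = str(spec), 1
    if ":" in spec:                  # "21:3" or "5-21:3" -> step of 3
        spec, step_str = spec.split(":")
        step = int(step_str)
    if "-" in spec:                  # "5-21" -> from 5 to 21
        start, end = (int(x) for x in spec.split("-"))
    else:                            # "21" -> from 1 to 21
        start, end = 1, int(spec)
    return list(range(start, end + 1, step))

print(expand_lags([7, 14, 21]))  # [7, 14, 21]
print(expand_lags("21:3"))       # [1, 4, 7, 10, 13, 16, 19]
print(expand_lags("5-21:3"))     # [5, 8, 11, 14, 17, 20]
```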
override_ufapt_lag_sizes
¶
Lags override for features that are not known ahead of time (List) (Expert Setting)
Default value []
Override the lags to be used for features that are not known ahead of time, e.g. [7, 14, 21]. Accepted forms:
- [7, 14, 21] : use exactly this list
- 21 : produce lags from 1 to 21
- 21:3 : produce lags from 1 to 21 in steps of 3
- 5-21 : produce lags from 5 to 21
- 5-21:3 : produce lags from 5 to 21 in steps of 3
override_non_ufapt_lag_sizes
¶
Lags override for features that are known ahead of time (List) (Expert Setting)
Default value []
Override the lags to be used for features that are known ahead of time, e.g. [7, 14, 21]. Accepted forms:
- [7, 14, 21] : use exactly this list
- 21 : produce lags from 1 to 21
- 21:3 : produce lags from 1 to 21 in steps of 3
- 5-21 : produce lags from 5 to 21
- 5-21:3 : produce lags from 5 to 21 in steps of 3
min_lag_size
¶
Smallest considered lag size (-1 = auto) (Number) (Expert Setting)
Default value -1
Smallest considered lag size
allow_time_column_as_feature
¶
Enable feature engineering from time column (Boolean) (Expert Setting)
Default value True
Whether to enable feature engineering based on the selected time column, e.g. Date~weekday.
allow_time_column_as_numeric_feature
¶
Allow integer time column as numeric feature (Boolean) (Expert Setting)
Default value False
Whether to enable an integer time column to be used as a numeric feature. If using the time series recipe, using the time column (numeric time stamps) as an input feature can lead to a model that memorizes the actual time stamps instead of learning features that generalize to the future.
datetime_funcs
¶
Allowed date and date-time transformations (List) (Expert Setting)
Default value ['year', 'quarter', 'month', 'week', 'weekday', 'day', 'dayofyear', 'hour', 'minute', 'second']
Allowed date or date-time transformations.
Date transformers include: year, quarter, month, week, weekday, day, dayofyear, num. Date-time transformers also include: hour, minute, second. Features in DAI will show up as get_ + transformation name. E.g. num is a direct numeric value representing the floating-point value of time, which can lead to over-fitting if used on IID problems, so it is turned off by default.
filter_datetime_funcs
¶
Auto filtering of date and date-time transformations (Boolean) (Expert Setting)
Default value True
Whether to filter out date and date-time transformations that lead to unseen values in the future.
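The sketch below (an illustration only, using made-up dates) shows why such filtering matters: a transformation like get_year produces values in the future that were never seen during training, while get_weekday does not.

```python
import pandas as pd

# Illustration (not DAI code): 'year' of future rows is unseen during training,
# whereas 'weekday' always falls within the values seen in training.
train_dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
future_dates = pd.date_range("2024-01-01", "2024-01-31", freq="D")

print(set(future_dates.year) - set(train_dates.year))        # {2024} -> unseen in the future
print(set(future_dates.weekday) - set(train_dates.weekday))  # set() -> nothing unseen
```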
allow_tgc_as_features
¶
Consider time groups columns as standalone features (Boolean) (Expert Setting)
Default value True
Whether to consider time groups columns (tgc) as standalone features.
Note that 'time_column' is treated separately via 'Allow to engineer features from time column'. Note that tgc_allow_target_encoding independently controls whether time column groups are target encoded. Use allowed_coltypes_for_tgc_as_features for control per feature type.
allowed_coltypes_for_tgc_as_features
¶
Which tgc feature types to consider as standalone features (List) (Expert Setting)
Default value ['numeric', 'categorical', 'ohe_categorical', 'datetime', 'date', 'text']
Which time groups columns (tgc) feature types to consider as standalone features, if the corresponding flag "Consider time groups columns as standalone features" is set to true. E.g. all column types would be ["numeric", "categorical", "ohe_categorical", "datetime", "date", "text"]. Note that 'time_column' is treated separately via 'Allow to engineer features from time column'. Note that if the lag-based time series recipe is disabled, then all tgc are allowed features.
enable_time_unaware_transformers
¶
Enable time unaware transformers (String) (Expert Setting)
Default value 'auto'
Whether to enable various transformers (clustering, truncated SVD) that would otherwise be disabled for time series due to their potential to overfit by leaking across time within the fit of each fold.
tgc_only_use_all_groups
¶
Always group by all time groups columns for creating lag features (Boolean) (Expert Setting)
Default value True
Whether to group by all time groups columns for creating lag features, instead of sampling from them
tgc_allow_target_encoding
¶
Target encoding of time groups (String) (Expert Setting)
Default value 'auto'
Whether to allow target encoding of time groups. This can be useful if there are many groups.
Note that allow_tgc_as_features independently controls whether tgc are treated as normal features. 'auto': Choose CV by default. 'CV': Enable out-of-fold and CV-in-CV (if enabled) encoding. 'simple': Simple memorized targets per group. 'off': Disable. Only relevant for time series experiments that have at least one time column group apart from the time column.
tgc_allow_features_and_target_encoding_auto_tune
¶
Auto-Tune time column groups as features and target encoding (Boolean) (Expert Setting)
Default value True
If allow_tgc_as_features is true or tgc_allow_target_encoding is true, whether to try both possibilities to see which does better during tuning. Safer than forcing one way or the other.
time_series_holdout_preds
¶
Generate Time-Series Holdout Predictions (Boolean) (Expert Setting)
Default value True
Enable creation of holdout predictions on training data using moving windows (useful for MLI, but can be slow).
time_series_max_holdout_splits
¶
Maximum number of splits used for creating final time-series model’s holdout predictions (Number) (Expert Setting)
Default value -1
Max number of splits used for creating the final time-series model's holdout/backtesting predictions. With the default value -1, the same number of splits as during model validation will be used. Use 'time_series_validation_splits' to control the number of time-based splits used for model validation.
mli_ts_fast_approx
¶
Whether to speed up calculation of Time-Series Holdout Predictions (Boolean) (Expert Setting)
Default value False
Whether to speed up time-series holdout predictions for back-testing on training data (used for MLI and metrics calculation). Can be slightly less accurate.
mli_ts_fast_approx_contribs
¶
Whether to speed up calculation of Shapley values for Time-Series Holdout Predictions (Boolean) (Expert Setting)
Default value True
Whether to speed up Shapley values for time-series holdout predictions for back-testing on training data (used for MLI). Can be slightly less accurate.
mli_ts_holdout_contribs
¶
Generate Shapley values for Time-Series Holdout Predictions at the time of experiment (Boolean) (Expert Setting)
Default value True
Enable creation of Shapley values for holdout predictions on training data using moving windows (useful for MLI, but can be slow), at the time of the experiment. If disabled, MLI will generate Shapley values on demand.
time_series_min_interpretability
¶
Lower limit on interpretability setting for time-series experiments, implicitly enforced. (Number) (Expert Setting)
Default value 5
Values of 5 or more can improve generalization by more aggressive dropping of least important features. Set to 1 to disable.
lags_dropout
¶
Dropout mode for lag features (String) (Expert Setting)
Default value 'dependent'
Dropout mode for lag features in order to achieve an equal n.a.-ratio between train and validation/test. The independent mode performs a simple feature-wise dropout, whereas the dependent one takes lag-size dependencies per sample/row into account.
prob_lag_non_targets
¶
Probability to create non-target lag features (-1.0 = auto) (Float) (Expert Setting)
Default value -1.0
Normalized probability of choosing to lag non-targets relative to targets (-1.0 = auto)
rolling_test_method
¶
Method to create rolling test set predictions (String) (Expert Setting)
Default value 'tta'
Method to create rolling test set predictions, if the forecast horizon is shorter than the time span of the test set. One can choose between test-time augmentation (TTA) and successive refitting of the final pipeline.
rolling_test_method_max_splits
¶
Max number of splits for the 'refit' method, to avoid OOM/slowness, both for GA and final refit. In GA, will fall back to fast_tta; in the final pipeline, will fail with an error message. (Number) (Expert Setting)
Default value 1000
fast_tta_internal
¶
Fast TTA for internal validation (feature evolution and holdout predictions) (Boolean) (Expert Setting)
Default value True
Apply TTA in one pass instead of using rolling windows for internal validation split predictions. Note: Setting this to 'False' leads to significantly longer runtimes.
fast_tta_test
¶
Fast TTA for test set predictions (Boolean) (Expert Setting)
Default value True
Apply TTA in one pass instead of using rolling windows for test set predictions. This only applies if the forecast horizon is shorter than the time span of the test set. Note: Setting this to 'False' leads to significantly longer runtimes.
prob_default_lags
¶
Probability for new time-series transformers to use default lags (-1.0 = auto) (Float) (Expert Setting)
Default value -1.0
Probability for new Lags/EWMA gene to use default lags (determined by frequency/gap/horizon, independent of data) (-1.0 = auto)
prob_lagsinteraction
¶
Probability of exploring interaction-based lag transformers (-1.0 = auto) (Float) (Expert Setting)
Default value -1.0
Unnormalized probability of choosing other lag time-series transformers based on interactions (-1.0 = auto)
prob_lagsaggregates
¶
Probability of exploring aggregation-based lag transformers (-1.0 = auto) (Float) (Expert Setting)
Default value -1.0
Unnormalized probability of choosing other lag time-series transformers based on aggregations (-1.0 = auto)
ts_target_trafo
¶
Time series centering or detrending transformation (String) (Expert Setting)
Default value 'none'
Time series centering or detrending transformation. The free parameter(s) of the trend model are fitted, the trend is removed from the target signal, and the pipeline is fitted on the residuals. Predictions are made by adding back the trend. Note: Can be cascaded with 'Time series lag-based target transformation', but is mutually exclusive with regular target transformations. The robust centering or linear detrending variants use RANSAC to achieve a higher tolerance w.r.t. outliers. The Epidemic target transformer uses the SEIR model: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SEIR_model
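A minimal sketch of the detrending idea, assuming a simple linear trend and ordinary least squares (DAI's robust variants use RANSAC, and the actual trend model depends on the option chosen):

```python
import numpy as np

# Toy illustration of detrending as described above (not DAI internals).
t = np.arange(100, dtype=float)
y = 0.5 * t + 10 + np.random.normal(scale=2.0, size=t.size)  # placeholder target with a linear trend

slope, intercept = np.polyfit(t, y, deg=1)   # fit the free parameters of the trend model
trend = slope * t + intercept
residuals = y - trend                        # the pipeline is fitted on these residuals

residual_pred = np.zeros_like(t)             # stand-in for the pipeline's residual predictions
final_pred = residual_pred + trend           # predictions are made by adding back the trend
```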
ts_target_trafo_epidemic_params_dict
¶
Custom bounds for SEIRD epidemic model parameters (Dict) (Expert Setting)
Default value {}
Dictionary to control Epidemic SEIRD model for de-trending of target per time series group. Note: The target column must correspond to I(t), the infected cases as a function of time.
For each training split and time series group, the SEIRD model is fitted to the target signal (by optimizing the free parameters shown below for each time series group).
Then, the SEIRD model’s value is subtracted from the training response, and the residuals are passed to the feature engineering and modeling pipeline. For predictions, the SEIRD model’s value is added to the residual predictions from the pipeline, for each time series group.
Note: Careful selection of the bounds for the free parameters N, beta, gamma, delta, alpha, rho, lockdown, beta_decay, beta_decay_rate is extremely important for good results.
S(t) : susceptible/healthy/not immune
E(t) : exposed/not yet infectious
I(t) : infectious/active <= target column
R(t) : recovered/immune
D(t) : deceased
Free parameters:
- N : total population, N = S + E + I + R + D
- beta : rate of exposure (S -> E)
- gamma : rate of recovering (I -> R)
- delta : incubation period
- alpha : fatality rate
- rho : rate at which people die
- lockdown : day of lockdown (-1 => no lockdown)
- beta_decay : beta decay due to lockdown
- beta_decay_rate : speed of beta decay
Dynamics:
if lockdown >= 0:
    beta_min = beta * (1 - beta_decay)
    beta = (beta - beta_min) / (1 + np.exp(-beta_decay_rate * (-t + lockdown))) + beta_min
dSdt = -beta * S * I / N
dEdt = beta * S * I / N - delta * E
dIdt = delta * E - (1 - alpha) * gamma * I - alpha * rho * I
dRdt = (1 - alpha) * gamma * I
dDdt = alpha * rho * I
Provide lower/upper bounds for each parameter you want to control the bounds for. Valid parameters are: N_min, N_max, beta_min, beta_max, gamma_min, gamma_max, delta_min, delta_max, alpha_min, alpha_max, rho_min, rho_max, lockdown_min, lockdown_max, beta_decay_min, beta_decay_max, beta_decay_rate_min, beta_decay_rate_max. You can change any subset of parameters, e.g., ts_target_trafo_epidemic_params_dict="{'N_min': 1000, 'beta_max': 0.2}"
To get a SEIR model (in cases where death rates are very low, this can speed up calculations significantly): set alpha_min=alpha_max=rho_min=rho_max=beta_decay_rate_min=beta_decay_rate_max=0 and lockdown_min=lockdown_max=-1.
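For reference, the dynamics listed above can be integrated directly; the sketch below is an illustration only (parameter values are arbitrary placeholders, while DAI fits them per time series group within the bounds supplied in this dictionary):

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative integration of the SEIRD dynamics above (not DAI's fitting code).
def seird(state, t, N, beta, gamma, delta, alpha, rho, lockdown, beta_decay, beta_decay_rate):
    S, E, I, R, D = state
    if lockdown >= 0:  # beta decays smoothly around the lockdown day
        beta_min = beta * (1 - beta_decay)
        beta = (beta - beta_min) / (1 + np.exp(-beta_decay_rate * (-t + lockdown))) + beta_min
    dSdt = -beta * S * I / N
    dEdt = beta * S * I / N - delta * E
    dIdt = delta * E - (1 - alpha) * gamma * I - alpha * rho * I
    dRdt = (1 - alpha) * gamma * I
    dDdt = alpha * rho * I
    return dSdt, dEdt, dIdt, dRdt, dDdt

N = 10_000
y0 = (N - 1, 0, 1, 0, 0)                 # one initially infected person
t = np.linspace(0, 120, 121)             # days
params = (N, 0.5, 0.1, 0.2, 0.02, 0.05,  # N, beta, gamma, delta, alpha, rho (placeholders)
          30, 0.5, 0.2)                  # lockdown day, beta_decay, beta_decay_rate
S, E, I, R, D = odeint(seird, y0, t, args=params).T
# I(t) is the component that the target column corresponds to by default.
```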
ts_target_trafo_epidemic_target
¶
Which SEIRD model component the target column corresponds to: I: Infected, R: Recovered, D: Deceased. (String) (Expert Setting)
Default value 'I'
ts_lag_target_trafo
¶
Time series lag-based target transformation (String) (Expert Setting)
Default value 'none'
Time series lag-based target transformation. One can choose between difference and ratio of the current and a lagged target. The corresponding lag size can be set via 'Target transformation lag size'. Note: Can be cascaded with 'Time series target transformation', but is mutually exclusive with regular target transformations.
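A toy sketch of the two variants, assuming a lag size of 7 (the actual lag size is controlled by the setting below):

```python
import pandas as pd

# Illustration only (not DAI code): difference and ratio of current and lagged target.
lag = 7
y = pd.Series(range(1, 31), dtype=float)   # placeholder target

diff_target = y - y.shift(lag)             # 'difference' variant: pipeline is fit on this
ratio_target = y / y.shift(lag)            # 'ratio' variant

# At prediction time the transformation is inverted using the lagged target,
# e.g. y_hat = diff_pred + y.shift(lag) for the difference variant.
```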
ts_target_trafo_lag_size
¶
Lag size used for time series target transformation (Number) (Expert Setting)
Default value -1
Lag size used for time series target transformation. See setting 'Time series lag-based target transformation'. -1 => smallest valid value = prediction periods + gap (automatically adjusted by DAI if too small).
timeseries_split_suggestion_timeout
¶
Timeout in seconds for time-series properties detection in UI. (Float) (Expert Setting)
Default value 30.0
Timeout in seconds for time-series properties detection in UI.
timeseries_recency_weight_power
¶
Power of recency weight for TS splits (Float) (Expert Setting)
Default value 0.0
Weight TS model scores by split number raised to this power.
E.g. use 1.0 to weight the split closest to the horizon by a factor that is the number of splits larger than the oldest split. Applies to tuning models and final back-testing models. If 0.0 (default) is used, the median function is used, else the mean is used.
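A rough sketch of how such a recency weighting could aggregate per-split scores; the exact weighting inside DAI is an assumption here, and only the power-of-split-number idea and the median-vs-mean behavior come from the description above:

```python
import numpy as np

# Hypothetical sketch of recency-weighted aggregation of split scores (not DAI internals).
def aggregate_split_scores(scores, power=0.0):
    scores = np.asarray(scores, dtype=float)  # scores[0] = oldest split, scores[-1] = closest to horizon
    if power == 0.0:
        return np.median(scores)              # default: median of split scores
    weights = np.arange(1, len(scores) + 1, dtype=float) ** power
    return np.average(scores, weights=weights)  # recency-weighted mean

print(aggregate_split_scores([0.80, 0.82, 0.85], power=0.0))  # median
print(aggregate_split_scores([0.80, 0.82, 0.85], power=1.0))  # weighted mean, recent splits count more
```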