Imputation in Driverless AI
The impute feature lets you fill in missing values with substituted values. Missing values can be imputed based on the column’s mean, median, minimum, maximum, or mode value. You can also impute based on a specific percentile or by a constant value.
Imputation values are either precomputed on all of the data or computed inside the pipeline, in which case they are based only on the data in the train split.
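To illustrate the difference between the two modes, here is a minimal sketch in plain Python. It is not Driverless AI code: the helper `fill_missing`, the toy `train` and `valid` lists, and the choice of mean imputation are all assumptions made for illustration.

```python
from statistics import mean

def fill_missing(values, fill):
    """Replace None entries with the given fill value (hypothetical helper)."""
    return [fill if v is None else v for v in values]

# Toy data: None marks a missing value.
train = [1.0, None, 3.0]
valid = [None, 11.0]

# Mode A: the fill value is precomputed on all of the data.
all_observed = [v for v in train + valid if v is not None]
fill_all = mean(all_observed)

# Mode B: the fill value is computed from the train split only,
# then reused unchanged on the validation split.
train_observed = [v for v in train if v is not None]
fill_train = mean(train_observed)

valid_a = fill_missing(valid, fill_all)
valid_b = fill_missing(valid, fill_train)
```

In this toy case the two modes give different fill values, because the large validation value (11.0) influences the mean only when all data is used.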
Follow these guidelines when performing imputation:
For constant imputation on numeric columns, the constant must be numeric.
For constant imputation on string columns, the constant must be a string.
For percentile imputation, the percentile value must be between 0 and 100.
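The guidelines above can be expressed as a small validation check. This is only a sketch: `validate_imputation`, its parameters, and the `"numeric"`/`"string"` column kinds are hypothetical names, not part of the Driverless AI API.

```python
def validate_imputation(column_kind, method, constant=None, percentile=None):
    """Raise ValueError if a request breaks one of the guidelines above.

    column_kind is "numeric" or "string". All names are hypothetical.
    """
    if method == "const":
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(constant, (int, float)) and not isinstance(constant, bool)
        if column_kind == "numeric" and not is_number:
            raise ValueError("constant must be numeric for a numeric column")
        if column_kind == "string" and not isinstance(constant, str):
            raise ValueError("constant must be a string for a string column")
    if method == "percentile":
        if percentile is None or not 0 <= percentile <= 100:
            raise ValueError("percentile must be between 0 and 100")

# Valid requests pass silently.
validate_imputation("numeric", "const", constant=0)
validate_imputation("string", "const", constant="unknown")
validate_imputation("numeric", "percentile", percentile=95)
```

A request such as `validate_imputation("numeric", "const", constant="n/a")` or `validate_imputation("numeric", "percentile", percentile=120)` would raise `ValueError` in this sketch.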
Notes:
This feature is experimental.
Time columns cannot be imputed.
Enabling Imputation
Imputation is disabled by default. To enable it, set enable_imputation=true
in the config.toml file (native installs) or set the DRIVERLESS_AI_ENABLE_IMPUTATION=true
environment variable (Docker image installs). This enables imputation functionality in transformers.
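The two forms of the setting named above differ only in spelling. The sketch below prints both side by side; `env_var_name` is a hypothetical helper, and the prefix-plus-uppercase rule is inferred from the single pair shown in this section, so treat it as an assumption for any other setting.

```python
def env_var_name(config_key):
    """Map a config.toml key to the environment-variable spelling.

    Assumption: the variable is "DRIVERLESS_AI_" plus the upper-cased key,
    as in the enable_imputation pair shown in this section.
    """
    return "DRIVERLESS_AI_" + config_key.upper()

key = "enable_imputation"
toml_line = f"{key}=true"                # line for config.toml (native installs)
env_line = f"{env_var_name(key)}=true"   # environment variable (Docker image installs)

print(toml_line)
print(env_line)
```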
Running an Experiment with Imputation
Once imputation is enabled, you can add imputation columns when running an experiment.
Click Columns Imputation on the Experiment Setup page.
Click on Add Imputation in the upper-right corner.
Select the column that contains missing values you want to impute.
Select the imputation type. Available options are:
mean: The column’s numeric mean value displays by default. (Default method for numeric values.)
median: When selected, the column’s numeric median value displays by default.
min: When selected, the column’s numeric minimum value displays by default.
max: When selected, the column’s numeric maximum value displays by default.
const: Enter a constant value. For numeric columns the constant must be numeric; for string columns it must be a string. (Default method for string columns.)
mode: When selected, the column’s numeric mode value displays by default.
percentile: Specify a percentile rank value between 0 and 100. (Defaults to 95.) In addition, specify a numeric imputed value.
Optionally, allow Driverless AI to compute the imputation value during validation instead of using the imputed value that you entered.
Click Save when you are done.
At this point, you can add additional imputations, delete the imputation you just created, or close this form and return to the experiment. Note that each column can have only a single imputation.
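To make the imputation types listed in the steps above concrete, the following sketch computes the fill value for each type on a small numeric column using only the Python standard library. It is an illustration under stated assumptions, not Driverless AI's implementation: `imputation_value` and `percentile_of` are hypothetical names, and the percentile uses linear interpolation between ranks, which is one of several common definitions and may differ from what the product computes.

```python
from statistics import mean, median, mode

def percentile_of(sorted_vals, pct):
    """Percentile by linear interpolation between closest ranks (an assumption)."""
    if not 0 <= pct <= 100:
        raise ValueError("percentile must be between 0 and 100")
    rank = (len(sorted_vals) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)

def imputation_value(values, method, constant=None, pct=95):
    """Return the fill value for one column; None marks a missing entry."""
    observed = sorted(v for v in values if v is not None)
    if method == "const":
        return constant
    if method == "mean":
        return mean(observed)
    if method == "median":
        return median(observed)
    if method == "min":
        return observed[0]
    if method == "max":
        return observed[-1]
    if method == "mode":
        return mode(observed)
    if method == "percentile":
        return percentile_of(observed, pct)
    raise ValueError(f"unknown imputation type: {method}")

column = [2, None, 4, 4, 10, None]
fill = imputation_value(column, "median")
imputed = [fill if v is None else v for v in column]
```

Because each column can have only a single imputation, a natural way to store the choices in a sketch like this is a dictionary keyed by column name, with one method per key.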