Use a Custom Transformer
First, we'll initialize a client with our server credentials and store it in the variable dai.
In [1]:
import driverlessai
dai = driverlessai.Client(address='http://localhost:12345', username='py', password='py')
Here we grab a custom recipe from our recipe repo (https://github.com/h2oai/driverlessai-recipes) and upload it to the Driverless AI server.
In [23]:
dai.recipes.create('https://github.com/h2oai/driverlessai-recipes/blob/master/transformers/numeric/boxcox_transformer.py')
Complete 100%
It's also possible to use the same dai.recipes.create() function to upload recipes that we have written locally.
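For reference, a local recipe like sum.py could look roughly like the sketch below. This is illustrative only, not the exact file uploaded here; it assumes the CustomTransformer recipe API used by the examples in the driverlessai-recipes repository and simply sums the selected numeric columns row-wise.

# sum.py -- illustrative sketch of a local custom transformer recipe
from h2oaicore.transformer_utils import CustomTransformer  # available inside Driverless AI
import datatable as dt

class SumTransformer(CustomTransformer):
    @staticmethod
    def get_default_properties():
        # Operate on two or more numeric columns
        return dict(col_type="numeric", min_cols=2, max_cols="all", relative_importance=1)

    def fit_transform(self, X: dt.Frame, y=None):
        return self.transform(X)

    def transform(self, X: dt.Frame):
        # Row-wise sum across all selected numeric columns
        return X[:, dt.rowsum(*[dt.f[i] for i in range(X.ncols)])]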
In [24]:
dai.recipes.create('sum.py')
Complete 100%
We can create a list of custom transformer recipe objects.
In [25]:
custom_transformers = [t for t in dai.recipes.transformers.list() if t.is_custom]
display(custom_transformers)
[<class 'driverlessai.recipes.TransformerRecipe'> BoxCoxTransformer, <class 'driverlessai.recipes.TransformerRecipe'> SumTransformer]
For demonstration purposes, we'll grab the first dataset available on the server. Then, we'll use it to get an experiment preview. Note that BoxCox and Sum are now the only transformers in the 'Feature engineering search space'.
In [26]:
ds = dai.datasets.list()[0]
dai.experiments.preview(
train_dataset=ds,
target_column=ds.columns[-1],
task='classification',
transformers=custom_transformers
)
ACCURACY [7/10]:
- Training data size: *150 rows, 5 cols*
- Feature evolution: *[Constant, DecisionTree, LightGBM, XGBoostGBM]*, *3-fold CV**, 2 reps*
- Final pipeline: *Ensemble (6 models), 3-fold CV*

TIME [2/10]:
- Feature evolution: *8 individuals*, up to *42 iterations*
- Early stopping: After *5* iterations of no improvement

INTERPRETABILITY [8/10]:
- Feature pre-pruning strategy: Permutation Importance FS
- Monotonicity constraints: enabled
- Feature engineering search space: [BoxCox, Sum]

[Constant, DecisionTree, LightGBM, XGBoostGBM] models to train:
- Model and feature tuning: *192*
- Feature evolution: *288*
- Final pipeline: *6*

Estimated runtime: *minutes*
Auto-click Finish/Abort if not done in: *1 day*/*7 days*
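If the preview looks as expected, the same arguments can be passed to dai.experiments.create() to run an experiment restricted to these transformers. The sketch below is a minimal example; the experiment name is illustrative and not part of this walkthrough.

# Launch an experiment that only searches over the custom transformers (sketch)
ex = dai.experiments.create(
    train_dataset=ds,
    target_column=ds.columns[-1],
    task='classification',
    transformers=custom_transformers,
    name='custom-transformer-demo',  # illustrative name
)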