Appendix B: Using the Driverless AI Python Client

This section describes how to run Driverless AI using the Python client.

Notes:

  • This is an early release of the Driverless AI Python client.

  • Python 3.6 is the only supported version.

  • You must install the h2oai_client wheel into your local Python environment. The wheel is available from the PY_CLIENT link in the top menu of the UI (see the example install command below).

    (Screenshot: the PY_CLIENT link in the top menu of the UI)
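
    To install, run pip against the downloaded wheel from the command line. The filename below is a placeholder; the actual wheel name depends on your Driverless AI version:

    pip install h2oai_client-<version>.whl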

Running an Experiment

  1. After the h2oai_client wheel is installed, import the required modules and log in.
import h2oai_client
import numpy as np
import pandas as pd
import requests
import math
from h2oai_client import Client, ModelParameters, InterpretParameters  # InterpretParameters is needed for the model interpretation step later in this appendix

address = 'http://ip_where_driverless_is_running:12345'
username = 'username'
password = 'password'
h2oai = Client(address = address, username = username, password = password)
# Be sure to use the same credentials that you use when signing in through the GUI
  2. Upload training and testing datasets from the Driverless AI /data folder.
train_path = '/data/CreditCard/CreditCard-train.csv'
test_path = '/data/CreditCard/CreditCard-test.csv'

train = h2oai.create_dataset_sync(train_path)
test = h2oai.create_dataset_sync(test_path)
  3. Set the Target, Dropped, Weight, Fold, and Time columns:

    • Target Column: The column we are trying to predict.
    • Dropped Columns: The columns we do not want to use as predictors such as ID columns, columns with data leakage, etc.
    • Weight Column: The column that indicates the per row observation weights. If “None”, each row will have an observation weight of 1.
    • Fold Column: The column that indicates the fold. If “None”, the folds will be determined by Driverless AI.
    • Time Column: The column that provides a time order, if applicable. If “AUTO”, Driverless AI will auto-detect a potential time order. If “OFF”, auto-detection is disabled.

    For this example, we will predict default payment next month and ignore the ID column. We do not have a Weight column, Fold column, or Time column, and we will let Driverless AI auto-detect any potential time order.

# Set the column parameters to pass to the experiment
target = "default payment next month"
drop_cols = ['ID']
weight_col = None
fold_col = None
time_col = '[AUTO]'
  4. Specify the experiment settings. Refer to the Experiment Settings for more information about these settings.
# Pre-set parameters to pass model
is_classification = True
enable_gpus = True
seed=True
scorer_str = 'auc'

# Pre-set accuracy, time, and interpretability knobs
accuracy_value = 5
time_value = 5
interpretability = 5
  5. Launch the experiment to run feature engineering and final model training. In addition to the settings previously defined, be sure to also specify the imported training dataset. Adding a test dataset is optional.
experiment = h2oai.start_experiment_sync(ModelParameters(
    # Datasets
    dataset_key=train.key,
    testset_key=test.key,

    # Columns
    target_col=target,
    cols_to_drop=drop_cols,
    weight_col=weight_col,
    fold_col=fold_col,
    time_col=time_col,

    # Parameters
    is_classification=is_classification,
    enable_gpus=enable_gpus,
    seed=seed,
    accuracy=accuracy_value,
    time=time_value,
    interpretability=interpretability,
    scorer=scorer_str
))
  6. View the final model score for the train and test datasets. When feature engineering is complete, an ensemble model can be built depending on the accuracy setting. The experiment object also contains the scores on the train and test data for this ensemble model.
print("Final Model Score on Train Data: " + str(round(experiment.train_score, 3)))
print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))

Final Model Score on Train Data: 0.781
Final Model Score on Test Data: 0.801
  7. You can also download and examine the features and their importance for the final model. The table below shows each feature's name, its relative importance, and a description. You can see that some features were engineered by Driverless AI and some are original columns.
h2oai.download(src_path=experiment.features_path, dest_dir=".")
features = pd.read_table("./features.txt", sep=',', skipinitialspace=True)
features.head(n = 15)
Feature table
  8. Download the test predictions. Equivalent commands can also be run to download the train (holdout) predictions; a sketch is shown at the end of this step.
h2oai.download(src_path = experiment.test_predictions_path, dest_dir = ".")
'./test_preds.csv'

test_preds = pd.read_csv("./test_preds.csv")
test_preds.head()

default payment next month.1
0       0.635654
1       0.143348
2       0.065964
3       0.508118
4       0.135354
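
As noted above, the train (holdout) predictions can be downloaded the same way. A minimal sketch, assuming the experiment object exposes a train_predictions_path attribute analogous to test_predictions_path (the attribute name is an assumption by analogy, not shown elsewhere in this appendix):

# Download the holdout predictions on the training data
# (train_predictions_path is assumed by analogy with experiment.test_predictions_path)
train_preds_path = h2oai.download(src_path = experiment.train_predictions_path, dest_dir = ".")
train_preds = pd.read_csv(train_preds_path)
train_preds.head()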

Access an Experiment Object that was Run through the Web UI

It is also possible to use the Python API to examine an experiment that was started through the Web UI, using its experiment key.

You can get a pointer to the experiment by referencing the experiment key in the Web UI.

# Get a list of experiments
experiment_list = list(map(lambda x: x.key, h2oai.list_models(offset=0, limit=100)))
experiment_list
['34aa50']

# Get pointer to experiment
experiment = h2oai.get_model_job(experiment_list[0]).entity
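
Once you have this experiment object, you can work with it the same way as one started from the Python client. For example, using the same attributes shown earlier in this appendix:

# The retrieved experiment exposes the same score attributes used above
print("Final Model Score on Train Data: " + str(round(experiment.train_score, 3)))
print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))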

Score on New Data

You can use the Python API to score on new data. This is equivalent to the SCORE ON ANOTHER DATASET button in the Web UI. The example below scores on the test data and then downloads the predictions.

Pass in any dataset that has the same columns as the original training set. If you passed a test set during the H2OAI model building step, the test predictions already exist; their path can be found with experiment.test_predictions_path.

prediction = h2oai.make_prediction_sync(experiment.key, test_path)
pred_path = h2oai.download(prediction.predictions_csv_path, '.')
pred_table = pd.read_csv(pred_path)
pred_table.head()

default payment next month.1
0       0.635654
1       0.143348
2       0.065964
3       0.508118
4       0.135354
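
As a quick sanity check, the downloaded predictions can be joined back onto the test data with standard pandas calls. A minimal sketch, assuming the test CSV is also readable from the machine running the client and that the prediction rows are in the same order as the test rows (the predicted_default column name is just an illustrative choice):

# Align predictions with the original test rows (assumes identical row order)
test_df = pd.read_csv(test_path)
scored = test_df.assign(predicted_default=pred_table.iloc[:, 0])
scored[['ID', 'default payment next month', 'predicted_default']].head()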

Run Model Interpretation

Once we have completed an experiment, we can interpret our H2OAI model. Model Interpretability is used to provide model transparency and explanations.

We can run model interpretation from the Python client as shown below. By setting the parameter use_raw_features to True, we interpret the model using only the raw features in the data; the engineered features we saw in the final model will not be used to explain the data.

By setting use_raw_features to False, we can interpret the model using the features used in the final model (raw and engineered).

h2oai.run_interpretation_sync(InterpretParameters(dai_model_key=experiment.key,
                                                  dataset_key=train.key,
                                                  target_col=target,
                                                  use_raw_features=True,
                                                  klime_cluster_col=''))

<h2oai_client.messages.Interpretation at 0x113bbc9b0>
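
For comparison, the same call with use_raw_features set to False interprets the model using the features from the final model (raw and engineered), as described above:

# Interpret the model on the final model's feature space (raw and engineered)
h2oai.run_interpretation_sync(InterpretParameters(dai_model_key=experiment.key,
                                                  dataset_key=train.key,
                                                  target_col=target,
                                                  use_raw_features=False,
                                                  klime_cluster_col=''))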

You can also see the list of interpretations using the Python client.

# Get list of interpretations
mli_list = list(map(lambda x: x.key, h2oai.list_interpretations(offset=0, limit=100)))
mli_list

['73dd12']