Appendix B: Using the Driverless AI Python Client¶
This section describes how to run Driverless AI using the Python client.
This is an early release of the Driverless AI Python client.
Python 3.6 is the only supported version.
You must install the h2oai_client wheel to your local Python. This is available from the PY_CLIENT link in the top menu of the UI.
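Since only Python 3.6 is supported, it can help to fail fast before installing the wheel into the wrong interpreter. Below is a minimal sketch; check_python_version is a hypothetical helper for illustration, not part of h2oai_client:

```python
import sys

def check_python_version(version_info=sys.version_info):
    """Return True only for the supported Python 3.6 interpreter series."""
    return (version_info[0], version_info[1]) == (3, 6)

if not check_python_version():
    print("Warning: h2oai_client supports Python 3.6 only; "
          "found %d.%d" % (sys.version_info[0], sys.version_info[1]))
```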
Running an Experiment¶
- After the h2oai_client wheel is installed, import the required modules and log in.
```python
import h2oai_client
import numpy as np
import pandas as pd
import requests
import math
from h2oai_client import Client, ModelParameters

address = 'http://ip_where_driverless_is_running:12345'
username = 'username'
password = 'password'

# Be sure to use the same credentials that you use when signing in through the GUI
h2oai = Client(address = address, username = username, password = password)
```
- Upload training and testing datasets from the Driverless AI /data folder.
```python
train_path = '/data/CreditCard/CreditCard-train.csv'
test_path = '/data/CreditCard/CreditCard-test.csv'

train = h2oai.create_dataset_sync(train_path)
test = h2oai.create_dataset_sync(test_path)
```
Set the Target, Dropped, Weight, Fold, and Time columns:
- Target Column: The column we are trying to predict.
- Dropped Columns: The columns we do not want to use as predictors, such as ID columns, columns with data leakage, etc.
- Weight Column: The column that indicates the per-row observation weights. If “None”, each row has an observation weight of 1.
- Fold Column: The column that indicates the fold. If “None”, the folds will be determined by Driverless AI.
- Time Column: The column that provides a time order, if applicable. If “AUTO”, Driverless AI will auto-detect a potential time order. If “OFF”, auto-detection is disabled.
For this example, we will predict default payment next month and ignore the ID column. We do not have a Weight column, Fold column, or Time column, and we will let Driverless AI auto-detect any potential time order.
```python
# Set the column parameters to pass to Driverless AI
target = "default payment next month"
drop_cols = ['ID']
weight_col = None
fold_col = None
time_col = '[AUTO]'
```
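To make the mapping between these column roles and the experiment parameters explicit, here is a small hypothetical helper (not part of h2oai_client) that bundles them into a plain dict whose keys mirror the column fields passed to ModelParameters:

```python
def column_settings(target, drop=None, weight=None, fold=None, time='[AUTO]'):
    """Bundle column roles into a dict mirroring the ModelParameters column fields."""
    return {
        'target_col': target,      # the column to predict
        'cols_to_drop': drop or [],  # predictors to exclude (IDs, leakage, ...)
        'weight_col': weight,      # per-row observation weights, or None
        'fold_col': fold,          # explicit fold assignment, or None
        'time_col': time,          # '[AUTO]', '[OFF]', or a column name
    }

settings = column_settings("default payment next month", drop=['ID'])
```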
- Specify the experiment settings. Refer to the Experiment Settings for more information about these settings.
```python
# Preset parameters to pass to the model
is_classification = True
enable_gpus = True
seed = True
scorer_str = 'auc'

# Preset accuracy knobs
accuracy_value = 5
time_value = 5
interpretability = 5
```
- Launch the experiment to run feature engineering and final model training. In addition to the settings previously defined, be sure to also specify the imported training dataset. Adding a test dataset is optional.
```python
experiment = h2oai.start_experiment_sync(ModelParameters(
    # Datasets
    dataset_key=train.key,
    testset_key=test.key,
    # Columns
    target_col=target,
    cols_to_drop=drop_cols,
    weight_col=weight_col,
    fold_col=fold_col,
    time_col=time_col,
    # Parameters
    is_classification=is_classification,
    enable_gpus=enable_gpus,
    seed=seed,
    accuracy=accuracy_value,
    time=time_value,
    interpretability=interpretability,
    scorer=scorer_str
))
```
- View the final model score for the train and test datasets. When feature engineering is complete, an ensemble model can be built depending on the accuracy setting. The experiment object also contains the score on the train and test data for this ensemble model.
```python
print("Final Model Score on Train Data: " + str(round(experiment.train_score, 3)))
print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))
```

```
Final Model Score on Train Data: 0.781
Final Model Score on Test Data: 0.801
```
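Because the scorer here is 'auc', you can sanity-check a reported score against the downloaded predictions yourself. Below is a minimal, pure-Python AUC computation using the rank-sum (Mann-Whitney) formulation; this is an illustrative sketch, not how Driverless AI computes its score internally:

```python
def auc(labels, scores):
    """AUC via average ranks of positive-class scores (Mann-Whitney U)."""
    pairs = sorted(zip(scores, labels))          # ascending by score
    n = len(pairs)
    rank = [0.0] * n                             # 1-based average ranks
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1                               # extend the tie group
        avg = (i + j) / 2.0 + 1.0                # average rank for tied scores
        for k in range(i, j + 1):
            rank[k] = avg
        i = j + 1
    pos_ranks = [rank[k] for k in range(n) if pairs[k][1] == 1]
    n_pos = len(pos_ranks)
    n_neg = n - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: one of the four positive/negative pairs is ranked in the wrong order.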
- You can also download and examine the features and their importance for the final model. The table below shows each feature's name, its relative importance, and a description. You can see that some features were engineered by Driverless AI and some are original columns.
```python
h2oai.download(src_path=experiment.features_path, dest_dir=".")
features = pd.read_table("./features.txt", sep=',', skipinitialspace=True)
features.head(n=15)
```
- Download the test predictions. Note that equivalent commands can also be run for downloading the train (holdout) predictions.
```python
h2oai.download(src_path=experiment.test_predictions_path, dest_dir=".")
```

```
'./test_preds.csv'
```

```python
test_preds = pd.read_csv("./test_preds.csv")
test_preds.head()
```

```
   default payment next month.1
0                      0.635654
1                      0.143348
2                      0.065964
3                      0.508118
4                      0.135354
```
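The downloaded predictions are probabilities of the positive class (default payment next month = 1). To turn them into hard 0/1 labels you can apply a cutoff; the 0.5 threshold below is an arbitrary illustration, not a value Driverless AI prescribes:

```python
def to_labels(probs, threshold=0.5):
    """Convert positive-class probabilities into hard 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]

# First five predicted probabilities from test_preds above
to_labels([0.635654, 0.143348, 0.065964, 0.508118, 0.135354])
# -> [1, 0, 0, 1, 0]
```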
Access an Experiment Object that was Run through the Web UI¶
It is also possible to use the Python API to examine an experiment that was started through the Web UI using the experiment key.
You can get a pointer to the experiment by referencing the experiment key in the Web UI.
```python
# Get a list of experiments
experiment_list = list(map(lambda x: x.key, h2oai.list_models(offset=0, limit=100)))
experiment_list
```

```
['34aa50']
```

```python
# Get a pointer to the experiment
experiment = h2oai.get_model_job(experiment_list[0]).entity
```
Score on New Data¶
You can use the Python API to score on new data. This is equivalent to the SCORE ON ANOTHER DATASET button in the Web UI. The example below scores on the test data and then downloads the predictions.
Pass in any dataset that has the same columns as the original training set. If you passed a test set during the H2OAI model building step, the predictions already exist; their path can be found with experiment.test_predictions_path.
```python
prediction = h2oai.make_prediction_sync(experiment.key, test_path)
pred_path = h2oai.download(prediction.predictions_csv_path, '.')
pred_table = pd.read_csv(pred_path)
pred_table.head()
```

```
   default payment next month.1
0                      0.635654
1                      0.143348
2                      0.065964
3                      0.508118
4                      0.135354
```
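Since the scored dataset must have the same columns as the original training set, a quick pre-flight check can catch mismatches before submitting the prediction job. has_required_columns is a hypothetical helper for illustration, not part of h2oai_client:

```python
def has_required_columns(new_cols, train_cols, drop=()):
    """True if the new data contains every training column except the dropped ones."""
    required = set(train_cols) - set(drop)
    missing = required - set(new_cols)
    if missing:
        print("Missing columns: %s" % sorted(missing))
    return not missing
```

For example, a dataset lacking only the dropped ID column would pass, while one missing a real predictor would fail.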
Run Model Interpretation¶
Once we have completed an experiment, we can interpret our H2OAI model. Model Interpretability is used to provide model transparency and explanations.
We can run model interpretation in the Python client as shown below. The use_raw_features parameter controls which features are used for the interpretation:
- use_raw_features=True: interpret the model using only the raw features in the data. The engineered features we saw in our final model's features are not used to explain the data.
- use_raw_features=False: interpret the model using the features used in the final model (raw and engineered).
```python
from h2oai_client import InterpretParameters

h2oai.run_interpretation_sync(InterpretParameters(
    dai_model_key=experiment.key,
    dataset_key=train.key,
    target_col=target,
    use_raw_features=True,
    klime_cluster_col=''
))
```

```
<h2oai_client.messages.Interpretation at 0x113bbc9b0>
```
You can also see the list of interpretations using the Python client.
```python
# Get the list of interpretations
mli_list = list(map(lambda x: x.key, h2oai.list_interpretations(offset=0, limit=100)))
mli_list
```

```
['73dd12']
```