Time Series Analysis on a Driverless AI Model Scoring Pipeline

This example describes how to run the Python Scoring Pipeline on a time series model. It has been tested on a Linux machine.

Download the Python Scoring Pipeline

After successfully completing an experiment in DAI, click the DOWNLOAD PYTHON SCORING PIPELINE button.

This downloads a scorer.zip file, which contains a scoring-pipeline folder.

After unzipping the scorer.zip file, run the following commands. (Note that the run_example.sh file can be found in the scoring-pipeline folder):

# to use conda package manager
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_example.sh --pm conda

The above creates a conda environment with the requirements needed to run the scoring pipeline, then scores the test.csv file to confirm that the installation was successful.

Run the following to check the list of conda environments:

conda env list

An environment with a name of the form scoring_h2oai_experiment_xxx should be listed, where xxx is the name of your experiment.
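
To work with the scorer interactively (for example, from the notebook below), activate that environment first:

conda activate scoring_h2oai_experiment_xxx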

At this point, you can run the example below.

In [ ]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from dateutil.parser import parse
import datetime
from datetime import timedelta
import numpy as np
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

Load datasets for scoring

For this example, we use the walmart_before_20120205.zip and walmart_after_20120205.zip datasets, which split the Walmart weekly sales data at 2012-02-05 into a training period and a test period.

In [ ]:
time1_path = "data/walmart_before_20120205.zip"
time2_path = "data/walmart_after_20120205.zip"

time1_pd = pd.read_csv(time1_path, parse_dates=['Date'])
time2_pd = pd.read_csv(time2_path, parse_dates=['Date'])
In [ ]:
# Import the scorer for the experiment. For example, the line below
# imports the scorer for experiment "hawolewo". Be sure to replace
# "hawolewo" with your experiment name.
from scoring_h2oai_experiment_hawolewo import Scorer
In [ ]:
%%capture
# Create a singleton Scorer instance.
# For optimal performance, create a Scorer instance once and call score() or score_batch() multiple times.
scorer = Scorer()
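
In addition to score_batch(), the scorer also exposes a score() method for individual rows. A minimal sketch, assuming score() takes a list of feature values (excluding the target) in the training column order; check the pipeline's bundled example script for the exact signature:

In [ ]:
# Hypothetical single-row call: the feature values must be in the same
# order as the columns the experiment was trained on (target excluded).
one_row = time2_pd.drop(columns=["Weekly_Sales"]).iloc[0].tolist()
print(scorer.score(one_row))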

Make predictions

In [ ]:
time2_pd["predict"] = scorer.score_batch(time2_pd)
time1_pd["predict"] = scorer.score_batch(time1_pd)

Join train and test datasets

In [ ]:
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
train_and_test = pd.concat([time1_pd, time2_pd], ignore_index=True)
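
As a quick optional sanity check, confirm that the combined frame spans both the training and test periods:

In [ ]:
# The date range should run from the start of time1_pd through the end of time2_pd
print(train_and_test["Date"].min(), train_and_test["Date"].max())
print("rows:", len(train_and_test))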

Model Evaluation

Here we look at the overall model performance on the train and test data. We also show the model horizon window in red to illustrate the performance when the model is generating predictions beyond the horizon. We prefer R-squared as the performance metric since the weekly sales of the different Store and Department groups are on vastly different scales.

In [ ]:
from sklearn.metrics import r2_score, mean_squared_error

def r2_rmse(g):
    r2 = r2_score(g['Weekly_Sales'], g['predict'])
    rmse = np.sqrt(mean_squared_error(g['Weekly_Sales'], g['predict']))
    return pd.Series(dict(r2=r2, rmse=rmse))

metrics_ts = train_and_test.groupby('Date').apply(r2_rmse)
metrics_ts.index = [x for x in metrics_ts.index]

R2 Time Series Plot

This plot is useful for comparing R2 over time between different DAI time series models, each with a different prediction horizon.

In [ ]:
# Note: horizon_in_weeks is how many weeks out the model can predict.
#       In this example, 34 was picked.
horizon_in_weeks = 34

# Gap between the end of the training dataset and when predictions start.
# In this example, 0 was picked.
num_gap_periods = 0
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
metrics_ts['r2'].plot(figsize=(20,10), title=("Training dataset in Green. Test data with model prediction horizon in Red"))


test_window_start = str((time2_pd["Date"].min()) + datetime.timedelta(weeks=num_gap_periods))
test_window_end = str(parse(test_window_start) + datetime.timedelta(weeks=horizon_in_weeks))


plt.axvspan(time2_pd["Date"].min(), test_window_end, facecolor='r', alpha=0.1)
plt.axvspan(time1_pd["Date"].min(), test_window_start, facecolor='g', alpha=0.1)
plt.suptitle("R2 Time Series Plot",  fontsize=21, fontweight='bold')
plt.show()

Worst and Best Groups (in test period)

Here we rank the best and worst groups by R2. We filter out groups with missing data (fewer than 80% of the average number of observations per group), and we only calculate R2 within the valid test horizon window.

In [ ]:
avg_count = train_and_test.groupby(["Store","Dept"]).size().mean()
print("average count: " + str(avg_count))
train_and_test_filtered = train_and_test.groupby(["Store","Dept"]).filter(lambda x: len(x) > 0.8 * avg_count)
train_and_test_filtered = train_and_test_filtered.loc[(train_and_test_filtered.Date < test_window_end) &
                                                     (train_and_test_filtered.Date > test_window_start)]
In [ ]:
grouped_r2s = train_and_test_filtered.groupby(["Store","Dept"]).apply( r2_rmse ).sort_values("r2")
In [ ]:
print(grouped_r2s.head()) # worst groups
print(grouped_r2s.tail()) # best groups

Worst and Best Groups (in train period)

Here we rank the best and worst groups by R2. We filter out groups with missing data, and we only calculate R2 within the training window.

In [ ]:
avg_count = train_and_test.groupby(["Store","Dept"]).size().mean()
print("average count: " + str(avg_count))
train_and_test_filtered = train_and_test.groupby(["Store","Dept"]).filter(lambda x: len(x) > 0.8 * avg_count)
train_and_test_filtered = train_and_test_filtered.loc[(train_and_test_filtered.Date < test_window_start) &
                                                    (train_and_test_filtered.Date >= '2010-01-10')]
In [ ]:
grouped_r2s = train_and_test_filtered.groupby(["Store","Dept"]).apply( r2_rmse ).sort_values("r2")
In [ ]:
print(grouped_r2s.head()) # worst groups
print(grouped_r2s.tail()) # best groups

Choose group

In [ ]:
# Choose Group
store_num = 11
dept_num = 56
date_selection = '2012-02-10'

Plot Actual vs Predicted

In [ ]:
plot_df = train_and_test[(train_and_test.Store == store_num) & (train_and_test.Dept == dept_num)][["Date", "Weekly_Sales", "predict"]]
plot_df = plot_df.set_index("Date")
In [ ]:
plot_df.plot(figsize=(20,10), title=("Training dataset in Green. Test data with model prediction horizon in Red. Date selection in Yellow"))

plt.axvspan(time2_pd["Date"].min(), test_window_end, facecolor='r', alpha=0.1)
plt.axvspan(time1_pd["Date"].min(), test_window_start, facecolor='g', alpha=0.1)
plt.axvline(x=date_selection, color='y')
plt.suptitle("Actual vs. Predicted", fontsize=21, fontweight='bold')
plt.show()

Plot Actual vs Predicted vs Deploy

In [ ]:
deploy_path = "data/walmart_deploy.zip"

deploy = pd.read_csv(deploy_path, parse_dates=['Date'])
# Keep only deployment rows within the model's horizon beyond the end of the test data
deploy = deploy[deploy['Date'] <= time2_pd.Date.max() + timedelta(weeks=horizon_in_weeks)].reset_index(drop=True).copy()
# Optionally fill missing sales with zero:
# deploy.loc[deploy['Weekly_Sales'].isna(), 'Weekly_Sales'] = 0

# DataFrame.append was removed in pandas 2.0; use pd.concat instead
deploy = pd.concat([train_and_test, deploy], ignore_index=True)
deploy = deploy[['Store', 'Dept', 'Weekly_Sales', 'Date', 'IsHoliday', 'Hl', 'Size', 'ThanksG', 'Type', 'Unemployment', 'Xmas']].copy()
In [ ]:
deploy["predict"] = scorer.score_batch(deploy)
In [ ]:
plot_df2 = deploy[(deploy.Store == store_num) & (deploy.Dept == dept_num)][["Date", "Weekly_Sales", "predict"]]
plot_df2 = plot_df2.set_index("Date")
In [ ]:
plot_df2.plot(figsize=(20,10), title=("Training dataset in Green. Test data with model prediction horizon in Red. Date selection in Yellow"))

plt.axvspan(time2_pd["Date"].min(), test_window_end, facecolor='r', alpha=0.1)
plt.axvspan(time1_pd["Date"].min(), test_window_start, facecolor='g', alpha=0.1)
plt.axvline(x=date_selection, color='y')
plt.suptitle("Actual vs. Predicted vs. Deploy", fontsize=21, fontweight='bold')
plt.show()

Compute Shapley Values

Shapley values show the contribution of the engineered features to the predicted weekly sales generated by the model. With Shapley values you can break down the components of a prediction and attribute precise values to specific features. Note that in some cases the model has a "link function" that has yet to be applied before the sum of the Shapley contributions equals the prediction value.

In [ ]:
shapley = scorer.score_batch(train_and_test, pred_contribs=True, fast_approx=True)
shapley.columns = [x.replace('contrib_','',1) for x in shapley.columns]
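
Because fast_approx=True is used above, the contributions are approximate. As a quick check of how closely they reconstruct the predictions, sum the per-feature contributions and add the bias term (a minimal sketch; if the model applies a link function, the raw sum will differ from the prediction until that function is applied):

In [ ]:
# Sum of per-feature contributions plus the bias term; with an identity
# link this should track the model's predictions closely.
recon = shapley.drop(columns=["bias"]).sum(axis=1) + shapley["bias"]
print((recon - train_and_test["predict"]).abs().describe())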

Plot Shapley

This is a global vs local Shapley plot, with global being the average Shapley values for all of the predictions in the selected group and local being the Shapley value for that specific prediction. Looking at this plot can give clues as to which features contributed to the error in the prediction.

In [ ]:
shap_vals_group = shapley.loc[(train_and_test.Store==store_num) & (train_and_test.Dept==dept_num),:]
shap_vals_timestamp = shapley.loc[(train_and_test.Store==store_num)
                                  & (train_and_test.Dept==dept_num)
                                  & (train_and_test.Date==date_selection),:]
shap_vals = shap_vals_group.mean()
shap_vals = pd.concat([pd.DataFrame(shap_vals), shap_vals_timestamp.transpose()], axis=1, ignore_index=True)
shap_vals = shap_vals.sort_values(by=0)
bias = shap_vals.loc["bias",0]
shap_vals = shap_vals.drop("bias",axis=0)
shap_vals.columns = ["Global (Group)", "Local (Timestamp)"]
In [ ]:
shap_vals_timestamp.transpose()
In [ ]:
from matplotlib.ticker import FuncFormatter
# Shift the tick labels by the bias term so they read on the prediction scale
formatter = FuncFormatter(lambda x, pos: str(round(float(x) + bias)))
ax = shap_vals.plot.barh(figsize=(8, 30), fontsize=10, colormap="Set1")
ax.xaxis.set_major_formatter(formatter)
plt.show()

Summary

This notebook should give you everything you need to diagnose and debug time series models from DAI. Try different horizons during training and compare each model's R2 over time to pick the best horizon for your use case. Use the actual vs. predicted plots for detailed debugging, and pick interesting dates to examine with the Shapley plots to see how the features impacted the final prediction.