Driverless AI MOJO Scoring Pipeline - C++ Runtime with Python (Supports Shapley) and R Wrappers¶
The C++ Scoring Pipeline is provided as R and Python packages for the protobuf-based MOJO2 protocol. Use your preferred method once the MOJO Scoring Pipeline has been built.
Notes:
These scoring pipelines are currently not available for RuleFit models.
Unlike the Java Runtime, TensorFlow/Bert are supported by C++ Runtime MOJO.
You can have Driverless AI attempt to reduce the size of the MOJO scoring pipeline while the experiment is being built by enabling the Reduce MOJO Size expert setting.
Shapley contributions come with the downloaded experiment MOJO scoring pipeline. See C++ MOJO runtime Shapley values support for a scoring example.
Shapley contributions for transformed features and original features are currently available for XGBoost (GBM, GLM, RF, DART), LightGBM, Zero-Inflated, Imbalanced, and DecisionTree models (and their ensembles). For ensembles with the ExtraTrees meta learner (ensemble_meta_learner='extra_trees'), we suggest using the MLI Python scoring package.
To disable the automatic creation of this scoring pipeline, set the Make MOJO Scoring Pipeline expert setting to Off while building an experiment.
The Download MOJO Scoring Pipeline button appears as Build MOJO Scoring Pipeline if the MOJO Scoring Pipeline is disabled.
Prerequisites¶
To perform C++ MOJO scoring on natural language processing (NLP) models, the libopenblas-dev package is required. To install it, run the following command:
sudo apt install libopenblas-dev
Downloads¶
This section contains download links for the C++ MOJO runtime and its Python and R wrappers.
Python:
- Install directly from PyPI
pip install daimojo
R:
Java:
The Java wrapper for the C++ MOJO runtime is now available as part of the existing Java runtime (mojo2-runtime.jar). For more information, see H2O MOJO Java Wrapper.
Note
The Python and R packages can also be downloaded from within the Driverless AI application by clicking Resources, and then clicking MOJO Py Runtime or MOJO R Runtime from the drop-down menu.
Examples¶
The following examples show how to use the R and Python APIs of the C++ MOJO runtime.
Prerequisites
Linux OS (x86 or PPC)
Driverless AI License (either file or environment variable)
Rcpp (>= 1.0.0)
data.table
Running the MOJO2 R Runtime
# Install the R MOJO runtime using one of the methods below
# Install the R MOJO runtime on PPC Linux
install.packages("./daimojo_2.8.6_PPC64LE_ppc64le-redhat-linux-gnu.tar.gz")
# Install the R MOJO runtime on x86 Linux
install.packages("./daimojo_2.8.6_x86_64-redhat-linux-gnu.tar.gz")
# Load the MOJO
library(daimojo)
m <- load.mojo("./mojo-pipeline/pipeline.mojo")
# retrieve the creation time of the MOJO
create.time(m)
## [1] "2019-11-18 22:00:24 UTC"
# retrieve the UUID of the experiment
uuid(m)
## [1] "65875c15-943a-4bc0-a162-b8984fe8e50d"
# Load data and make predictions
col_class <- setNames(feature.types(m), feature.names(m)) # column names and types
library(data.table)
d <- fread("./mojo-pipeline/example.csv", colClasses=col_class, header=TRUE, sep=",")
predict(m, d)
## label.B label.M
## 1 0.08287659 0.91712341
## 2 0.77655075 0.22344925
## 3 0.58438434 0.41561566
## 4 0.10570505 0.89429495
## 5 0.01685609 0.98314391
## 6 0.23656610 0.76343390
## 7 0.17410333 0.82589667
## 8 0.10157948 0.89842052
## 9 0.13546191 0.86453809
## 10 0.94778244 0.05221756
Prerequisites
Linux OS (x86 or PPC)
Driverless AI License (either file or environment variable)
Python 3.6 (Note: daimojo runtimes for Python 3.7 and 3.8 are also available. For download links, see Downloads.)
datatable. Run the following command to install it:
# Install on Linux PPC or Linux x86
pip install datatable
Non-binary version of protobuf:
pip install --no-binary=protobuf protobuf
Python MOJO runtime. Run one of the following commands after downloading from the GUI:
# Install the MOJO runtime on Linux PPC
pip install daimojo-2.8.6_PPC64LE-cp311-cp311-linux_ppc64le.whl
# Install the MOJO runtime on Linux x86
pip install daimojo-2.8.6-cp311-cp311-linux_x86_64.whl
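Which wheel to install depends on the machine architecture. As a small convenience sketch (assuming only the two wheel filenames shown above; the helper name pick_wheel is ours, not part of daimojo), the matching file can be chosen with Python's standard platform module:

```python
import platform

# Map machine architecture to the matching daimojo wheel filename
# (filenames copied from the install commands above).
WHEELS = {
    "x86_64": "daimojo-2.8.6-cp311-cp311-linux_x86_64.whl",
    "ppc64le": "daimojo-2.8.6_PPC64LE-cp311-cp311-linux_ppc64le.whl",
}

def pick_wheel(machine: str = platform.machine()) -> str:
    """Return the wheel filename for this architecture, or raise if unsupported."""
    try:
        return WHEELS[machine]
    except KeyError:
        raise RuntimeError(f"no daimojo wheel for architecture {machine!r}")
```

pip install can then be pointed at the returned filename on the target host.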
Running the MOJO2 Python Runtime
# import the daimojo model package
import daimojo.model
# specify the location of the MOJO
m = daimojo.model("./mojo-pipeline/pipeline.mojo")
# retrieve the creation time of the MOJO
m.created_time
# 'Mon November 18 14:00:24 2019'
# retrieve the UUID of the experiment
m.uuid
# retrieve a list of missing values
m.missing_values
# ['',
# '?',
# 'None',
# 'nan',
# 'NA',
# 'N/A',
# 'unknown',
# 'inf',
# '-inf',
# '1.7976931348623157e+308',
# '-1.7976931348623157e+308']
# retrieve the feature names
m.feature_names
# ['clump_thickness',
# 'uniformity_cell_size',
# 'uniformity_cell_shape',
# 'marginal_adhesion',
# 'single_epithelial_cell_size',
# 'bare_nuclei',
# 'bland_chromatin',
# 'normal_nucleoli',
# 'mitoses']
# retrieve the feature types
m.feature_types
# ['float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32',
# 'float32']
# retrieve the output names
m.output_names
# ['label.B', 'label.M']
# retrieve the output types
m.output_types
# ['float64', 'float64']
# import the datatable module
import datatable as dt
# parse the example.csv file
pydt = dt.fread("./mojo-pipeline/example.csv", na_strings=m.missing_values, header=True, sep=',')
pydt
# clump_thickness uniformity_cell_size uniformity_cell_shape marginal_adhesion single_epithelial_cell_size bare_nuclei bland_chromatin normal_nucleoli mitoses
# 0 8 1 3 10 6 6 9 1 1
# 1 2 1 2 2 5 3 4 8 8
# 2 1 1 1 9 4 10 3 5 4
# 3 2 6 9 10 4 8 1 1 3
# 4 10 10 8 1 8 3 6 3 4
# 5 1 8 4 5 10 1 2 5 3
# 6 2 10 2 9 1 2 9 3 8
# 7 2 8 9 2 10 10 3 5 4
# 8 6 3 8 5 2 3 5 3 4
# 9 4 2 2 8 1 2 8 9 1
# [10 rows × 9 columns]
# retrieve the column types
pydt.stypes
# (stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64,
# stype.float64)
# make predictions on the example.csv file
res = m.predict(pydt)
# retrieve the predictions
res
# label.B label.M
# 0 0.0828766 0.917123
# 1 0.776551 0.223449
# 2 0.584384 0.415616
# 3 0.105705 0.894295
# 4 0.0168561 0.983144
# 5 0.236566 0.763434
# 6 0.174103 0.825897
# 7 0.101579 0.898421
# 8 0.135462 0.864538
# 9 0.947782 0.0522176
# [10 rows × 2 columns]
# retrieve the prediction column names
res.names
# ('label.B', 'label.M')
# retrieve the prediction column types
res.stypes
# (stype.float64, stype.float64)
# convert datatable results to common data types
# res.to_pandas() # need pandas
# res.to_numpy() # need numpy
res.to_list()
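Downstream code often needs hard class labels rather than probabilities. A plain-Python post-processing sketch (the column names and the first two probability rows are copied from the example output above; note that datatable's to_list() returns one list per column, i.e. column-major data):

```python
# Column-major prediction output, as returned by res.to_list().
names = ("label.B", "label.M")
columns = [
    [0.0828766, 0.776551],   # label.B (first two rows of the example output)
    [0.917123, 0.223449],    # label.M
]

def predicted_labels(names, columns):
    """Pick the class with the highest probability for each row."""
    rows = zip(*columns)  # transpose column-major data into rows
    return [max(zip(names, row), key=lambda nv: nv[1])[0] for row in rows]

print(predicted_labels(names, columns))  # ['label.M', 'label.B']
```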
Get the prediction interval from the MOJO¶
To get the prediction interval from the MOJO, run the Java runtime with the --with-prediction-interval flag:
java -Xmx5g -Dai.h2o.mojos.runtime.license.file=license.file -jar mojo2-runtime.jar --with-prediction-interval pipeline.mojo example.csv
C++ MOJO runtime Shapley values support¶
The C++ MOJO runtime and its Python wrapper support Shapley contributions for transformed features and original features. The following example demonstrates how to retrieve Shapley contributions for transformed and original features when making predictions:
import datatable as dt
import daimojo
X = dt.Frame("example.jay")
m = daimojo.model("pipeline.mojo")
m.predict(X) # Prediction call that returns regular predictions
m.predict(X, interval=True) # Prediction call that returns prediction intervals
m.predict(X, pred_contribs=True) # Prediction call that returns Shapley contributions for transformed features
m.predict(X, pred_contribs=True, pred_contribs_original=True) # Prediction call that returns Shapley contributions for original features
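A useful sanity check on tree-SHAP-style contribution outputs such as these is the efficiency property: for each row, the contributions plus the bias (expected value) term sum to the model's raw, pre-link (margin) prediction. A toy illustration with made-up numbers (the contribution values and bias below are hypothetical, not taken from the pipeline above):

```python
# Hypothetical per-row Shapley contributions for three transformed features,
# plus a bias (expected value) term, as returned with pred_contribs=True.
contribs = [0.25, -0.125, 0.375]
bias = 0.5

# Efficiency property: contributions + bias == raw (margin) prediction.
raw_prediction = sum(contribs) + bias
print(raw_prediction)  # 1.0
```

For classification, the sum is in margin space, so it matches the probability only after applying the link function (e.g. sigmoid).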
Note
Setting pred_contribs_original=True requires that pred_contribs is also set to True.
Presently, Shapley contributions for transformed features and original features are available for XGBoost (GBM, GLM, RF, DART), LightGBM, Zero-Inflated, Imbalanced, and DecisionTree models (and their ensembles). For ensembles with the ExtraTrees meta learner (ensemble_meta_learner='extra_trees'), we suggest using the Python scoring packages.
In MOJOs, Shapley values for original features are approximated from the accompanying Shapley values for transformed features with the Naive Shapley (even split) method.
The Shapley fast approximation uses only one model (from the first fold) with no more than the first 50 trees. For details, see the fast_approx_num_trees and fast_approx_do_one_fold_one_model config.toml settings.
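The Naive Shapley (even split) method mentioned above can be sketched in a few lines of plain Python: each transformed feature's contribution is divided equally among the original features it was derived from, and the shares are accumulated per original feature. The feature names, parent mapping, and values below are hypothetical, for illustration only:

```python
# Hypothetical Shapley contributions for transformed features.
transformed_contribs = {
    "CVTE:city.0": 0.3,              # derived from 'city'
    "Interaction:age:income": 0.2,   # derived from 'age' and 'income'
    "age": 0.1,                      # passed through unchanged
}

# Which original features each transformed feature was derived from.
parents = {
    "CVTE:city.0": ["city"],
    "Interaction:age:income": ["age", "income"],
    "age": ["age"],
}

def naive_shapley(transformed_contribs, parents):
    """Split each transformed contribution evenly among its original features."""
    original = {}
    for feat, value in transformed_contribs.items():
        share = value / len(parents[feat])
        for parent in parents[feat]:
            original[parent] = original.get(parent, 0.0) + share
    return original

print(naive_shapley(transformed_contribs, parents))
# {'city': 0.3, 'age': 0.2, 'income': 0.1}
```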