The Python and MOJO Scoring Pipelines

Driverless AI provides a Python Scoring Pipeline for experiments and interpreted (MLI) models, and a MOJO (Java-based) Scoring Pipeline for experiments.

The Python Scoring Pipeline is implemented as a Python wheel (.whl) file. While this allows the pipeline to be used as a single-process scoring engine, the scoring service is generally implemented as a client/server architecture and supports interfaces for TCP and HTTP.

The MOJO Scoring Pipeline is a standalone scoring pipeline that converts an experiment to a MOJO, which can be scored in real time.

Examples are included with each scoring package.

Which Pipeline Should I Use?

Driverless AI provides a Python Scoring Pipeline, an MLI Standalone Scoring Pipeline, and a MOJO Scoring Pipeline. Consider the following when determining the scoring pipeline that you want to use.

  • For all pipelines, the higher the accuracy, the slower the scoring.
  • The Python Scoring Pipeline is slower but easier to use than the MOJO Scoring Pipeline.
  • When running the Python Scoring Pipeline:
    • HTTP is easy and is supported by virtually any language. HTTP supports RESTful calls via curl, wget, or supported packages in various scripting languages.
    • TCP is a bit more complex, though faster. TCP also requires Thrift, which currently does not handle NAs.
  • Use the MOJO Scoring Pipeline for a pure Java solution. This solution is flexible and is faster than the Python Scoring Pipeline, but it requires a bit more coding.
  • The MLI Standalone Python Scoring Pipeline can be used to score interpreted models but only supports K-LIME reason codes.
    • For obtaining K-LIME reason codes from an MLI experiment, use the MLI Standalone Python Scoring Pipeline. K-LIME reason codes are available for all models.
    • For obtaining Shapley reason codes from an MLI experiment, use the DAI Standalone Python Scoring Pipeline. Shapley reason codes are only available for XGBoost and LightGBM models. Note that obtaining Shapley reason codes through the Python Scoring Pipeline can be time consuming.

Driverless AI Standalone Python Scoring Pipeline

As indicated earlier, a scoring pipeline is available after a successfully completed experiment. This package contains an exported model and Python 3.6 source code examples for productionizing models built using H2O Driverless AI.

The files in this package allow you to transform and score on new data in a couple of different ways:

  • From Python 3.6, you can import a scoring module, and then use the module to transform and score on new data.
  • From other languages and platforms, you can use the TCP/HTTP scoring service bundled with this package to call into the scoring pipeline module through remote procedure calls (RPC).

Python Scoring Pipeline Files

The scoring-pipeline folder includes the following notable files:

  • example.py: An example Python script demonstrating how to import and score new records.
  • run_example.sh: Runs example.py (also sets up a virtualenv with prerequisite libraries).
  • tcp_server.py: A standalone TCP server for hosting scoring services.
  • http_server.py: A standalone HTTP server for hosting scoring services.
  • run_tcp_server.sh: Runs the TCP scoring service (runs tcp_server.py).
  • run_http_server.sh: Runs the HTTP scoring service (runs http_server.py).
  • example_client.py: An example Python script demonstrating how to communicate with the scoring server.
  • run_tcp_client.sh: Demonstrates how to communicate with the scoring service via TCP (runs example_client.py).
  • run_http_client.sh: Demonstrates how to communicate with the scoring service via HTTP (using curl).

Prerequisites

The following are required in order to run the scoring pipeline.

  • Linux environment (the scoring module and scoring service are supported only on Linux)
  • Python 3.6
  • libopenblas-dev (required for H2O4GPU)
  • OpenCL
  • Apache Thrift (only needed to run the scoring service in TCP mode)
  • A valid Driverless AI license. Driverless AI requires a license to be specified in order to run the Python Scoring Pipeline.
  • Internet access. The scoring module and scoring service download additional packages at install time. Depending on your network environment, you might need to set up internet access via a proxy, as shown in the example after this list.
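For example, if your environment requires a proxy, you might export the standard proxy variables before running the install scripts (the host and port below are placeholders):

export http_proxy="http://proxy.example.com:3128"
export https_proxy="http://proxy.example.com:3128"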

Examples of how to install these prerequisites are below.

Installing Python 3.6

Installing Python 3.6 and OpenBLAS on Ubuntu 16.10+

sudo apt install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv libopenblas-dev

Installing Python 3.6 and OpenBLAS on Ubuntu 16.04

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv libopenblas-dev

Installing Conda 3.6:

You can install Conda using either Anaconda or Miniconda. Refer to the Anaconda or Miniconda documentation for more information.

Installing OpenCL

Install OpenCL on RHEL

yum -y clean all
yum -y makecache
yum -y update
wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/c/clinfo-2.1.17.02.09-1.el7.x86_64.rpm
wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/o/ocl-icd-2.2.12-1.el7.x86_64.rpm
rpm -if clinfo-2.1.17.02.09-1.el7.x86_64.rpm
rpm -if ocl-icd-2.2.12-1.el7.x86_64.rpm
clinfo

mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

Install OpenCL on Ubuntu

sudo apt-get install opencl-headers clinfo ocl-icd-opencl-dev

mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

License Specification

Driverless AI requires a license to be specified in order to run the Python Scoring Pipeline. The license can be specified via an environment variable, shown here using IPython/Jupyter %env magic:

# Set DRIVERLESS_AI_LICENSE_FILE, the path to the Driverless AI license file
%env DRIVERLESS_AI_LICENSE_FILE="/home/ubuntu/license/license.sig"


# Set DRIVERLESS_AI_LICENSE_KEY, the Driverless AI license key (Base64 encoded string)
%env DRIVERLESS_AI_LICENSE_KEY="oLqLZXMI0y..."

The examples that follow use DRIVERLESS_AI_LICENSE_FILE. Using DRIVERLESS_AI_LICENSE_KEY would be similar.
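In a plain shell (as used by the quickstart scripts below), the equivalent exports are:

export DRIVERLESS_AI_LICENSE_FILE="/home/ubuntu/license/license.sig"
# or
export DRIVERLESS_AI_LICENSE_KEY="oLqLZXMI0y..."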

Installing the Thrift Compiler

Thrift is required to run the scoring service in TCP mode, but it is not required to run the scoring module. The following steps are available on the Thrift documentation site at: https://thrift.apache.org/docs/BuildingFromSource.

sudo apt-get install automake bison flex g++ git libevent-dev \
  libssl-dev libtool make pkg-config libboost-all-dev ant
wget https://github.com/apache/thrift/archive/0.10.0.tar.gz
tar -xvf 0.10.0.tar.gz
cd thrift-0.10.0
./bootstrap.sh
./configure
make
sudo make install

Run the following to refresh the runtime shared-library cache after installing Thrift:

sudo ldconfig /usr/local/lib

Quickstart

Before running the quickstart examples, be sure that the scoring pipeline is already downloaded and unzipped:

  1. On the completed Experiment page, click on the Download Python Scoring Pipeline button to download the scorer.zip file for this experiment onto your local machine.
Download Python Scoring Pipeline button
  2. Unzip the scoring pipeline.

After the pipeline is downloaded and unzipped, you will be able to run the scoring module and the scoring service.

Score from a Python Program

If you intend to score from a Python program, run the scoring module example. (Requires Linux and Python 3.6.)

export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_example.sh

Score Using a Web Service

If you intend to score using a web service, run the HTTP scoring server example. (Requires Linux x86_64 and Python 3.6.)

export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_http_server.sh
bash run_http_client.sh

Score Using a Thrift Service

If you intend to score using a Thrift service, run the TCP scoring server example. (Requires Linux x86_64, Python 3.6 and Thrift.)

export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_tcp_server.sh
bash run_tcp_client.sh

Note: By default, the run_*.sh scripts mentioned above create a virtual environment using virtualenv and pip, within which the Python code is executed. The scripts can also leverage Conda (Anaconda/Miniconda) to create a Conda virtual environment and install the required package dependencies. The package manager to use is provided as an argument to the script.

# to use conda package manager
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_example.sh --pm conda

# to use pip package manager
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_example.sh --pm pip

If you experience errors while running any of the above scripts, verify that your system has a properly installed and configured Python 3.6 environment. Refer to the Troubleshooting Python Environment Issues section that follows to see how to set up and test the scoring module using a cleanroom Ubuntu 16.04 virtual machine.

The Python Scoring Module

The scoring module is a Python module bundled into a standalone wheel file (named scoring_*.whl). All the prerequisites for the scoring module to work correctly are listed in the requirements.txt file. To use the scoring module, create a Python virtualenv, install the prerequisites, and then import and use the scoring module as follows:

# See 'example.py' for complete example.
from scoring_487931_20170921174120_b4066 import Scorer
scorer = Scorer()       # Create instance.
score = scorer.score([  # Call score()
    7.416,              # sepal_len
    3.562,              # sepal_wid
    1.049,              # petal_len
    2.388,              # petal_wid
])

The scorer instance provides the following methods (and more):

  • score(list): Score one row (list of values).
  • score_batch(df): Score a Pandas dataframe.
  • fit_transform_batch(df): Transform a Pandas dataframe.
  • get_target_labels(): Get target column labels (for classification problems).
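For batch scoring, you can pass a Pandas dataframe to score_batch(). The following is a minimal sketch that reuses the hypothetical module name and Iris-style columns from the single-row example above; your module and column names will differ:

# See 'example.py' for a complete, experiment-specific example.
import pandas as pd
from scoring_487931_20170921174120_b4066 import Scorer

scorer = Scorer()
df = pd.DataFrame({
    'sepal_len': [7.416, 6.103],
    'sepal_wid': [3.562, 2.871],
    'petal_len': [1.049, 4.755],
    'petal_wid': [2.388, 1.020],
})
preds = scorer.score_batch(df)     # One prediction per input row
print(scorer.get_target_labels())  # Class labels (classification problems only)
print(preds)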

The process of importing and using the scoring module is demonstrated by the bash script run_example.sh, which effectively performs the following steps:

# See 'run_example.sh' for complete example.
virtualenv -p python3.6 env
source env/bin/activate
pip install -r requirements.txt
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
python example.py

The Scoring Service

The scoring service hosts the scoring module as an HTTP or TCP service. Doing this exposes all the functions of the scoring module through remote procedure calls (RPC). In effect, this mechanism allows you to invoke scoring functions from languages other than Python on the same computer or from another computer on a shared network or on the Internet.

The scoring service can be started in two ways:

  • In TCP mode, the scoring service provides high-performance RPC calls via Apache Thrift (https://thrift.apache.org/) using a binary wire protocol.
  • In HTTP mode, the scoring service provides JSON-RPC 2.0 calls served by Tornado (http://www.tornadoweb.org).

Scoring operations can be performed on individual rows (row-by-row) or in batch mode (multiple rows at a time).

Scoring Service - TCP Mode (Thrift)

The TCP mode allows you to use the scoring service from any language supported by Thrift, including C, C++, C#, Cocoa, D, Dart, Delphi, Go, Haxe, Java, Node.js, Lua, perl, PHP, Python, Ruby and Smalltalk.

To start the scoring service in TCP mode, you will need to generate the Thrift bindings once, then run the server:

# See 'run_tcp_server.sh' for complete example.
thrift --gen py scoring.thrift
python tcp_server.py --port=9090

Note that the Thrift compiler is only required at build time. It is not a runtime dependency: once the scoring services are built and tested, you do not need to repeat this installation process on the machines where the scoring services are intended to be deployed.

To call the scoring service, simply generate the Thrift bindings for your language of choice, then make RPC calls via TCP sockets using Thrift’s buffered transport in conjunction with its binary protocol.

# See 'run_tcp_client.sh' for complete example.
thrift --gen py scoring.thrift

# See 'example_client.py' for complete example.
# Row and ScoringService come from the Thrift-generated bindings;
# TSocket, TTransport, and TBinaryProtocol come from the thrift package.
socket = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ScoringService.Client(protocol)
transport.open()
row = Row()
row.sepalLen = 7.416  # sepal_len
row.sepalWid = 3.562  # sepal_wid
row.petalLen = 1.049  # petal_len
row.petalWid = 2.388  # petal_wid
scores = client.score(row)
transport.close()

You can reproduce the exact same result from other languages, e.g. Java:

thrift --gen java scoring.thrift

// Dependencies:
// commons-codec-1.9.jar
// commons-logging-1.2.jar
// httpclient-4.4.1.jar
// httpcore-4.4.1.jar
// libthrift-0.10.0.jar
// slf4j-api-1.7.12.jar

import ai.h2o.scoring.Row;
import ai.h2o.scoring.ScoringService;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import java.util.List;

public class Main {
  public static void main(String[] args) {
    try {
      TTransport transport = new TSocket("localhost", 9090);
      transport.open();

      ScoringService.Client client = new ScoringService.Client(
        new TBinaryProtocol(transport));

      Row row = new Row(7.642, 3.436, 6.721, 1.020);
      List<Double> scores = client.score(row);
      System.out.println(scores);

      transport.close();
    } catch (TException ex) {
      ex.printStackTrace();
    }
  }
}

Scoring Service - HTTP Mode (JSON-RPC 2.0)

The HTTP mode allows you to use the scoring service using plaintext JSON-RPC calls. This is usually less performant compared to Thrift, but has the advantage of being usable from any HTTP client library in your language of choice, without any dependency on Thrift.

For JSON-RPC documentation, see http://www.jsonrpc.org/specification.

To start the scoring service in HTTP mode:

# See 'run_http_server.sh' for complete example.
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
python http_server.py --port=9090

To invoke scoring methods, compose a JSON-RPC message and make an HTTP POST request to http://host:port/rpc as follows:

# See 'run_http_client.sh' for complete example.
curl http://localhost:9090/rpc \
  --header "Content-Type: application/json" \
  --data @- <<EOF
 {
  "id": 1,
  "method": "score",
  "params": {
    "row": [ 7.486, 3.277, 4.755, 2.354 ]
  }
 }
EOF
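Per the JSON-RPC 2.0 specification, a successful call returns a response object whose result member carries the scores. The shape is roughly as follows (the values shown are illustrative, not actual model output):

{
  "id": 1,
  "result": [ 0.71, 0.29 ]
}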

Similarly, you can use any HTTP client library to reproduce the above result. For example, from Python, you can use the requests module as follows:

import requests
row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score', params=dict(row=row))
res = requests.post('http://localhost:9090/rpc', json=req)  # json= sends a JSON body, as JSON-RPC requires
print(res.json()['result'])

Python Scoring Pipeline FAQ

Why am I getting a “TensorFlow is disabled” message when I run the Python Scoring Pipeline?

If you ran an experiment when TensorFlow was enabled and then attempt to run the Python Scoring Pipeline, you may receive a message similar to the following:

TensorFlow is disabled. To enable, export DRIVERLESS_AI_ENABLE_TENSORFLOW=1 or set enable_tensorflow=true in config.toml.

To successfully run the Python Scoring Pipeline, you must enable the DRIVERLESS_AI_ENABLE_TENSORFLOW flag. For example:

export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
DRIVERLESS_AI_ENABLE_TENSORFLOW=1 bash run_example.sh

Troubleshooting Python Environment Issues

The following instructions describe how to set up a cleanroom Ubuntu 16.04 virtual machine to test that this scoring pipeline works correctly.

Prerequisites: Vagrant (https://www.vagrantup.com) and VirtualBox (https://www.virtualbox.org) must be installed.

  1. Create configuration files for Vagrant.
    • bootstrap.sh: contains commands to set up Python 3.6 and OpenBLAS.
    • Vagrantfile: contains virtual machine configuration instructions for Vagrant and VirtualBox.
----- bootstrap.sh -----

#!/usr/bin/env bash

sudo apt-get -y update
sudo apt-get -y install apt-utils build-essential python-software-properties software-properties-common zip libopenblas-dev
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update -yqq
sudo apt-get install -y python3.6 python3.6-dev python3-pip python3-dev python-virtualenv python3-virtualenv

# end of bootstrap.sh

----- Vagrantfile -----

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|
  config.vm.box = "ubuntu/xenial64"
  config.vm.provision :shell, path: "bootstrap.sh", privileged: false
  config.vm.hostname = "h2o"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"
  end
end

# end of Vagrantfile
  2. Launch the VM and SSH into it. Note that we’re also placing the scoring pipeline in the same directory so that we can access it later inside the VM.
cp /path/to/scorer.zip .
vagrant up
vagrant ssh
  3. Test the scoring pipeline inside the virtual machine.
cp /vagrant/scorer.zip .
unzip scorer.zip
cd scoring-pipeline/
export DRIVERLESS_AI_LICENSE_FILE="/path/to/license.sig"
bash run_example.sh

At this point, you should see scores printed out on the terminal. If not, contact us at support@h2o.ai.

Using the Standalone Python Scoring Pipeline in a Different Docker Container

The Standalone Python Scoring Pipeline runs inside of the Driverless AI Docker container. This is the recommended method for running the Python Scoring Pipeline. If necessary, though, this pipeline can also be run inside of a different Docker container. The following steps describe how to do this. This setup assumes that you have a valid Driverless AI license key, which will be required during setup. It also assumes that you have completed a Driverless AI experiment and downloaded the Scoring Pipeline.

  1. On the machine where you want to run the Python Scoring Pipeline, create a new directory for Driverless AI (for example, dai-nnn):
mkdir dai-nnn
  2. Download the TAR SH version of Driverless AI from https://www.h2o.ai/download/ (for either Linux or IBM Power).
  3. Use bash to execute the downloaded file and unpack it into the new Driverless AI folder.
  4. Change directories into the new Driverless AI folder:
cd dai-nnn
  5. Run the following to install the Python Scoring Pipeline for your completed Driverless AI experiment:
./dai-env.sh pip install /path/to/your/scoring_experiment.whl
  6. Run the following command to run the included scoring pipeline example:
DRIVERLESS_AI_LICENSE_KEY="pastekeyhere" SCORING_PIPELINE_INSTALL_DEPENDENCIES=0 ./dai-env.sh /path/to/your/run_example.sh

Driverless AI MLI Standalone Python Scoring Package

This package contains an exported model and Python 3.6 source code examples for productionizing models built using the H2O Driverless AI Machine Learning Interpretability (MLI) tool. The package is only available for interpreted models and can be downloaded by clicking the Scoring Pipeline button on the Interpreted Models page.

The files in this package allow you to obtain reason codes for a given row of data in a couple of different ways:

  • From Python 3.6, you can import a scoring module, and then use the module to transform and score on new data.
  • From other languages and platforms, you can use the TCP/HTTP scoring service bundled with this package to call into the scoring pipeline module through remote procedure calls (RPC).

MLI Python Scoring Package Files

The scoring-pipeline-mli folder includes the following notable files:

  • example.py: An example Python script demonstrating how to import and interpret new records.
  • run_example.sh: Runs example.py (This also sets up a virtualenv with prerequisite libraries.)
  • run_example_shapley.sh: Runs example_shapley.py. This compares K-LIME and Driverless AI Shapley reason codes.
  • tcp_server.py: A standalone TCP server for hosting MLI services.
  • http_server.py: A standalone HTTP server for hosting MLI services.
  • run_tcp_server.sh: Runs the TCP scoring service (specifically, tcp_server.py).
  • run_http_server.sh: Runs the HTTP scoring service (runs http_server.py).
  • example_client.py: An example Python script demonstrating how to communicate with the MLI server.
  • run_tcp_client.sh: Demonstrates how to communicate with the MLI service via TCP (runs example_client.py).
  • run_http_client.sh: Demonstrates how to communicate with the MLI service via HTTP (using curl).

Prerequisites

  • The scoring module and scoring service are supported only on Linux with Python 3.6 and OpenBLAS.
  • The scoring module and scoring service download additional packages at install time and require internet access. Depending on your network environment, you might need to set up internet access via a proxy.
  • Apache Thrift (to run the scoring service in TCP mode)

Examples of how to install these prerequisites are below.

Installing Python 3.6

Installing Python 3.6 on Ubuntu 16.10+:

sudo apt install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv

Installing Python 3.6 on Ubuntu 16.04:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv

Installing Conda 3.6:

You can install Conda using either Anaconda or Miniconda. Refer to the Anaconda or Miniconda documentation for more information.

Installing the Thrift Compiler

Refer to Thrift documentation at https://thrift.apache.org/docs/BuildingFromSource for more information.

sudo apt-get install automake bison flex g++ git libevent-dev \
  libssl-dev libtool make pkg-config libboost-all-dev ant
wget https://github.com/apache/thrift/archive/0.10.0.tar.gz
tar -xvf 0.10.0.tar.gz
cd thrift-0.10.0
./bootstrap.sh
./configure
make
sudo make install

Run the following to refresh the runtime shared-library cache after installing Thrift:

sudo ldconfig /usr/local/lib

Quickstart

Before running the quickstart examples, be sure that the MLI Scoring Package is already downloaded and unzipped.

  1. On the MLI page, click the Scoring Pipeline button.
Scoring Pipeline - MLI
  2. Unzip the scoring pipeline, and run the following examples in the scoring-pipeline-mli folder.

Run the scoring module example. (This requires Linux and Python 3.6.)

bash run_example.sh

Run the TCP scoring server example. Use two terminal windows. (This requires Linux, Python 3.6 and Thrift.)

bash run_tcp_server.sh
bash run_tcp_client.sh

Run the HTTP scoring server example. Use two terminal windows. (This requires Linux and Python 3.6.)

bash run_http_server.sh
bash run_http_client.sh

Note: By default, the run_*.sh scripts mentioned above create a virtual environment using virtualenv and pip, within which the Python code is executed. The scripts can also leverage Conda (Anaconda/Miniconda) to create a Conda virtual environment and install the required package dependencies. The package manager to use is provided as an argument to the script.

# to use conda package manager
bash run_example.sh --pm conda

# to use pip package manager
bash run_example.sh --pm pip

MLI Python Scoring Module

The MLI scoring module is a Python module bundled into a standalone wheel file (named scoring_*.whl). All the prerequisites for the scoring module to work correctly are listed in the requirements.txt file. To use the scoring module, create a Python virtualenv, install the prerequisites, and then import and use the scoring module as follows:

----- See 'example.py' for complete example. -----
from scoring_487931_20170921174120_b4066 import KLimeScorer
scorer = KLimeScorer()       # Create instance.
score = scorer.score_reason_codes([  # Call score_reason_codes()
    7.416,              # sepal_len
    3.562,              # sepal_wid
    1.049,              # petal_len
    2.388,              # petal_wid
])

The scorer instance provides the following methods:

  • score_reason_codes(list): Get K-LIME reason codes for one row (list of values).
  • score_reason_codes_batch(dataframe): Get K-LIME reason codes for a Pandas dataframe (returns a dataframe of reason codes).
  • get_column_names(): Get the input column names.
  • get_reason_code_column_names(): Get the output (reason code) column names.
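For batch use, the following is a minimal sketch that mirrors the single-row example above; the module name is again hypothetical, and the input columns are taken from get_column_names():

# See 'example.py' in the MLI package for a complete example.
import pandas as pd
from scoring_487931_20170921174120_b4066 import KLimeScorer

scorer = KLimeScorer()
df = pd.DataFrame([[7.416, 3.562, 1.049, 2.388]],
                  columns=scorer.get_column_names())  # Input columns expected by the scorer
codes = scorer.score_reason_codes_batch(df)           # One row of reason codes per input row
print(scorer.get_reason_code_column_names())          # Output reason-code column names
print(codes)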

The process of importing and using the scoring module is demonstrated by the bash script run_example.sh, which effectively performs the following steps:

----- See 'run_example.sh' for complete example. -----
virtualenv -p python3.6 env
source env/bin/activate
pip install -r requirements.txt
python example.py

K-LIME vs Shapley Reason Codes

There are times when the K-LIME model score is not close to the Driverless AI model score. In this case, it may be better to generate reason codes using the Shapley method on the Driverless AI model. Note that the reason codes from Shapley will be in the transformed feature space.

To see an example of using both K-LIME and Driverless AI Shapley reason codes in the same Python session, run:

bash run_example_shapley.sh

For this bash script to succeed, MLI must be run on a Driverless AI model. If you have run MLI in standalone (external model) mode, there will not be a Driverless AI scoring pipeline.

If MLI was run with transformed features, the Shapley example scripts will not be exported. You can generate exact reason codes directly from the Driverless AI model scoring pipeline.

MLI Scoring Service Overview

The MLI scoring service hosts the scoring module as an HTTP or TCP service. Doing this exposes all the functions of the scoring module through remote procedure calls (RPC).

In effect, this mechanism allows you to invoke scoring functions from languages other than Python on the same computer, or from another computer on a shared network or the internet.

The scoring service can be started in two ways:

  • In TCP mode, the scoring service provides high-performance RPC calls via Apache Thrift (https://thrift.apache.org/) using a binary wire protocol.
  • In HTTP mode, the scoring service provides JSON-RPC 2.0 calls served by Tornado (http://www.tornadoweb.org).

Scoring operations can be performed on individual rows (row-by-row) or in batch mode (multiple rows at a time).

MLI Scoring Service - TCP Mode (Thrift)

The TCP mode allows you to use the scoring service from any language supported by Thrift, including C, C++, C#, Cocoa, D, Dart, Delphi, Go, Haxe, Java, Node.js, Lua, perl, PHP, Python, Ruby and Smalltalk.

To start the scoring service in TCP mode, you will need to generate the Thrift bindings once, then run the server:

----- See 'run_tcp_server.sh' for complete example. -----
thrift --gen py scoring.thrift
python tcp_server.py --port=9090

Note that the Thrift compiler is only required at build time. It is not a runtime dependency: once the scoring services are built and tested, you do not need to repeat this installation process on the machines where the scoring services are intended to be deployed.

To call the scoring service, simply generate the Thrift bindings for your language of choice, then make RPC calls via TCP sockets using Thrift’s buffered transport in conjunction with its binary protocol.

----- See 'run_tcp_client.sh' for complete example. -----
thrift --gen py scoring.thrift


----- See 'example_client.py' for complete example. -----
# Row and ScoringService come from the Thrift-generated bindings;
# TSocket, TTransport, and TBinaryProtocol come from the thrift package.
socket = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ScoringService.Client(protocol)
transport.open()
row = Row()
row.sepalLen = 7.416  # sepal_len
row.sepalWid = 3.562  # sepal_wid
row.petalLen = 1.049  # petal_len
row.petalWid = 2.388  # petal_wid
scores = client.score_reason_codes(row)
transport.close()

You can reproduce the exact same result from other languages, e.g. Java:

thrift --gen java scoring.thrift

// Dependencies:
// commons-codec-1.9.jar
// commons-logging-1.2.jar
// httpclient-4.4.1.jar
// httpcore-4.4.1.jar
// libthrift-0.10.0.jar
// slf4j-api-1.7.12.jar

import ai.h2o.scoring.Row;
import ai.h2o.scoring.ScoringService;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import java.util.List;

public class Main {
  public static void main(String[] args) {
    try {
      TTransport transport = new TSocket("localhost", 9090);
      transport.open();

      ScoringService.Client client = new ScoringService.Client(
        new TBinaryProtocol(transport));

      Row row = new Row(7.642, 3.436, 6.721, 1.020);
      List<Double> scores = client.score_reason_codes(row);
      System.out.println(scores);

      transport.close();
    } catch (TException ex) {
      ex.printStackTrace();
    }
  }
}

MLI Scoring Service - HTTP Mode (JSON-RPC 2.0)

The HTTP mode allows you to use the scoring service using plaintext JSON-RPC calls. This is usually less performant compared to Thrift, but has the advantage of being usable from any HTTP client library in your language of choice, without any dependency on Thrift.

For JSON-RPC documentation, see http://www.jsonrpc.org/specification.

To start the scoring service in HTTP mode:

----- See 'run_http_server.sh' for complete example. -----
python http_server.py --port=9090

To invoke scoring methods, compose a JSON-RPC message and make an HTTP POST request to http://host:port/rpc as follows:

----- See 'run_http_client.sh' for complete example. -----
curl http://localhost:9090/rpc \
  --header "Content-Type: application/json" \
  --data @- <<EOF
 {
  "id": 1,
  "method": "score_reason_codes",
  "params": {
    "row": [ 7.486, 3.277, 4.755, 2.354 ]
  }
 }
EOF

Similarly, you can use any HTTP client library to reproduce the above result. For example, from Python, you can use the requests module as follows:

import requests
row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score_reason_codes', params=dict(row=row))
res = requests.post('http://localhost:9090/rpc', json=req)  # json= sends a JSON body, as JSON-RPC requires
print(res.json()['result'])

Driverless AI MOJO Scoring Pipeline

For completed experiments, Driverless AI converts models to MOJOs (Model Objects, Optimized). A MOJO is a scoring engine that can be deployed in any Java environment for scoring in real time.

Keep in mind that, similar to H2O-3, MOJOs are tied to experiments. Experiments and MOJOs are not automatically upgraded when Driverless AI is upgraded.

Note: MOJOs are currently not available for TensorFlow and Rulefit models.

Prerequisites

The following are required in order to run the MOJO scoring pipeline.

  • Java 8 runtime
  • Valid Driverless AI license. You can download the license.sig file from the machine hosting Driverless AI (usually in the license folder). Copy the license file into the downloaded mojo-pipeline folder.
  • mojo2-runtime.jar file. This is available from the top navigation menu in the Driverless AI UI and in the downloaded mojo-pipeline.zip file for an experiment.

License Specification

Driverless AI requires a license to be specified in order to run the MOJO Scoring Pipeline. The license can be specified in one of the following ways:

  • Via an environment variable:
    • DRIVERLESS_AI_LICENSE_FILE: Path to the Driverless AI license file, or
    • DRIVERLESS_AI_LICENSE_KEY: The Driverless AI license key (Base64 encoded string)
  • Via a system property of JVM (-D option):
    • ai.h2o.mojos.runtime.license.file: Path to the Driverless AI license file, or
    • ai.h2o.mojos.runtime.license.key: The Driverless AI license key (Base64 encoded string)
  • Via an application classpath:
    • The license is loaded from a resource called /license.sig.
    • The default resource name can be changed via the JVM system property ai.h2o.mojos.runtime.license.filename.

For example:

java -Dai.h2o.mojos.runtime.license.file=/etc/dai/license.sig -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv
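The same run can instead specify the license through the DRIVERLESS_AI_LICENSE_FILE environment variable:

DRIVERLESS_AI_LICENSE_FILE=/etc/dai/license.sig java -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv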

Enabling the MOJO Scoring Pipeline

The MOJO Scoring Pipeline is disabled by default. As a result, a MOJO will have to be built for each desired experiment by clicking on the Build MOJO Scoring Pipeline button:

Build MOJO Scoring Pipeline button

To enable MOJO Scoring Pipelines for each experiment, stop Driverless AI, then restart using the DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 flag. (Refer to Using the config.toml File section for more information.) For example:

nvidia-docker run \
 --add-host name.node:172.16.2.186 \
 -e DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 \
 -p 12345:12345 \
 --pid=host \
 --init \
 --rm \
 -v /tmp/dtmp/:/tmp \
 -v /tmp/dlog/:/log \
 -u $(id -u):$(id -g) \
 opsh2oai/h2oai-runtime

Or you can change the value of make_mojo_scoring_pipeline to true in the config.toml file and specify that file when restarting Driverless AI.
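For example, the relevant config.toml entry would look like this:

# Build the MOJO Scoring Pipeline for every experiment
make_mojo_scoring_pipeline = true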

MOJO Scoring Pipeline Files

The mojo-pipeline folder includes the following files:

  • run_example.sh: A bash script to score a sample test set.
  • pipeline.mojo: Standalone scoring pipeline in MOJO format.
  • mojo2-runtime.jar: MOJO Java runtime.
  • example.csv: Sample test set (synthetic, of the correct format).

Quickstart

Before running the quickstart examples, be sure that the MOJO scoring pipeline is already downloaded and unzipped:

  1. On the completed Experiment page, click on the Download Scoring Pipeline button to download the scorer.zip file for this experiment onto your local machine.
Download MOJO Scoring Pipeline button

Note: This button reads Build MOJO Scoring Pipeline if the MOJO Scoring Pipeline is disabled.

  2. To score all rows in the sample test set (example.csv) with the MOJO pipeline (pipeline.mojo) and the license stored in the environment variable DRIVERLESS_AI_LICENSE_KEY:
bash run_example.sh
  3. To score a specific test set (example.csv) with the MOJO pipeline (pipeline.mojo) and the license file (license.sig):
bash run_example.sh pipeline.mojo example.csv license.sig
  4. To run the Java application for data transformation directly:
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv

Compile and Run the MOJO from Java

  1. Open a new terminal window and change directories to the experiment folder:
cd experiment
  2. Create your main program in the experiment folder by creating a new file called Main.java (for example, using vim Main.java). Include the following contents.
import java.io.IOException;

import ai.h2o.mojos.runtime.MojoPipeline;
import ai.h2o.mojos.runtime.frame.MojoFrame;
import ai.h2o.mojos.runtime.frame.MojoFrameBuilder;
import ai.h2o.mojos.runtime.frame.MojoRowBuilder;
import ai.h2o.mojos.runtime.utils.SimpleCSV;

public class Main {

  public static void main(String[] args) throws IOException {
    // Load model and csv
    MojoPipeline model = MojoPipeline.loadFrom("pipeline.mojo");

    // Get and fill the input columns
    MojoFrameBuilder frameBuilder = model.getInputFrameBuilder();
    MojoRowBuilder rowBuilder = frameBuilder.getMojoRowBuilder();
    rowBuilder.setValue("AGE", "68");
    rowBuilder.setValue("RACE", "2");
    rowBuilder.setValue("DCAPS", "2");
    rowBuilder.setValue("VOL", "0");
    rowBuilder.setValue("GLEASON", "6");
    frameBuilder.addRow(rowBuilder);

    // Create a frame which can be transformed by MOJO pipeline
    MojoFrame iframe = frameBuilder.toMojoFrame();

    // Transform input frame by MOJO pipeline
    MojoFrame oframe = model.transform(iframe);
    // `MojoFrame.debug()` can be used to view the contents of a Frame
    // oframe.debug();

    // Output prediction as CSV
    SimpleCSV outCsv = SimpleCSV.read(oframe);
    outCsv.write(System.out);
  }
}
  3. Compile the source code:
javac -cp mojo2-runtime.jar -J-Xms2g -J-XX:MaxPermSize=128m Main.java
  4. Run the MOJO example:
# Linux and OS X users
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp .:mojo2-runtime.jar Main
# Windows users
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp .;mojo2-runtime.jar Main
  5. The following output is displayed:
CAPSULE.True
0.5442205910902282

Using the MOJO Scoring Pipeline with Spark/Sparkling Water

MOJO scoring pipeline artifacts can be used in Spark to deploy predictions in parallel using the Sparkling Water API. This section shows how to load and run predictions on the MOJO scoring pipeline in Spark using Scala and the Python API.

In the event that you upgrade H2O Driverless AI, there is good news: Sparkling Water is backwards compatible with MOJO versions produced by older Driverless AI versions.

Requirements

  • You must have a Spark cluster with the Sparkling Water JAR file passed to Spark.
  • To run with PySparkling, you must have the PySparkling zip file.

The H2OContext does not have to be created if you only want to run predictions on MOJOs using Spark. This is because they are written to be independent of the H2O runtime.

Preparing Your Environment

Both PySparkling and Sparkling Water need to be started with some extra configurations in order to enable the MOJO scoring pipeline. Examples are provided below. Specifically, you must pass the path of the H2O Driverless AI license to the Spark --jars argument. Additionally, you need to add the MOJO scoring pipeline implementation JAR file, mojo2-runtime.jar, to the same --jars configuration. This file is proprietary and is not part of the resulting Sparkling Water assembly JAR file.

Note: In local Spark mode, use --driver-class-path to specify the path to the license file and the MOJO Pipeline JAR file.

PySparkling

First, start PySpark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library to the --jars argument and also specifies the path to the PySparkling Python library.

./bin/pyspark --jars license.sig,mojo2-runtime.jar --py-files pysparkling.zip

Alternatively, you can download the official Sparkling Water distribution from the H2O Download page. Follow the steps on the Sparkling Water download page, and once you are in the Sparkling Water directory, run:

./bin/pysparkling --jars license.sig,mojo2-runtime.jar

At this point, you should have available a PySpark interactive terminal where you can try out predictions. If you would like to productionalize the scoring process, you can use the same configuration, except instead of using ./bin/pyspark, you would use ./bin/spark-submit to submit your job to a cluster.
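A cluster submission might look like the following, where score_job.py is a hypothetical script containing prediction code such as the example below:

./bin/spark-submit --jars license.sig,mojo2-runtime.jar --py-files pysparkling.zip score_job.py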

# First, specify the dependency
from pysparkling.ml import H2OMOJOPipelineModel
# Load the pipeline
mojo = H2OMOJOPipelineModel.create_from_mojo("file:///path/to/the/pipeline.mojo")

# This option ensures that the output columns are named properly. If you want to use the old behavior,
# when all output columns were stored inside an array, don't specify this configuration option,
# or set it to False. We strongly encourage users to set this to True as below.
mojo.set_named_mojo_output_columns(True)
# Load the data as Spark's Data Frame
data_frame = spark.read.csv("file:///path/to/the/data.csv", header=True)
# Run the predictions. The predictions contain all the original columns plus the predictions
# added as new columns
predictions = mojo.predict(data_frame)

# You can easily get the predictions for a desired column using the helper function as follows:
predictions.select(mojo.select_prediction_udf("AGE")).collect()

Sparkling Water

First, start Spark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library mojo2-runtime.jar to the --jars argument and also specifies the path to the Sparkling Water assembly JAR.

./bin/spark-shell --jars license.sig,mojo2-runtime.jar,sparkling-water-assembly.jar

At this point, you should have available a Sparkling Water interactive terminal where you can try out predictions. If you would like to productionalize the scoring process, you can use the same configuration, except instead of using ./bin/spark-shell, you would use ./bin/spark-submit to submit your job to a cluster.
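A cluster submission might look like the following, where MyScoringJob and my-scoring-job.jar are hypothetical placeholders for your packaged Scala job:

./bin/spark-submit --jars license.sig,mojo2-runtime.jar,sparkling-water-assembly.jar --class MyScoringJob my-scoring-job.jar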

// First, specify the dependency
import org.apache.spark.ml.h2o.models.H2OMOJOPipelineModel
// Load the pipeline
val mojo = H2OMOJOPipelineModel.createFromMojo("file:///path/to/the/pipeline.mojo")

// This option ensures that the output columns are named properly. If you want to use the old behavior,
// when all output columns were stored inside an array, don't specify this configuration option,
// or set it to false. We strongly encourage users to set this to true as below.
mojo.setNamedMojoOutputColumns(true)
// Load the data as Spark's Data Frame
val dataFrame = spark.read.option("header", "true").csv("file:///path/to/the/data.csv")
// Run the predictions. The predictions contain all the original columns plus the predictions
// added as new columns
val predictions = mojo.transform(dataFrame)

// You can easily get the predictions for a desired column using the helper function as follows:
predictions.select(mojo.selectPredictionUDF("AGE"))