Driverless AI MOJO Scoring Pipeline

For completed experiments, Driverless AI converts models to MOJOs (Model Objects, Optimized). A MOJO is a scoring engine that can be deployed in any Java environment for scoring in real time.

Keep in mind that, similar to H2O-3, MOJOs are tied to experiments. Experiments and MOJOs are not automatically upgraded when Driverless AI is upgraded.

Note: MOJOs are currently not available for TensorFlow, RuleFit, or FTRL models.

Prerequisites

The following are required in order to run the MOJO scoring pipeline.

  • Java 7 runtime (JDK 1.7) or newer.
  • Valid Driverless AI license. You can download the license.sig file from the machine hosting Driverless AI (usually in the license folder). Copy the license file into the downloaded mojo-pipeline folder.
  • mojo2-runtime.jar file. This is available from the top navigation menu in the Driverless AI UI and in the downloaded mojo-pipeline.zip file for an experiment.

License Specification

Driverless AI requires a license to be specified in order to run the MOJO Scoring Pipeline. The license can be specified in one of the following ways:

  • Via an environment variable:
    • DRIVERLESS_AI_LICENSE_FILE: Path to the Driverless AI license file, or
    • DRIVERLESS_AI_LICENSE_KEY: The Driverless AI license key (Base64 encoded string)
  • Via a system property of JVM (-D option):
    • ai.h2o.mojos.runtime.license.file: Path to the Driverless AI license file, or
    • ai.h2o.mojos.runtime.license.key: The Driverless AI license key (Base64 encoded string)
  • Via an application classpath:
    • The license is loaded from a resource called /license.sig.
    • The default resource name can be changed via the JVM system property ai.h2o.mojos.runtime.license.filename.

For example:

java -Dai.h2o.mojos.runtime.license.file=/etc/dai/license.sig -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv
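
When a license error occurs, it can help to see which of the locations listed above are actually set. The following is a minimal sketch of such a check. The class LicenseSourceCheck and its method names are hypothetical and are not part of mojo2-runtime.jar. The sketch only reports what is configured; it makes no claim about the order in which the runtime consults these locations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical diagnostic helper; it is not part of mojo2-runtime.jar.
final class LicenseSourceCheck {
  static final String[] ENV_VARS = {"DRIVERLESS_AI_LICENSE_FILE", "DRIVERLESS_AI_LICENSE_KEY"};
  static final String[] PROPERTIES = {
    "ai.h2o.mojos.runtime.license.file", "ai.h2o.mojos.runtime.license.key"
  };

  // Lists every documented license location that is currently set.
  // The order follows the documentation, not necessarily the runtime's precedence.
  static List<String> configuredSources(Map<String, String> env, Properties props, boolean onClasspath) {
    List<String> found = new ArrayList<>();
    for (String name : ENV_VARS) {
      String value = env.get(name);
      if (value != null && !value.isEmpty()) found.add("env:" + name);
    }
    for (String name : PROPERTIES) {
      String value = props.getProperty(name);
      if (value != null && !value.isEmpty()) found.add("property:" + name);
    }
    if (onClasspath) found.add("classpath");
    return found;
  }

  public static void main(String[] args) {
    Properties props = System.getProperties();
    // Assumption: the property holds a resource name in the same form as the default.
    String resource = props.getProperty("ai.h2o.mojos.runtime.license.filename", "/license.sig");
    boolean onClasspath = LicenseSourceCheck.class.getResource(resource) != null;
    List<String> found = configuredSources(System.getenv(), props, onClasspath);
    System.out.println(found.isEmpty() ? "No license source configured" : "Configured: " + found);
  }
}
```

Run it with the same environment variables, -D options, and classpath as the scoring command to see what that command would be given.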

Enabling the MOJO Scoring Pipeline

The MOJO Scoring Pipeline is disabled by default. As a result, a MOJO must be built manually for each desired experiment by clicking the Build MOJO Scoring Pipeline button:

Build MOJO Scoring Pipeline button

To enable MOJO Scoring Pipelines for every experiment, stop Driverless AI, then restart it with the DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 environment variable set. (Refer to the Using the config.toml File section for more information.) For example:

nvidia-docker run \
 --add-host name.node:172.16.2.186 \
 -e DRIVERLESS_AI_MAKE_MOJO_SCORING_PIPELINE=1 \
 -p 12345:12345 \
 --pid=host \
 --init \
 --rm \
 -v /tmp/dtmp/:/tmp \
 -v /tmp/dlog/:/log \
 -u $(id -u):$(id -g) \
 opsh2oai/h2oai-runtime

Alternatively, set make_mojo_scoring_pipeline to true in the config.toml file and specify that file when restarting Driverless AI.

MOJO Scoring Pipeline Files

The mojo-pipeline folder includes the following files:

  • run_example.sh: A bash script to score a sample test set.
  • pipeline.mojo: Standalone scoring pipeline in MOJO format.
  • mojo2-runtime.jar: MOJO Java runtime.
  • example.csv: Sample test set (synthetic, of the correct format).
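
Before scoring, it may be worth confirming that the unzipped folder really contains the files listed above. The following is a minimal sketch of such a check. The class MojoFolderCheck is hypothetical, and the default folder name mojo-pipeline is taken from the prerequisites.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; the file names come from the list above.
final class MojoFolderCheck {
  static final String[] EXPECTED = {
    "run_example.sh", "pipeline.mojo", "mojo2-runtime.jar", "example.csv"
  };

  // Returns the expected files that are not present as regular files in the folder.
  static List<String> missingFiles(File folder) {
    List<String> missing = new ArrayList<>();
    for (String name : EXPECTED) {
      if (!new File(folder, name).isFile()) missing.add(name);
    }
    return missing;
  }

  public static void main(String[] args) {
    File folder = new File(args.length > 0 ? args[0] : "mojo-pipeline");
    List<String> missing = missingFiles(folder);
    System.out.println(missing.isEmpty() ? "All pipeline files present" : "Missing: " + missing);
  }
}
```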

Quickstart

Before running the quickstart examples, be sure that the MOJO scoring pipeline is already downloaded and unzipped:

  1. On the completed Experiment page, click the Download MOJO Scoring Pipeline button to download the scorer.zip file for this experiment onto your local machine.
Download MOJO Scoring Pipeline button

Note: This button is Build MOJO Scoring Pipeline if the MOJO Scoring Pipeline is disabled.

  2. To score all rows in the sample test set (example.csv) with the MOJO pipeline (pipeline.mojo) and the license stored in the environment variable DRIVERLESS_AI_LICENSE_KEY:
bash run_example.sh
  3. To score a specific test set (example.csv) with the MOJO pipeline (pipeline.mojo) and the license file (license.sig):
bash run_example.sh pipeline.mojo example.csv license.sig
  4. To run the Java application for data transformation directly:
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo pipeline.mojo example.csv

Note: For very large models, it may be necessary to increase the memory limit when running the Java application for data transformation. This can be done by specifying -Xmx25g when running the above command.
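
If the scorer is started from another Java program, the same command line can be assembled in code, with the memory limit as an optional argument. The following is a minimal sketch. The class ScorerCommand and its parameters are hypothetical; the option names, JAR name, and main class are those used in the command above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the scoring command shown above.
final class ScorerCommand {
  // maxHeap may be null to keep the JVM default, or a value such as "25g".
  static List<String> build(String maxHeap, String licenseFile, String mojo, String csv) {
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    if (maxHeap != null) cmd.add("-Xmx" + maxHeap);
    cmd.add("-Dai.h2o.mojos.runtime.license.file=" + licenseFile);
    cmd.add("-cp");
    cmd.add("mojo2-runtime.jar");
    cmd.add("ai.h2o.mojos.ExecuteMojo");
    cmd.add(mojo);
    cmd.add(csv);
    return cmd;
  }

  public static void main(String[] args) throws Exception {
    List<String> cmd = build("25g", "license.sig", "pipeline.mojo", "example.csv");
    System.out.println(String.join(" ", cmd));
    // To actually run the scorer from the mojo-pipeline folder, uncomment:
    // int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }
}
```

Passing the arguments as a list to ProcessBuilder avoids shell quoting problems when paths contain spaces.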

Compile and Run the MOJO from Java

  1. Open a new terminal window and change directories to the experiment folder:
cd experiment
  2. Create your main program in the experiment folder by creating a new file called Main.java (for example, using vim Main.java). Include the following contents.
import java.io.IOException;

import ai.h2o.mojos.runtime.MojoPipeline;
import ai.h2o.mojos.runtime.frame.MojoFrame;
import ai.h2o.mojos.runtime.frame.MojoFrameBuilder;
import ai.h2o.mojos.runtime.frame.MojoRowBuilder;
import ai.h2o.mojos.runtime.utils.SimpleCSV;
import ai.h2o.mojos.runtime.lic.LicenseException;

public class Main {

  public static void main(String[] args) throws IOException, LicenseException {
    // Load the MOJO pipeline
    MojoPipeline model = MojoPipeline.loadFrom("pipeline.mojo");

    // Get and fill the input columns
    MojoFrameBuilder frameBuilder = model.getInputFrameBuilder();
    MojoRowBuilder rowBuilder = frameBuilder.getMojoRowBuilder();
    rowBuilder.setValue("AGE", "68");
    rowBuilder.setValue("RACE", "2");
    rowBuilder.setValue("DCAPS", "2");
    rowBuilder.setValue("VOL", "0");
    rowBuilder.setValue("GLEASON", "6");
    frameBuilder.addRow(rowBuilder);

    // Create a frame which can be transformed by MOJO pipeline
    MojoFrame iframe = frameBuilder.toMojoFrame();

    // Transform input frame by MOJO pipeline
    MojoFrame oframe = model.transform(iframe);
    // `MojoFrame.debug()` can be used to view the contents of a Frame
    // oframe.debug();

    // Output prediction as CSV
    SimpleCSV outCsv = SimpleCSV.read(oframe);
    outCsv.write(System.out);
  }
}
  3. Compile the source code. (On Java 8 or newer, omit -J-XX:MaxPermSize=128m; the permanent generation it sizes no longer exists and the option is obsolete.)
javac -cp mojo2-runtime.jar -J-Xms2g -J-XX:MaxPermSize=128m Main.java
  4. Run the MOJO example:
# Linux and OS X users
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp .:mojo2-runtime.jar Main
# Windows users
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp .;mojo2-runtime.jar Main
  5. The following output is displayed:
CAPSULE.True
0.5442205910902282
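
The output above is a small CSV: a header line with the prediction column names, followed by one line per scored row. If a calling program captures this text, it can be turned back into named values. The following is a minimal sketch, assuming numeric values and no quoted or comma-containing fields. The class PredictionCsv is hypothetical and is not part of the MOJO runtime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the scorer's CSV output (header line plus data lines).
final class PredictionCsv {
  // Returns column name -> value for the first data row.
  // Assumption: plain comma-separated numeric fields without quoting.
  static Map<String, Double> firstRow(String csv) {
    String[] lines = csv.trim().split("\\r?\\n");
    if (lines.length < 2) {
      throw new IllegalArgumentException("Expected a header line and at least one data line");
    }
    String[] header = lines[0].split(",");
    String[] values = lines[1].split(",");
    if (header.length != values.length) {
      throw new IllegalArgumentException("Header and row have different column counts");
    }
    Map<String, Double> row = new LinkedHashMap<>();
    for (int i = 0; i < header.length; i++) {
      row.put(header[i].trim(), Double.parseDouble(values[i].trim()));
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, Double> row = firstRow("CAPSULE.True\n0.5442205910902282\n");
    System.out.println(row.get("CAPSULE.True"));
  }
}
```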

Using the MOJO Scoring Pipeline with Spark/Sparkling Water

Note: The Driverless AI 1.5 release will be the last release with TOML-based MOJO2. Releases after 1.5 will include protobuf-based MOJO2.

MOJO scoring pipeline artifacts can be used in Spark to deploy predictions in parallel using the Sparkling Water API. This section shows how to load and run predictions on the MOJO scoring pipeline in Spark using the Scala and Python APIs.

If you upgrade H2O Driverless AI, Sparkling Water remains backwards compatible with MOJO versions produced by older Driverless AI versions.

Requirements

  • You must have a Spark cluster with the Sparkling Water JAR file passed to Spark.
  • To run with PySparkling, you must have the PySparkling zip file.

The H2OContext does not have to be created if you only want to run predictions on MOJOs using Spark. This is because MOJOs are written to be independent of the H2O runtime.

Preparing Your Environment

Both PySparkling and Sparkling Water need to be started with some extra configuration in order to enable the MOJO scoring pipeline. Examples are provided below. Specifically, you must pass the path of the H2O Driverless AI license to the Spark --jars argument. Additionally, you must add the path to the MOJO scoring pipeline implementation JAR file, mojo2-runtime.jar, to the same --jars argument. This file is proprietary and is not part of the resulting Sparkling Water assembly JAR file.

Note: In local Spark mode, use --driver-class-path to specify the path to the license file and the MOJO pipeline JAR file.
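
The two modes expect the same files in different forms: --jars takes a comma-separated list, while --driver-class-path takes a classpath string. The following is a minimal sketch of building either argument pair. The class SparkMojoArgs is hypothetical, and joining the classpath entries with the platform path separator is an assumption based on the usual Java classpath convention.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that formats the license and runtime JAR paths for Spark.
final class SparkMojoArgs {
  // localMode = true  -> --driver-class-path with entries joined by pathSeparator
  // localMode = false -> --jars with entries joined by commas
  static List<String> dependencyArgs(boolean localMode, String pathSeparator, String... files) {
    if (localMode) {
      return Arrays.asList("--driver-class-path", String.join(pathSeparator, files));
    }
    return Arrays.asList("--jars", String.join(",", files));
  }

  public static void main(String[] args) {
    String[] files = {"license.sig", "mojo2-runtime.jar"};
    System.out.println(String.join(" ", dependencyArgs(false, File.pathSeparator, files)));
    System.out.println(String.join(" ", dependencyArgs(true, File.pathSeparator, files)));
  }
}
```

The resulting pair can be appended to a ./bin/pyspark, ./bin/spark-shell, or ./bin/spark-submit command line.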

PySparkling

First, start PySpark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library to the --jars argument and also specifies the path to the PySparkling Python library.

./bin/pyspark --jars license.sig,mojo2-runtime.jar --py-files pysparkling.zip

Alternatively, download the official Sparkling Water distribution from the H2O Download page and follow the steps on the Sparkling Water download page. Once you are in the Sparkling Water directory, you can call:

./bin/pysparkling --jars license.sig,mojo2-runtime.jar

At this point, you should have a PySpark interactive terminal available where you can try out predictions. To productionize the scoring process, use the same configuration, but submit your job to a cluster with ./bin/spark-submit instead of ./bin/pyspark.

# First, specify the dependency
from pysparkling.ml import H2OMOJOPipelineModel
# Load the pipeline
mojo = H2OMOJOPipelineModel.create_from_mojo("file:///path/to/the/pipeline.mojo")

# This option ensures that the output columns are named properly. If you want to use old behavior
# when all output columns were stored inside an array, don't specify this configuration option,
# or set it to False. We however strongly encourage users to set this to True as below.
mojo.set_named_mojo_output_columns(True)
# Load the data as Spark's Data Frame
data_frame = spark.read.csv("file:///path/to/the/data.csv", header=True)
# Run the predictions. The predictions contain all the original columns plus the predictions
# added as new columns
predictions = mojo.predict(data_frame)

# You can easily get the predictions for a desired column using the helper function as follows:
predictions.select(mojo.select_prediction_udf("AGE")).collect()

Sparkling Water

First, start Spark with all the required dependencies. The following command passes the license file and the MOJO scoring pipeline implementation library mojo2-runtime.jar to the --jars argument and also specifies the path to the Sparkling Water assembly JAR.

./bin/spark-shell --jars license.sig,mojo2-runtime.jar,sparkling-water-assembly.jar

At this point, you should have a Sparkling Water interactive terminal available where you can try out predictions. To productionize the scoring process, use the same configuration, but submit your job to a cluster with ./bin/spark-submit instead of ./bin/spark-shell.

// First, specify the dependency
import org.apache.spark.ml.h2o.models.H2OMOJOPipelineModel
// Load the pipeline
val mojo = H2OMOJOPipelineModel.createFromMojo("file:///path/to/the/pipeline.mojo")

// This option ensures that the output columns are named properly. If you want to use old behavior
// when all output columns were stored inside an array, don't specify this configuration option,
// or set it to false. We however strongly encourage users to set this to true as below.
mojo.setNamedMojoOutputColumns(true)
// Load the data as Spark's Data Frame
val dataFrame = spark.read.option("header", "true").csv("file:///path/to/the/data.csv")
// Run the predictions. The predictions contain all the original columns plus the predictions
// added as new columns
val predictions = mojo.transform(dataFrame)

// You can easily get the predictions for a desired column using the helper function as follows:
predictions.select(mojo.selectPredictionUDF("AGE"))