The Scoring Pipeline

As indicated earlier, a scoring pipeline is available after a successfully completed experiment. This pipeline includes a scoring module and a scoring service.

The scoring module is a Python module bundled into a standalone wheel file (named scoring_*.whl). All the prerequisites for the scoring module to work correctly are listed in the requirements.txt file.

The scoring service hosts the scoring module as an HTTP or TCP service. Doing this exposes all the functions of the scoring module through remote procedure calls (RPC). In effect, this mechanism allows you to invoke scoring functions from languages other than Python on the same computer or from another computer on a shared network or on the Internet.

The scoring service can be started in two modes:

  • In TCP mode, the scoring service provides high-performance RPC calls via Apache Thrift (https://thrift.apache.org/) using a binary wire protocol.
  • In HTTP mode, the scoring service provides JSON-RPC 2.0 calls served by Tornado (http://www.tornadoweb.org).

Scoring operations can be performed on individual rows (row-by-row) or in batch mode (multiple rows at a time).

Prerequisites

The following are required in order to run the scoring pipeline.

  • Linux x86_64 environment
  • Python 3.6
  • Virtual Environment
  • Apache Thrift (to run the TCP scoring service)

The scoring pipeline has been tested on Ubuntu 16.04 and on 16.10+. Examples of how to install these prerequisites are below:

Installing requirements on Ubuntu 16.10+

$ sudo apt install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv

Installing requirements on Ubuntu 16.04

$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt-get update
$ sudo apt-get install python3.6 python3.6-dev python3-pip python3-dev \
  python-virtualenv python3-virtualenv

Installing Thrift

Thrift is required to run the scoring service in TCP mode, but it is not required to run the scoring module. The following steps are available on the Thrift documentation site at: https://thrift.apache.org/docs/BuildingFromSource.

$ sudo apt-get install automake bison flex g++ git libevent-dev \
  libssl-dev libtool make pkg-config libboost-all-dev ant
$ wget https://github.com/apache/thrift/archive/0.10.0.tar.gz
$ tar -xvf 0.10.0.tar.gz
$ cd thrift-0.10.0
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install

Scoring Pipeline Files

The scoring-pipeline folder includes the following notable files:

  • example.py: An example Python script demonstrating how to import and score new records.
  • run_example.sh: Runs example.py (also sets up a virtualenv with prerequisite libraries).
  • server.py: A standalone TCP/HTTP server for hosting scoring services.
  • run_tcp_server.sh: Runs TCP scoring service (runs server.py).
  • run_http_server.sh: Runs HTTP scoring service (runs server.py).
  • example_client.py: An example Python script demonstrating how to communicate with the scoring server.
  • run_tcp_client.sh: Demonstrates how to communicate with the scoring service via TCP (runs example_client.py).
  • run_http_client.sh: Demonstrates how to communicate with the scoring service via HTTP (using curl).

Examples

This section provides examples showing how to run the scoring module and how to run the scoring service in TCP and HTTP mode.

Before running these examples, be sure that the scoring pipeline is already downloaded and unzipped:

  1. On the completed Experiment page, click on the Download Scoring Pipeline button to download the scorer.zip file for this experiment onto your local machine.
  2. Unzip the scoring pipeline.

After the pipeline is downloaded and unzipped, you will be able to run the scoring module and the scoring service.

Running the Scoring Module

Navigate to the scoring-pipeline folder and run the following:

bash run_example.sh

The script creates a virtual environment within the scoring-pipeline folder, installs prerequisites, and finally runs example.py, which scores records using the model from the completed experiment. The output is similar to the following:

StackedBaseModels transform
StackedBaseModels transform done
0.11548620959122975
StackedBaseModels transform
StackedBaseModels transform done
0.08865078207519318
StackedBaseModels transform
StackedBaseModels transform done
0.32344844937324524
StackedBaseModels transform
StackedBaseModels transform done
0.3894717726442549
StackedBaseModels transform
StackedBaseModels transform done
0.2775277644395828
StackedBaseModels transform
StackedBaseModels transform done
[ 0.11548621  0.08865078  0.32482851  0.41540962  0.29031289]

Running the Scoring Service - TCP Mode

TCP mode allows you to use the scoring service from any language supported by Thrift, including C, C++, C#, Cocoa, D, Dart, Delphi, Go, Haxe, Java, Node.js, Lua, Perl, PHP, Python, Ruby, and Smalltalk.

To start the scoring service in TCP mode, generate the Thrift bindings once, and then run the server. Note that the Thrift compiler is only required at build time. It is not a runtime dependency: once the scoring services are built and tested, you do not need to repeat this installation process on the machines where the scoring services are intended to be deployed.

# See the run_tcp_server.sh file for a complete example.
$ thrift --gen py scoring.thrift
$ python server.py --mode=tcp --port=9090
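
A client launched right after the server can fail to connect if the server is not yet listening. The following is a minimal sketch, not part of the scoring pipeline, of a helper that polls a port until it accepts TCP connections. The name wait_for_port and its parameters are hypothetical; it uses only the Python standard library and applies equally to TCP and HTTP mode, since both listen on a TCP port.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    Hypothetical helper for illustration; not shipped with the pipeline.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the server command shown above, a call such as wait_for_port('localhost', 9090) would return True as soon as the service is reachable.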

Call the scoring service by generating the Thrift bindings for your language of choice, then make RPC calls via TCP sockets using Thrift’s buffered transport in conjunction with its binary protocol.

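# See the run_tcp_client.sh and example_client.py files for a complete example.
$ thrift --gen py scoring.thrift

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
# ScoringService and Row are defined in the bindings generated by the command
# above (written to gen-py/ by default); import them from that package.

# Connect using Thrift's buffered transport and binary protocol.
socket = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ScoringService.Client(protocol)
transport.open()

# Populate one row of feature values and score it.
row = Row()
row.sepalLen = 7.416  # sepal_len
row.sepalWid = 3.562  # sepal_wid
row.petalLen = 1.049  # petal_len
row.petalWid = 2.388  # petal_wid
scores = client.score(row)
transport.close()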

Note that you can reproduce the exact same results from other languages. For example, to generate the bindings needed to call the scoring service from Java, use:

$ thrift --gen java scoring.thrift

Running the Scoring Service - HTTP Mode

HTTP mode allows you to use the scoring service through plaintext JSON-RPC calls. This is usually slower than TCP mode with Thrift, but it has the advantage of being usable from any HTTP client library in your language of choice, without any dependency on Thrift.

For JSON-RPC documentation, see http://www.jsonrpc.org/specification.
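
A JSON-RPC request is a JSON object that names the method to call and carries its parameters. The sketch below builds such a message with the same members (id, method, params) as the requests shown later in this section. The helper name build_request and the include_version flag are hypothetical. The JSON-RPC 2.0 specification also defines a jsonrpc member set to "2.0"; the examples in this document omit it, so it is optional here, and whether the scoring service requires it is not stated in this document.

```python
import json


def build_request(method, params, request_id=1, include_version=False):
    """Serialize a JSON-RPC request message to a JSON string.

    Hypothetical helper for illustration; not part of the scoring pipeline.
    """
    message = {"id": request_id, "method": method, "params": params}
    if include_version:
        # Member defined by the JSON-RPC 2.0 specification.
        message["jsonrpc"] = "2.0"
    return json.dumps(message)


# Same method name and row values as the examples in this section.
body = build_request("score", {"row": [7.486, 3.277, 4.755, 2.354]})
print(body)
```

The resulting string can be sent as the body of an HTTP POST request by any HTTP client.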

To start the scoring service in HTTP mode:

# See run_http_server.sh for a complete example
$ python server.py --mode=http --port=9090

To invoke scoring methods, compose a JSON-RPC message and make an HTTP POST request to http://host:port/rpc as follows:

# See run_http_client.sh for a complete example
$ curl http://localhost:9090/rpc \
  --header "Content-Type: application/json" \
  --data @- <<EOF
  {
    "id": 1,
    "method": "score",
    "params": {
      "row": [ 7.486, 3.277, 4.755, 2.354 ]
    }
  }
EOF

Similarly, you can use any HTTP client library to reproduce the above result. For example, from Python, you can use the requests module as follows:

import requests

row = [7.486, 3.277, 4.755, 2.354]
req = dict(id=1, method='score', params=dict(row=row))
# json= sends the request as a JSON body; data= would form-encode it instead.
res = requests.post('http://localhost:9090/rpc', json=req)
print(res.json()['result'])
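
The last line of the Python example reads the result member directly, which raises a KeyError if the call failed. Under the JSON-RPC 2.0 specification, a failed call returns an error member (an object with code and message) instead of result. The following is a minimal sketch of a more defensive reader. The function name extract_result is hypothetical, and the sample responses are made up for illustration; they are not output from the scoring service.

```python
def extract_result(response):
    """Return the `result` member of a decoded JSON-RPC response.

    Raises RuntimeError if the response carries an `error` member instead.
    Hypothetical helper for illustration; not part of the scoring pipeline.
    """
    error = response.get("error")
    if error is not None:
        code = error.get("code")
        message = error.get("message")
        raise RuntimeError(f"JSON-RPC error {code}: {message}")
    return response["result"]


# Illustrative responses only; the values are not real scores.
ok = {"id": 1, "result": [0.5]}
failed = {"id": 1, "error": {"code": -32601, "message": "Method not found"}}

print(extract_result(ok))
try:
    extract_result(failed)
except RuntimeError as exc:
    print(exc)
```

With the requests example above, this would be called as extract_result(res.json()).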