Version: v1.1.x

Tutorial 1B: Batch scoring with the Python client

This tutorial showcases the use of H2O eScorer Python client to run the Batch Scorer. Batch scoring is built to read, score and write large amounts of datasets from storage. An example of a storage is AWS S3. In this tutorial, we will use a properties file with the eScorer Python client to score a .csv dataset from an AWS S3 bucket, and write results back to the same bucket.

Install the Python client

You can download the H2O eScorer Python client wheel from the Python client tab in the H2O eScorer downloads page.
In your Python environment, run the following command to install the package and its dependencies:
```
pip install <python-client-wheel-name>
```

Authentication

H2O eScorer environment variables for authentication are set just once, and can be automatically used by the client for as many runs as you want.

Note

For more information about authenticating the Python client, see Python client overview: Authentication

Score

In H2O eScorer, batch scoring can be performed easily with the properties file.

Note

For information on how to autogenerate and populate a properties file to configure batch scoring, see Batch scoring configuration and usage.

import h2o_escorer 

client = h2o_escorer.Client(
    escorer_url='https://rest...',
    token_provider=token_provider,
)

batch_job = await client.scorer.batch.create(
    model_name='riskmodel.mojo',
    properties_filepath='s3_scorer.properties',
)

This method takes the model name and the properties file path as arguments. The properties file contains the configuration for the batch scoring job. The batch_job object is returned, which can be used to monitor the progress of the batch scoring job.

Each job creates a unique ID which can be accessed by calling the id attribute of the batch_job object.

batch_job.id

To check whether a particular job is complete, call the is_complete() method of the batch_job object. This method returns a boolean value.

Note

The is_complete() method will become a property in the next release.

await batch_job.is_complete()

Once complete, you have made predictions using the S3 dataset which is specified in the properties file using the model riskmodel.mojo. Logs can be accessed by calling the get_logs() method of the batch_job object. These logs describe the completion of scoring, number of threads, and the number of errors.

await batch_job.get_logs()

To get list of all batch jobs,

await client.scorer.batch.list()

ModelStats

H2O eScorer Wave app provides a live dashboard with real-time updates to view ModelStats as models are scored in batch.

Feedback

Submit and view feedback for this page
Send feedback about H2O eScorer to cloud-feedback@h2o.ai

Install the Python client​

Authentication​

Score​

ModelStats​

Install the Python client

Authentication

Score

ModelStats