Version: v0.70.0

Batch scoring

Batch scoring is the process of making predictions on a large set of data all at once, instead of one by one in real time. This page shows how to run batch scoring jobs with the H2O MLOps Python client.

Define input source and output location

A batch scoring job needs both an input source and an output location (sink). When the source or sink is Amazon S3, use temporary AWS credentials for access.

Use the following code to configure the input source:

import h2o_mlops

# `credentials` is a dictionary holding temporary AWS credentials.
source = h2o_mlops.options.BatchSourceOptions(
    spec_uid="s3",
    config={
        "region": "us-west-2",
        "accessKeyID": credentials['AccessKeyId'],
        "secretAccessKey": credentials['SecretAccessKey'],
        "sessionToken": credentials['SessionToken'],
    },
    mime_type=h2o_mlops.options.MimeTypeOptions.CSV,
    location="<location>",
)
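The source above and the sink in the next step pass the same four S3 settings. As a minimal sketch, the mapping from a temporary-credentials dictionary to that config can live in one helper. The name build_s3_config is hypothetical (it is not part of h2o_mlops), and the input is assumed to use the AccessKeyId, SecretAccessKey, and SessionToken keys shown above, which is also the shape AWS STS returns. The credential values are placeholders.

```python
from typing import Dict, Mapping


def build_s3_config(
    credentials: Mapping[str, str], region: str = "us-west-2"
) -> Dict[str, str]:
    """Map temporary AWS credentials onto the config keys used on this page.

    Hypothetical helper for illustration; not part of h2o_mlops.
    """
    required = ("AccessKeyId", "SecretAccessKey", "SessionToken")
    missing = [key for key in required if not credentials.get(key)]
    if missing:
        # Fail early instead of submitting a job that cannot reach S3.
        raise ValueError(f"missing credential fields: {', '.join(missing)}")
    return {
        "region": region,
        "accessKeyID": credentials["AccessKeyId"],
        "secretAccessKey": credentials["SecretAccessKey"],
        "sessionToken": credentials["SessionToken"],
    }


# Placeholder values only; real values come from your own AWS session.
example = build_s3_config(
    {
        "AccessKeyId": "EXAMPLE_KEY_ID",
        "SecretAccessKey": "EXAMPLE_SECRET",
        "SessionToken": "EXAMPLE_TOKEN",
    }
)
print(sorted(example))
```

The returned dictionary could then be passed as the config argument of both the source and the sink options, so the two stay in step.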

Set up the output location where the batch scoring results will be stored:

from datetime import datetime

# Append a timestamp so that each run writes to its own location.
output_location = "<location>" + datetime.now().strftime("%Y%m%d-%H%M%S")
sink = h2o_mlops.options.BatchSinkOptions(
    spec_uid="s3",
    config={
        "region": "us-west-2",
        "accessKeyID": credentials['AccessKeyId'],
        "secretAccessKey": credentials['SecretAccessKey'],
        "sessionToken": credentials['SessionToken'],
    },
    mime_type=h2o_mlops.options.MimeTypeOptions.JSONL,
    location=output_location,
)
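Appending a timestamp, as in the sink setup above, gives each run a distinct output location. The naming step on its own can be sketched as follows; timestamped_location and the example bucket path are hypothetical stand-ins for your own <location> value.

```python
from datetime import datetime
from typing import Optional


def timestamped_location(prefix: str, now: Optional[datetime] = None) -> str:
    """Return prefix followed by a YYYYMMDD-HHMMSS timestamp.

    Hypothetical helper; `now` can be fixed to make the result reproducible.
    """
    moment = now if now is not None else datetime.now()
    return prefix + moment.strftime("%Y%m%d-%H%M%S")


fixed = datetime(2024, 1, 2, 3, 4, 5)
print(timestamped_location("s3://example-bucket/scores/", fixed))
# prints s3://example-bucket/scores/20240102-030405
```

Because the timestamp has one-second resolution, two jobs created within the same second would share a location.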

Create batch scoring job

First, retrieve the scoring runtime for the model:

scoring_runtime = client.runtimes.scoring.list(
    artifact_type=model.get_experiment().scoring_artifact_types[0]
)[0]

To list the sink specifications available for job creation, use:

client.batch_connectors.sink_specs.list()

Create the batch scoring job:

job = project.batch_scoring_jobs.create(
    source=source,
    sink=sink,
    model=model,
    scoring_runtime=scoring_runtime,
    name="DEMO JOB",
    mini_batch_size=100,  # number of rows sent per request during batch processing
    resource_spec=h2o_mlops.options.BatchKubernetesOptions(
        replicas=2,
        min_replicas=1,
    ),
)
job
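The mini_batch_size comment above describes the value as the number of rows sent per request. Under that description, and assuming every request except possibly the last one is full, the number of requests for a dataset can be estimated as below. requests_needed is a hypothetical helper for back-of-envelope sizing, not a client function, and the row counts are made up.

```python
def requests_needed(total_rows: int, mini_batch_size: int) -> int:
    """Estimate how many requests are needed to send total_rows rows.

    Assumes each request carries at most mini_batch_size rows.
    """
    if mini_batch_size <= 0:
        raise ValueError("mini_batch_size must be positive")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    # Integer ceiling division: a partial final batch still costs one request.
    return (total_rows + mini_batch_size - 1) // mini_batch_size


print(requests_needed(1050, 100))  # prints 11
print(requests_needed(1000, 100))  # prints 10
```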

Retrieve the job ID:

job.uid

Wait for job completion

The following call blocks until the job completes and, while waiting, shows the log output from both the scorer and the batch scoring job.

job.wait()

Log printing is enabled by default. To wait for job completion without printing any logs, use:

job.wait(logs=False)
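In scripts or scheduled runs, streamed logs may be unwanted. The sketch below wraps the quiet form of the call and measures how long the wait took. wait_quietly and StubJob are hypothetical; StubJob only stands in for a real job object so the snippet runs without an MLOps connection, and the only client call assumed is job.wait(logs=False) from this page.

```python
import time


class StubJob:
    """Stand-in for a batch scoring job; records how wait() was called."""

    def __init__(self) -> None:
        self.logs_args = []

    def wait(self, logs: bool = True) -> None:
        self.logs_args.append(logs)


def wait_quietly(job) -> float:
    """Wait for the job without printing logs; return elapsed seconds."""
    started = time.monotonic()
    job.wait(logs=False)
    return time.monotonic() - started


stub = StubJob()
elapsed = wait_quietly(stub)
print(f"waited {elapsed:.3f} s, logs argument: {stub.logs_args}")
```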

List all jobs

project.batch_scoring_jobs.list()

Retrieve a job by ID

project.batch_scoring_jobs.get('<job ID>')
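Retrieving a job by ID is mainly useful in a later session, which means the ID from job.uid has to be stored somewhere in between. One simple option, sketched here, is a small text file. save_job_uid, load_job_uid, the file name, and the example ID are all hypothetical, and the sketch assumes job.uid is a string.

```python
import tempfile
from pathlib import Path


def save_job_uid(path: Path, uid: str) -> None:
    """Write the job ID to a text file."""
    path.write_text(uid.strip() + "\n", encoding="utf-8")


def load_job_uid(path: Path) -> str:
    """Read a previously stored job ID back from the file."""
    uid = path.read_text(encoding="utf-8").strip()
    if not uid:
        raise ValueError(f"no job ID stored in {path}")
    return uid


with tempfile.TemporaryDirectory() as tmp:
    uid_file = Path(tmp) / "last_job_uid.txt"
    save_job_uid(uid_file, "example-job-uid")  # in practice: job.uid
    restored = load_job_uid(uid_file)

print(restored)  # prints example-job-uid
# In a later session: project.batch_scoring_jobs.get(restored)
```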

Cancel a job

job.cancel()

By default, this call blocks until the job is fully canceled. To cancel without waiting for completion, use:

job.cancel(wait=False)
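One place the non-blocking form can help is when a script that is waiting on a job gets interrupted. The sketch below waits for a job and, on Ctrl-C, requests cancellation without blocking. wait_or_cancel and StubJob are hypothetical; only job.wait() and job.cancel(wait=False) come from this page, and StubJob simulates the interrupt so the snippet runs on its own.

```python
class StubJob:
    """Stand-in for a batch scoring job; simulates an interrupt during wait()."""

    def __init__(self) -> None:
        self.cancel_wait_args = []

    def wait(self, logs: bool = True) -> None:
        raise KeyboardInterrupt

    def cancel(self, wait: bool = True) -> None:
        self.cancel_wait_args.append(wait)


def wait_or_cancel(job) -> bool:
    """Return True if wait() returned normally, False if it was interrupted.

    On interrupt, cancellation is requested without blocking.
    """
    try:
        job.wait()
    except KeyboardInterrupt:
        job.cancel(wait=False)
        return False
    return True


stub = StubJob()
finished = wait_or_cancel(stub)
print(finished, stub.cancel_wait_args)  # prints False [False]
```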

Delete a job

job.delete()
