Version: v0.62.0

Overview

The H2O MLOps Scoring Client is a Python client library that simplifies mini-batch scoring against an H2O MLOps scoring endpoint. This library lets you run batch scoring jobs on your local machine, a standalone server, Databricks, or a Spark 3 cluster.

Scoring Pandas data frames is as easy as:

pip install h2o-mlops-scoring-client

import h2o_mlops_scoring_client


scores_df = h2o_mlops_scoring_client.score_data_frame(
    mlops_endpoint_url="https://.../model/score",
    id_column="ID",
    data_frame=df,
)

Scoring from a source to a sink is also possible through pyspark:

pip install "h2o-mlops-scoring-client[PYSPARK]"

import h2o_mlops_scoring_client


h2o_mlops_scoring_client.score_source_sink(
    mlops_endpoint_url="https://.../model/score",
    id_column="ID",
    source_data="s3a://...",
    source_format=h2o_mlops_scoring_client.Format.CSV,
    sink_location="s3a://...",
    sink_format=h2o_mlops_scoring_client.Format.PARQUET,
    sink_write_mode=h2o_mlops_scoring_client.WriteMode.OVERWRITE,
)

Install

This section describes how to install the H2O MLOps Scoring Client.

Requirements

  • Linux or macOS (Windows is not supported)
  • Java (only required for pyspark installs)
  • Python 3.8 or later
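
The requirements above can be checked before installing. The following is a minimal sketch that uses only the Python standard library. The function name `check_requirements` and its `need_pyspark` parameter are hypothetical and are not part of the scoring client.

```python
import platform
import shutil
import sys
from typing import List


def check_requirements(need_pyspark: bool = False) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # The documented requirements list Linux or macOS; Windows is not supported.
    if platform.system() not in ("Linux", "Darwin"):
        problems.append(f"unsupported operating system: {platform.system()}")
    # Python 3.8 or later is required.
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    # Java is only needed for pyspark installs.
    if need_pyspark and shutil.which("java") is None:
        problems.append("no 'java' executable found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(need_pyspark=True):
        print("Problem:", problem)
```

The Java check only looks for a `java` executable on `PATH`. It does not verify the Java version, which this page does not specify.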

Install from PyPI

pip install h2o-mlops-scoring-client

Note: pyspark is no longer included in a default install. To include pyspark:

pip install "h2o-mlops-scoring-client[PYSPARK]"

Frequently asked questions

When should I use the H2O MLOps Scoring Client?

Use the H2O MLOps Scoring Client when you need to perform batch scoring outside of the H2O AI Cloud platform but still want to keep the scoring process integrated with the H2O MLOps workflow. The client maintains the connections between H2O MLOps projects, scoring, registry, and monitoring features, and it handles tasks such as authenticating and connecting to a source or sink and processing or converting files and data.

Where does scoring take place?

During a batch scoring job, the client sends the data to an H2O MLOps deployment for scoring. The deployment returns the scores to the client, which then completes the batch job.
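
To illustrate this flow, here is a minimal sketch of mini-batch scoring in plain Python. It is not the client's implementation. The names `batches`, `score_in_batches`, and `fake_endpoint`, as well as the batch size, are hypothetical, and the stub function stands in for the request to an H2O MLOps deployment.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


def batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_in_batches(
    rows: List[Row],
    score_batch: Callable[[List[Row]], List[Row]],
    batch_size: int = 2,
) -> List[Row]:
    """Send rows to score_batch one mini-batch at a time and collect the scores."""
    scores: List[Row] = []
    for batch in batches(rows, batch_size):
        scores.extend(score_batch(batch))
    return scores


def fake_endpoint(batch: List[Row]) -> List[Row]:
    # Hypothetical stand-in for the deployment: "scores" a row by adding x and y.
    return [{"ID": row["ID"], "score": row["x"] + row["y"]} for row in batch]


rows = [{"ID": i, "x": float(i), "y": 1.0} for i in range(5)]
scores = score_in_batches(rows, fake_endpoint, batch_size=2)
print(scores)
```

Each returned score keeps the row's `ID`, so results can be matched back to the input rows regardless of how the rows were split into batches.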

What sources and sinks are supported?

The H2O MLOps Scoring Client supports many sources and sinks, including:

  • ADLS Gen 2
  • Databases with a JDBC driver
  • Local file system
  • Google BigQuery (GBQ)
  • S3
  • Snowflake

What file types are supported?

The H2O MLOps Scoring Client can read and write the following file types and data formats:

  • CSV
  • Parquet
  • ORC
  • BigQuery tables
  • JDBC queries
  • JDBC tables
  • Snowflake queries
  • Snowflake tables

If there's a file type you'd like to see supported, contact support@h2o.ai.

I want model monitoring for batch scoring. Can I do that?

Yes. The MLOps Scoring Client uses H2O MLOps scoring endpoints, which are automatically monitored.

Is a Spark installation required?

No. If you're scoring Pandas data frames, then no extra Spark install or configuration is needed. If you want to connect to an external source or sink, you'll need to install pyspark and do a small amount of configuration.
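
A script can check at runtime whether pyspark is available before attempting source-to-sink scoring. The sketch below uses only the Python standard library. `pyspark_available` is a hypothetical helper and is not part of the scoring client.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if the pyspark package can be imported in this environment."""
    return importlib.util.find_spec("pyspark") is not None


if pyspark_available():
    print("pyspark found: source-to-sink scoring can be configured.")
else:
    print("pyspark not found: install the PYSPARK extra to score from a source to a sink.")
```

This only confirms that the package is importable. It does not validate the Java installation or any Spark configuration.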

