Overview
The H2O MLOps Scoring Client is a Python client library that simplifies mini-batch scoring against an H2O MLOps scoring endpoint. This library lets you run batch scoring jobs on your local machine, a standalone server, Databricks, or a Spark 3 cluster.
Scoring Pandas data frames is as easy as:
pip install h2o-mlops-scoring-client
import h2o_mlops_scoring_client
scores_df = h2o_mlops_scoring_client.score_data_frame(
    mlops_endpoint_url="https://.../model/score",
    id_column="ID",
    data_frame=df,
)
Scoring from a source to a sink is also possible through pyspark:
pip install h2o-mlops-scoring-client[PYSPARK]
import h2o_mlops_scoring_client
h2o_mlops_scoring_client.score_source_sink(
    mlops_endpoint_url="https://.../model/score",
    id_column="ID",
    source_data="s3a://...",
    source_format=h2o_mlops_scoring_client.Format.CSV,
    sink_location="s3a://...",
    sink_format=h2o_mlops_scoring_client.Format.PARQUET,
    sink_write_mode=h2o_mlops_scoring_client.WriteMode.OVERWRITE,
)
Install
This section describes how to install the H2O MLOps Scoring Client.
Requirements
- Linux or macOS (Windows is not supported)
- Java (only required for pyspark installs)
- Python 3.8 and later
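The requirements above can be checked before installing. The following is a minimal sketch that uses only the Python standard library. The function name check_requirements and the result keys are hypothetical, and looking for a java executable on PATH is an assumption about how Java is made available on your machine.

```python
import platform
import shutil
import sys


def check_requirements():
    """Return a dict mapping each listed requirement to True or False."""
    return {
        # Python 3.8 and later is required.
        "python_3_8_plus": sys.version_info >= (3, 8),
        # Linux or macOS only; platform.system() reports "Linux" or "Darwin".
        "supported_os": platform.system() in ("Linux", "Darwin"),
        # Java matters only for pyspark installs; this looks for it on PATH.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing java_on_path result only matters if you plan to install the client with pyspark.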
Install from PyPI
pip install h2o-mlops-scoring-client
Note: pyspark is no longer included in a default install. To include pyspark:
pip install h2o-mlops-scoring-client[PYSPARK]
Frequently asked questions
When should I use the H2O MLOps Scoring Client?
Use the H2O MLOps Scoring Client when you need to perform batch scoring outside of the H2O AI Cloud platform but still want to keep the scoring process integrated with the H2O MLOps workflow. The client lets you maintain connections between H2O MLOps projects, scoring, registry, and monitoring features, while it handles tasks such as authenticating and connecting to a source or sink and processing or converting files and data.
Where does scoring take place?
Scoring takes place in an H2O MLOps deployment. During a batch scoring job, the client sends the data to the deployment for scoring, and the scores are returned to the client so that the batch job can complete.
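The send-and-return flow described above can be pictured with a small, self-contained sketch. This is not the client's implementation: the helper names iter_batches and score_in_batches, the batch size, and the fake_endpoint stub that stands in for a remote deployment are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def iter_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_in_batches(
    rows: List[Row],
    id_column: str,
    score_batch: Callable[[List[Row]], List[float]],
    batch_size: int = 2,
) -> List[Row]:
    """Send rows to score_batch one mini-batch at a time and pair scores with IDs."""
    scored = []
    for batch in iter_batches(rows, batch_size):
        # Hypothetical stand-in for the remote scoring call.
        scores = score_batch(batch)
        for row, score in zip(batch, scores):
            scored.append({id_column: row[id_column], "score": score})
    return scored


def fake_endpoint(batch: List[Row]) -> List[float]:
    # Stub only: a real deployment would run the model. Here x is doubled.
    return [row["x"] * 2 for row in batch]


rows = [{"ID": i, "x": float(i)} for i in range(5)]
result = score_in_batches(rows, "ID", fake_endpoint, batch_size=2)
print(result)
```

Keeping the ID column alongside each score is what allows results from separate mini-batches to be matched back to the input rows.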
What sources and sinks are supported?
The H2O MLOps Scoring Client supports many sources and sinks, including:
- ADLS Gen 2
- Databases with a JDBC driver
- Local file system
- Google BigQuery (GBQ)
- S3
- Snowflake
What file types are supported?
The H2O MLOps Scoring Client can read and write the following file types:
- CSV
- Parquet
- ORC
- BigQuery tables
- JDBC queries
- JDBC tables
- Snowflake queries
- Snowflake tables
If there's a file type you'd like to see supported, contact support@h2o.ai.
Can I get model monitoring for batch scoring?
Yes. The MLOps Scoring Client uses H2O MLOps scoring endpoints, which are automatically monitored.
Is a Spark installation required?
No. If you're scoring Pandas data frames, then no extra Spark install or configuration is needed. If you want to connect to an external source or sink, you'll need to install pyspark and do a small amount of configuration.
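To find out whether the current environment already has pyspark, a quick standard-library check such as the sketch below may help. The function name pyspark_available is hypothetical, and the check only confirms that the package can be found, not that Spark is configured correctly.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if the pyspark package can be found in this environment."""
    return importlib.util.find_spec("pyspark") is not None


if pyspark_available():
    print("pyspark found: source/sink scoring can be configured")
else:
    print("pyspark not found: only data frame scoring is available")
```

If the check reports that pyspark is missing, installing with pip install h2o-mlops-scoring-client[PYSPARK] adds it.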