Monitoring setup
This guide shows you how to configure monitoring for your deployment using the H2O MLOps Python client.
Follow the steps below to define the monitored columns, optionally set up Kafka integration, adjust the detected baseline and columns, and then either deploy with monitoring enabled or enable or disable monitoring after deployment.
Step 1: Define input and output columns
To enable monitoring in H2O MLOps, you must specify the input and output columns to monitor. You can do this in one of the following ways:
Manual configuration
You can manually define the monitored columns using the MonitoringOptions class:
from h2o_mlops.options import (
    BaselineData,
    Column,
    MissingValues,
    MonitoringOptions,
    NumericalAggregate,
)
from h2o_mlops.types import ColumnLogicalType

options = MonitoringOptions(
    enabled=True,
    # Model inputs to monitor.
    input_columns=[
        Column(
            name="age",
            logical_type=ColumnLogicalType.NUMERICAL,
        ),
    ],
    # Model outputs to monitor.
    output_columns=[
        Column(
            name="quantity",
            logical_type=ColumnLogicalType.NUMERICAL,
            is_model_output=True,
        ),
    ],
    # Baseline distribution of a monitored column, used as the monitoring reference.
    baseline_data=[
        BaselineData(
            column_name="age",  # matches the "age" input column above
            logical_type=ColumnLogicalType.NUMERICAL,
            numerical_aggregate=NumericalAggregate(
                # 9 bin edges define 8 bins, so bin_count has 8 entries.
                bin_edges=[
                    float("-inf"),
                    22.0,
                    23.0,
                    25.0,
                    26.0,
                    28.0,
                    30.0,
                    31.0,
                    float("inf"),
                ],
                bin_count=[0, 1, 3, 1, 2, 2, 3, 3],
                mean_value=27.266666666666666,
                standard_deviation=3.2396354880199243,
                min_value=22.0,
                max_value=31.0,
                sum_value=409.0,
            ),
            categorical_aggregate=None,  # not used for a numerical column
            missing_values=MissingValues(row_count=0),
        ),
    ],
)
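The NumericalAggregate fields in the example above summarize the baseline distribution of a numeric column. The stdlib-only sketch below shows how values of that shape could be computed from raw data. The function name `numerical_baseline`, the returned dictionary, the half-open `[lower, upper)` binning rule, and the use of the sample standard deviation are all assumptions for illustration; this is not how H2O MLOps computes its baseline.

```python
import bisect
import math
import statistics


def numerical_baseline(values, bin_edges):
    """Hypothetical helper: summarize numeric values in the shape of NumericalAggregate.

    Assumes bin_edges is sorted and each bin is half-open [lower, upper),
    so the result has len(bin_edges) - 1 bin counts. None counts as missing.
    statistics.stdev needs at least two non-missing values.
    """
    present = [v for v in values if v is not None]
    bin_count = [0] * (len(bin_edges) - 1)
    for v in present:
        # bisect_right returns the index of the first edge greater than v;
        # the bin that holds v starts at the edge just before it.
        idx = bisect.bisect_right(bin_edges, v) - 1
        if not 0 <= idx < len(bin_count):
            raise ValueError(f"value {v} is outside the bin edges")
        bin_count[idx] += 1
    return {
        "bin_edges": list(bin_edges),
        "bin_count": bin_count,
        "mean_value": statistics.mean(present),
        "standard_deviation": statistics.stdev(present),  # sample standard deviation (assumption)
        "min_value": min(present),
        "max_value": max(present),
        "sum_value": math.fsum(present),
        "missing_row_count": len(values) - len(present),
    }
```

For example, `numerical_baseline([22.0, 23.5, None, 31.0], [float("-inf"), 25.0, 30.0, float("inf")])` returns a `bin_count` of `[2, 0, 1]`, a `mean_value` of `25.5`, and a `missing_row_count` of `1`.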
Automatic configuration
You can also configure monitoring automatically from a baseline dataset. This method reads the data and calculates the baseline with PySpark, so PySpark must be installed for this step.
from h2o_mlops.types import ColumnLogicalType
from h2o_mlops.utils.monitoring import (
    Format,
    get_spark_session,
    prepare_monitoring_options_from_data_frame,
    read_source,
)

# Get a Spark session and load the baseline dataset from CSV.
session = get_spark_session()
baseline_data_frame = read_source(
    spark=session,
    source_data="file:///datasets/categorical_data.csv",
    source_format=Format.CSV,
)

# Optionally override the detected logical type of a column,
# for example to mark an ID column.
logical_type_overrides = {
    "id": ColumnLogicalType.ID,
}

# The experiment argument is optional. When provided, its schema is used
# to detect the correct logical types for monitoring. `experiment` refers
# to an experiment object obtained earlier in your workflow.
options = prepare_monitoring_options_from_data_frame(
    data_frame=baseline_data_frame,
    logical_type_overrides=logical_type_overrides,
    experiment=experiment,
)
options.enabled = True
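prepare_monitoring_options_from_data_frame detects logical types for you. To make the role of logical_type_overrides concrete, here is a hypothetical, simplified detect-then-override flow using only the standard library: a column whose non-empty values all parse as numbers is treated as numerical, everything else as categorical, and explicit overrides always win. The plain string labels and the detection rule are assumptions; the real function may detect types differently, for example from the experiment schema.

```python
import csv
import io


def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def infer_logical_types(csv_text, overrides=None):
    """Hypothetical sketch: guess a logical type per CSV column, then apply overrides."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for column in reader.fieldnames or []:
        # Empty cells are treated as missing and ignored for type detection.
        values = [row[column] for row in rows if row[column] != ""]
        numeric = bool(values) and all(_is_number(v) for v in values)
        types[column] = "NUMERICAL" if numeric else "CATEGORICAL"
    # Explicit overrides take precedence over the detected types.
    types.update(overrides or {})
    return types
```

With the data `"id,age,color\n1,22,red\n2,31,blue\n"`, the `id` column is detected as `NUMERICAL` because its values look like numbers; passing `{"id": "ID"}` as the overrides corrects it, which mirrors why the example above overrides the `id` column.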
Step 2: (Optional) Configure Kafka integration for raw scoring logs
If Kafka integration is enabled in your environment, you can export raw scoring request and response data to Kafka. You can use a global topic or specify a custom topic for each deployment. To set a custom topic for this deployment:
options.kafka_topic = "test"
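Kafka itself restricts topic names: a name may contain only ASCII letters, digits, `.`, `_`, and `-`, must be between 1 and 249 characters long, and cannot be `.` or `..`. A hypothetical helper such as `is_valid_kafka_topic` below can catch an invalid custom topic before you deploy. It checks only Kafka's naming rules, not whether the topic exists or whether Kafka integration is enabled in your environment.

```python
import re

# Kafka's legal topic-name characters and maximum topic-name length.
_LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]+")
_MAX_TOPIC_LENGTH = 249


def is_valid_kafka_topic(name: str) -> bool:
    """Hypothetical helper: return True if name satisfies Kafka's topic naming rules."""
    if name in (".", ".."):
        return False
    if len(name) > _MAX_TOPIC_LENGTH:
        return False
    # fullmatch with "+" also rejects the empty string.
    return _LEGAL_TOPIC.fullmatch(name) is not None
```

For example, `is_valid_kafka_topic("test")` is `True`, while `is_valid_kafka_topic("scoring logs")` is `False` because of the space.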
Step 3: Edit baseline and columns before deployment
You can modify the automatically detected baseline and monitored columns before deployment if the detection was inaccurate.
To modify the logical type of an existing column:
options.input_columns[0].logical_type = ColumnLogicalType.CATEGORICAL
To replace an entire column definition:
options.input_columns[0] = Column(
    name="width",
    logical_type=ColumnLogicalType.NUMERICAL,
)
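Indexing with input_columns[0] depends on the order of the columns. If you prefer to address a column by name, a small helper such as the hypothetical `replace_column_by_name` below works on any mutable list of objects that have a `name` attribute. The `Column` dataclass here is a stand-in so the sketch runs on its own; with the client, you would presumably pass `options.input_columns` and h2o_mlops `Column` objects instead.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Stand-in for h2o_mlops.options.Column (only the fields used here)."""
    name: str
    logical_type: str


def replace_column_by_name(columns, name, new_column):
    """Hypothetical helper: replace the column called `name` in place.

    Raises KeyError if no column has that name, instead of silently
    editing the wrong position.
    """
    for i, column in enumerate(columns):
        if column.name == name:
            columns[i] = new_column
            return
    raise KeyError(f"no monitored column named {name!r}")
```

For example, calling `replace_column_by_name(cols, "color", Column("width", "NUMERICAL"))` on `[Column("age", "NUMERICAL"), Column("color", "CATEGORICAL")]` leaves the `age` column untouched and swaps `color` for `width`.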
Step 4: Configure monitoring for deployment
You can deploy a model with monitoring enabled, or enable or disable monitoring after deployment.
Deploy with monitoring enabled
To deploy with monitoring enabled:
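# workspace, comp_opts (composition options), and sec_opt (security options)
# are created earlier in your deployment workflow.
deployment = workspace.deployments.create(
    name="demo-deployment",
    composition_options=[comp_opts],
    mode=DeploymentModeType.SINGLE_MODEL,
    monitoring_options=options,  # monitoring configuration from the previous steps
    security_options=sec_opt,
)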
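Before calling create, it can help to check that the monitoring configuration is internally consistent: every baseline entry should refer to a monitored column, there should be one bin count per bin, the bin edges should increase, and the mean multiplied by the number of binned rows should equal the sum. The hypothetical `check_monitoring_config` below performs these checks on plain dictionaries, an assumption made so the sketch runs without the client. It assumes column names are matched exactly (case-sensitively) and does not reproduce any validation that H2O MLOps itself may perform.

```python
import math


def check_monitoring_config(column_names, baselines):
    """Hypothetical pre-deployment check; returns a list of problems (empty if none found).

    column_names: names of the monitored input and output columns.
    baselines: dicts with column_name, bin_edges, bin_count, mean_value, sum_value.
    """
    problems = []
    known = set(column_names)
    for b in baselines:
        name = b["column_name"]
        if name not in known:  # exact, case-sensitive match (assumption)
            problems.append(f"baseline column {name!r} is not a monitored column")
        edges, counts = b["bin_edges"], b["bin_count"]
        if len(counts) != len(edges) - 1:
            problems.append(f"{name!r}: expected {len(edges) - 1} bin counts, got {len(counts)}")
        if any(lo >= hi for lo, hi in zip(edges, edges[1:])):
            problems.append(f"{name!r}: bin edges are not strictly increasing")
        # The binned rows are the non-missing rows, so mean * count should equal the sum.
        n = sum(counts)
        if n and not math.isclose(n * b["mean_value"], b["sum_value"], rel_tol=1e-9):
            problems.append(f"{name!r}: mean_value * row count does not match sum_value")
    return problems
```

Feeding it the Step 1 baseline values with the column name spelled `"AGE"`, while the monitored columns are `["age", "quantity"]`, reports a single problem: `baseline column 'AGE' is not a monitored column`. With the name spelled `"age"`, the list is empty.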
Enable or disable monitoring after deployment
You can enable or disable monitoring after deployment as long as the monitored columns were provided. If they weren’t, you must configure them first with the monitoring_options configuration.
To disable monitoring if it was already configured:
options = deployment.monitoring_options  # current monitoring configuration
options.enabled = False
deployment.update(monitoring_options=options)  # apply the change
To enable monitoring when it wasn’t configured at deployment time, first define the monitored columns using manual or automatic configuration. For more information, see Step 1: Define input and output columns. Then:
options = deployment.monitoring_options
options.enabled = True
deployment.update(monitoring_options=options)
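Both procedures above follow the same read-modify-write pattern: fetch deployment.monitoring_options, change enabled, and pass the object back to deployment.update. The hypothetical `set_monitoring` helper below sketches that pattern together with the rule that monitoring can only be enabled once monitored columns exist, treating "columns were provided" as at least one input or output column (an assumption). The `Options` and `FakeDeployment` classes are stand-ins so the sketch runs without a live deployment; they only mimic the members this page uses.

```python
from dataclasses import dataclass, field


@dataclass
class Options:
    """Stand-in for MonitoringOptions (only the fields used here)."""
    enabled: bool = False
    input_columns: list = field(default_factory=list)
    output_columns: list = field(default_factory=list)


class FakeDeployment:
    """Stand-in exposing monitoring_options and update(), as used on this page."""

    def __init__(self, options):
        self.monitoring_options = options

    def update(self, monitoring_options):
        self.monitoring_options = monitoring_options


def set_monitoring(deployment, enabled):
    """Hypothetical helper: enable or disable monitoring with a read-modify-write update."""
    options = deployment.monitoring_options
    has_columns = options is not None and bool(options.input_columns or options.output_columns)
    if enabled and not has_columns:
        raise ValueError("define the monitored columns (Step 1) before enabling monitoring")
    if options is None:
        return  # nothing is configured, so there is nothing to disable
    options.enabled = enabled
    deployment.update(monitoring_options=options)
```

For example, `set_monitoring(FakeDeployment(Options()), True)` raises `ValueError`, while a deployment whose options list an `age` input column can be switched on and off freely.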