Version: Next 🚧

Monitoring setup

This guide shows you how to configure monitoring for your deployment using the H2O MLOps Python client.

Follow the steps below to define monitored columns, optionally set up Kafka integration, and then either deploy with monitoring enabled or turn monitoring on or off after deployment.

Step 1: Define input and output columns

To enable monitoring in H2O MLOps, you must specify the input and output columns to monitor. You can do this in one of the following ways:

Manual configuration

You can manually define the monitored columns using the MonitoringOptions class:

from h2o_mlops.options import (
    BaselineData,
    Column,
    MissingValues,
    MonitoringOptions,
    NumericalAggregate,
)
from h2o_mlops.types import ColumnLogicalType

options = MonitoringOptions(
    enabled=True,
    input_columns=[
        Column(
            name="age",
            logical_type=ColumnLogicalType.NUMERICAL,
        ),
    ],
    output_columns=[
        Column(
            name="quantity",
            logical_type=ColumnLogicalType.NUMERICAL,
            is_model_output=True,
        ),
    ],
    baseline_data=[
        BaselineData(
            column_name="AGE",
            logical_type=ColumnLogicalType.NUMERICAL,
            numerical_aggregate=NumericalAggregate(
                bin_edges=[
                    float("-inf"),
                    22.0,
                    23.0,
                    25.0,
                    26.0,
                    28.0,
                    30.0,
                    31.0,
                    float("inf"),
                ],
                bin_count=[0, 1, 3, 1, 2, 2, 3, 3],
                mean_value=27.266666666666666,
                standard_deviation=3.2396354880199243,
                min_value=22.0,
                max_value=31.0,
                sum_value=409.0,
            ),
            categorical_aggregate=None,
            missing_values=MissingValues(row_count=0),
        ),
    ],
)
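The relationship between the NumericalAggregate fields above can be illustrated with a small, library-independent sketch. The helper name numerical_baseline and the ages list are hypothetical; the ages were chosen so that the summary reproduces the bin counts, sum, mean, and standard deviation shown above. The sketch assumes each bin includes its lower edge and excludes its upper edge, and that the standard deviation is the sample (n - 1) form. Both assumptions are inferred from the numbers in the example, not confirmed by the H2O MLOps documentation.

```python
import bisect
import statistics


def numerical_baseline(values, bin_edges):
    """Summarize raw values into the fields used by NumericalAggregate."""
    # Assumed convention: bin i covers bin_edges[i] <= v < bin_edges[i + 1].
    bin_count = [0] * (len(bin_edges) - 1)
    for v in values:
        bin_count[bisect.bisect_right(bin_edges, v) - 1] += 1
    return {
        "bin_edges": list(bin_edges),
        "bin_count": bin_count,
        "mean_value": statistics.fmean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "standard_deviation": statistics.stdev(values),
        "min_value": float(min(values)),
        "max_value": float(max(values)),
        "sum_value": float(sum(values)),
    }


# Hypothetical ages, not taken from any real dataset.
ages = [22, 23, 23, 24, 25, 26, 27, 28, 28, 30, 30, 30, 31, 31, 31]
edges = [float("-inf"), 22.0, 23.0, 25.0, 26.0, 28.0, 30.0, 31.0, float("inf")]
summary = numerical_baseline(ages, edges)
```

The resulting dictionary keys mirror the NumericalAggregate keyword arguments, so the values could be passed through with `NumericalAggregate(**summary)` if the constructor accepts exactly those names, which is an assumption here.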

Automatic configuration

You can also configure monitoring automatically. This method reads a baseline dataset into a data frame and uses PySpark to calculate the baseline from it.

Note: PySpark is required for this step.

import h2o_mlops.utils.monitoring
from h2o_mlops.types import ColumnLogicalType
from h2o_mlops.utils.monitoring import prepare_monitoring_options_from_data_frame

session = h2o_mlops.utils.monitoring.get_spark_session()

baseline_data_frame = h2o_mlops.utils.monitoring.read_source(
    spark=session,
    source_data="file:///datasets/categorical_data.csv",
    source_format=h2o_mlops.utils.monitoring.Format.CSV,
)

# Override the detected logical type for specific columns, for example an ID column.
logical_type_overrides = {
    "id": ColumnLogicalType.ID,
}

# The experiment argument is optional. When it is provided, the schema in the
# experiment is used to discover the proper logical types for monitoring.
# `experiment` is assumed to be an existing experiment object.
options = prepare_monitoring_options_from_data_frame(
    data_frame=baseline_data_frame,
    logical_type_overrides=logical_type_overrides,
    experiment=experiment,
)

options.enabled = True

Step 2 (optional): Kafka integration for raw scoring logs

If Kafka export is enabled in your environment, you can export raw scoring request and response data to Kafka. You can use the global topic or specify a custom topic for each deployment:

options.kafka_topic = "test"
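Before assigning a custom topic, it can help to check that the name is legal for Kafka. The sketch below applies Kafka's own topic naming rules (ASCII letters, digits, ".", "_" and "-", at most 249 characters, and not "." or ".."). The helper name is_valid_kafka_topic is hypothetical, and whether your H2O MLOps environment imposes further restrictions on topic names is not covered here.

```python
import re

# Kafka topic names: ASCII letters, digits, '.', '_' and '-', 1 to 249 characters.
_TOPIC_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,249}")


def is_valid_kafka_topic(name: str) -> bool:
    """Return True if name is a legal Kafka topic name."""
    # '.' and '..' are rejected by Kafka even though they match the pattern.
    if name in (".", ".."):
        return False
    return _TOPIC_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_kafka_topic("test")` accepts the topic used above, while a name containing spaces is rejected.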

Step 3: Edit baseline and columns before deployment

You can modify the automatically detected baseline and monitored columns before deployment if the detection was inaccurate.

To modify the logical type of an existing column:

options.input_columns[0].logical_type = ColumnLogicalType.CATEGORICAL

To replace an entire column definition:

options.input_columns[0] = Column(
    name="width",
    logical_type=ColumnLogicalType.NUMERICAL,
)
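Editing by list index works when you know the column order. When you do not, looking the column up by name is less error-prone. The following is a minimal sketch; find_column is a hypothetical helper, and it assumes that column objects expose a name attribute matching the name argument used to construct them. Stand-in objects are used so the sketch runs without h2o_mlops installed.

```python
from types import SimpleNamespace


def find_column(columns, name):
    """Return the first column whose .name equals name, or raise KeyError."""
    for column in columns:
        if column.name == name:
            return column
    raise KeyError(f"no monitored column named {name!r}")


# Stand-ins for Column objects; with the real client you would pass
# options.input_columns and a ColumnLogicalType member instead.
demo_columns = [
    SimpleNamespace(name="id", logical_type="ID"),
    SimpleNamespace(name="age", logical_type="NUMERICAL"),
]
find_column(demo_columns, "age").logical_type = "CATEGORICAL"
```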

Step 4: Configure monitoring for deployment

You can deploy a model with monitoring enabled, or enable or disable monitoring after deployment.

Deploy with monitoring enabled

To deploy with monitoring enabled:

# comp_opts, sec_opt, and DeploymentModeType are assumed to be defined or imported already.
deployment = project.deployments.create(
    name="demo-deployment",
    composition_options=[comp_opts],
    mode=DeploymentModeType.SINGLE_MODEL,
    monitoring_options=options,
    security_options=sec_opt,
)

Enable or disable monitoring after deployment

You can enable or disable monitoring after deployment as long as the monitored columns were provided. If they weren’t, you must first define them in the deployment's monitoring_options.

To disable monitoring if it was already configured:

options = deployment.monitoring_options
options.enabled = False
deployment.update(monitoring_options=options)

To enable monitoring when it wasn’t configured at deployment time:

First, define the monitored columns using manual or automatic configuration.
For more information, see Step 1: Define input and output columns. Then:

options = deployment.monitoring_options
options.enabled = True
deployment.update(monitoring_options=options)
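The enable and disable snippets differ only in the flag value, so they can be wrapped in one helper. This is a sketch, not part of the H2O MLOps client: set_monitoring_enabled and FakeDeployment are hypothetical names, and the guard assumes that a deployment without monitored columns exposes empty input_columns and output_columns lists. Only deployment.monitoring_options and deployment.update(monitoring_options=...) come from the snippets above.

```python
from types import SimpleNamespace


def set_monitoring_enabled(deployment, enabled):
    """Toggle monitoring on an existing deployment and return the options sent."""
    options = deployment.monitoring_options
    # Assumption: missing monitored columns show up as empty lists.
    if enabled and not (options.input_columns and options.output_columns):
        raise ValueError("define monitored columns before enabling monitoring")
    options.enabled = enabled
    deployment.update(monitoring_options=options)
    return options


class FakeDeployment:
    """Stand-in that records update() calls; not part of h2o_mlops."""

    def __init__(self, options):
        self.monitoring_options = options
        self.updated_with = None

    def update(self, monitoring_options):
        self.updated_with = monitoring_options


demo = FakeDeployment(
    SimpleNamespace(enabled=False, input_columns=["age"], output_columns=["quantity"])
)
set_monitoring_enabled(demo, True)
```

With a real deployment object, the call would be `set_monitoring_enabled(deployment, False)` to disable monitoring or `set_monitoring_enabled(deployment, True)` to enable it.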
