Version: v1.0.16

Monitoring setup

This guide shows you how to configure monitoring for your deployment using the H2O MLOps Python client.

Follow the steps below to define the monitored columns, optionally set up Kafka integration, and then either deploy with monitoring enabled or enable or disable monitoring after deployment.

Step 1: Define input and output columns

To enable monitoring in H2O MLOps, you must specify the input and output columns to monitor. You can do this in one of the following ways:

Manual configuration
Automatic configuration

Manual configuration

You can manually define the monitored columns using the MonitoringOptions class:

from h2o_mlops.options import (
   BaselineData,
   Column,
   MissingValues,
   MonitoringOptions,
   NumericalAggregate,
)
from h2o_mlops.types import ColumnLogicalType

options = MonitoringOptions(
   enabled=True,
   input_columns=[
     Column(
         name="age",
         logical_type=ColumnLogicalType.NUMERICAL,
     ),
   ],
   output_columns=[
     Column(
         name="quantity",
         logical_type=ColumnLogicalType.NUMERICAL,
         is_model_output=True,
     )
   ],
   baseline_data=[
       BaselineData(
           column_name="age",
           logical_type=ColumnLogicalType.NUMERICAL,
           numerical_aggregate=NumericalAggregate(
               bin_edges=[
                   float("-inf"),
                   22.0,
                   23.0,
                   25.0,
                   26.0,
                   28.0,
                   30.0,
                   31.0,
                   float("inf"),
               ],
               bin_count=[0, 1, 3, 1, 2, 2, 3, 3],
               mean_value=27.266666666666666,
               standard_deviation=3.2396354880199243,
               min_value=22.0,
               max_value=31.0,
               sum_value=409.0,
           ),
           categorical_aggregate=None,
           missing_values=MissingValues(row_count=0),
       ),
   ],
)
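As a sanity check on hand-written baseline values, the numerical aggregates above can be reproduced with the Python standard library alone. The sketch below is an illustration and not part of the H2O MLOps client. The ages list is a hypothetical sample chosen to be consistent with the numbers shown, and both the left-closed bin convention and the sample (n - 1) standard deviation are assumptions that happen to match those numbers.

```python
import bisect
import statistics

# Hypothetical sample: 15 integer ages consistent with the aggregates above.
ages = [22, 23, 23, 24, 25, 26, 27, 28, 28, 30, 30, 30, 31, 31, 31]

bin_edges = [float("-inf"), 22.0, 23.0, 25.0, 26.0, 28.0, 30.0, 31.0, float("inf")]


def histogram(values, edges):
    """Count values per bin, treating each bin as [left, right) (assumed convention)."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        # bisect_right puts a value equal to an edge into the bin that starts there.
        index = bisect.bisect_right(edges, value) - 1
        # Clamp so that a value equal to the last edge still lands in the last bin.
        index = min(index, len(counts) - 1)
        counts[index] += 1
    return counts


aggregate = {
    "bin_count": histogram(ages, bin_edges),
    "mean_value": statistics.mean(ages),
    "standard_deviation": statistics.stdev(ages),  # sample (n - 1) standard deviation
    "min_value": float(min(ages)),
    "max_value": float(max(ages)),
    "sum_value": float(sum(ages)),
}

for key, value in aggregate.items():
    print(f"{key}: {value}")
```

Whatever tool you use to compute the baseline, note that bin_count always has one fewer entry than bin_edges, which is a quick check to run before passing the values to NumericalAggregate.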

Automatic configuration

You can also configure monitoring automatically. This method uses PySpark to calculate the baseline from a source dataset.

Note: PySpark is required for this step.

from h2o_mlops.types import ColumnLogicalType
from h2o_mlops.utils.monitoring import (
   Format,
   get_spark_session,
   prepare_monitoring_options_from_data_frame,
   read_source,
)

session = get_spark_session()

baseline_data_frame = read_source(
   spark=session,
   source_data="file:///datasets/categorical_data.csv",
   source_format=Format.CSV,
)

# You can override the detected logical type of a column, for example to mark an ID column.
logical_type_overrides = {
   "id": ColumnLogicalType.ID,
}

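# The experiment argument is optional. When it is provided, the schema stored in the
# experiment is used to discover the proper logical types for monitoring.
# This snippet assumes that `experiment` was retrieved earlier.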
options = prepare_monitoring_options_from_data_frame(
   data_frame=baseline_data_frame,
   logical_type_overrides=logical_type_overrides,
   experiment=experiment,
)

options.enabled = True
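Before building logical_type_overrides, it can help to see which columns look like identifiers. The sketch below is a standalone heuristic and not something the H2O MLOps client is known to do: it flags columns whose values are all distinct. The function name candidate_id_columns and the sample data are hypothetical.

```python
import csv
import io


def candidate_id_columns(csv_text):
    """Return the columns whose values are all distinct (a heuristic only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in (reader.fieldnames or [])}
    duplicated = set()
    row_count = 0
    for row in reader:
        row_count += 1
        for name in seen:
            value = row.get(name)
            if value in seen[name]:
                duplicated.add(name)
            seen[name].add(value)
    # With fewer than two rows every column is trivially unique, so report nothing.
    if row_count < 2:
        return []
    return [name for name in seen if name not in duplicated]


# Hypothetical baseline sample.
sample = """id,color,size
1,red,S
2,blue,S
3,red,M
"""

candidates = candidate_id_columns(sample)
print(candidates)
```

Continuous numerical columns can also be all distinct, so treat the result as a list to review by hand rather than as a final set of overrides.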

Step 2: Optional: Kafka integration for raw scoring logs

If Kafka export is enabled in your environment, you can export raw scoring request and response data to Kafka. You can use the global topic or specify a custom topic for each deployment. To set a custom topic:

options.kafka_topic = "test"

Step 3: Edit baseline and columns before deployment

You can modify the automatically detected baseline and monitored columns before deployment if the detection was inaccurate.

To modify the logical type of an existing column:

options.input_columns[0].logical_type = ColumnLogicalType.CATEGORICAL

To replace an entire column definition:

options.input_columns[0] = Column(
   name="width",
   logical_type=ColumnLogicalType.NUMERICAL,
)
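Editing by list index works, but it depends on the order in which columns were detected. The sketch below looks a column up by name instead. It is an illustration only: the Column dataclass is a stand-in defined here so that the example runs without the client, and replace_column is a hypothetical helper, not part of h2o_mlops.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Stand-in for h2o_mlops.options.Column, used only in this sketch."""

    name: str
    logical_type: str


def replace_column(columns, new_column):
    """Replace the column with the same name in place and return its index."""
    for index, column in enumerate(columns):
        if column.name == new_column.name:
            columns[index] = new_column
            return index
    raise KeyError(f"no monitored column named {new_column.name!r}")


input_columns = [Column("age", "NUMERICAL"), Column("width", "CATEGORICAL")]
position = replace_column(input_columns, Column("width", "NUMERICAL"))
print(position, input_columns[position])
```

Raising an error when the name is missing makes a typo in a column name fail loudly instead of silently leaving the detected definition in place.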

Step 4: Configure monitoring for deployment

You can deploy a model with monitoring enabled, or enable or disable monitoring after deployment.

Deploy with monitoring enabled

To deploy with monitoring enabled:

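# Assumes that workspace, comp_opts, and sec_opt are already defined
# and that DeploymentModeType has been imported.
deployment = workspace.deployments.create(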
   name="demo-deployment",
   composition_options=[comp_opts],
   mode=DeploymentModeType.SINGLE_MODEL,
   monitoring_options=options,
   security_options=sec_opt,
)

Enable or disable monitoring after deployment

You can enable or disable monitoring after deployment as long as the monitored columns were provided. If they weren’t, you must configure them first with the monitoring_options configuration.

To disable monitoring if it was already configured:

options = deployment.monitoring_options
options.enabled = False
deployment.update(monitoring_options=options)

To enable monitoring when it wasn’t configured at deployment time:

First, define the monitored columns using manual or automatic configuration.
For more information, see Step 1: Define input and output columns. Then:

options = deployment.monitoring_options
options.enabled = True
deployment.update(monitoring_options=options)
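The enable and disable snippets differ only in the flag value, so they can be wrapped in one small helper. The sketch below is a hedged illustration: set_monitoring is a hypothetical function, and FakeDeployment is a stand-in that mimics only the monitoring_options attribute and the update(monitoring_options=...) call shown above, so that the example runs without a live deployment.

```python
from types import SimpleNamespace


class FakeDeployment:
    """Stand-in for a deployment object, used only so this sketch runs offline."""

    def __init__(self, monitoring_options):
        self.monitoring_options = monitoring_options
        self.update_calls = []

    def update(self, monitoring_options):
        self.monitoring_options = monitoring_options
        self.update_calls.append(monitoring_options.enabled)


def set_monitoring(deployment, enabled):
    """Set the monitoring flag and return True if an update was sent."""
    options = deployment.monitoring_options
    if options.enabled == enabled:
        return False  # Already in the requested state, so skip the update call.
    options.enabled = enabled
    deployment.update(monitoring_options=options)
    return True


demo = FakeDeployment(SimpleNamespace(enabled=True))
changed = set_monitoring(demo, False)
unchanged = set_monitoring(demo, False)
print(changed, unchanged, demo.monitoring_options.enabled)
```

Skipping the update when the flag already has the requested value is a design choice in this sketch. With a real deployment object you would pass it in place of demo.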
