Monitoring setup
This guide shows you how to configure monitoring for your deployment using the H2O MLOps Python client.
The steps below cover defining monitored columns, optionally setting up Kafka integration, adjusting the baseline and columns before deployment, and either deploying with monitoring enabled or enabling or disabling monitoring after deployment.
Step 1: Define input and output columns
To enable monitoring in H2O MLOps, you must specify the input and output columns to monitor. You can do this in one of the following ways:
Manual configuration
You can manually define the monitored columns using the MonitoringOptions class:
from h2o_mlops.options import (
    BaselineData,
    Column,
    MissingValues,
    MonitoringOptions,
    NumericalAggregate,
)
from h2o_mlops.types import ColumnLogicalType

options = MonitoringOptions(
    enabled=True,
    input_columns=[
        Column(
            name="age",
            logical_type=ColumnLogicalType.NUMERICAL,
        ),
    ],
    output_columns=[
        Column(
            name="quantity",
            logical_type=ColumnLogicalType.NUMERICAL,
            is_model_output=True,
        ),
    ],
    baseline_data=[
        BaselineData(
            column_name="AGE",
            logical_type=ColumnLogicalType.NUMERICAL,
            numerical_aggregate=NumericalAggregate(
                # 9 bin edges define the 8 bins counted in bin_count.
                bin_edges=[
                    float("-inf"),
                    22.0,
                    23.0,
                    25.0,
                    26.0,
                    28.0,
                    30.0,
                    31.0,
                    float("inf"),
                ],
                bin_count=[0, 1, 3, 1, 2, 2, 3, 3],
                mean_value=27.266666666666666,
                standard_deviation=3.2396354880199243,
                min_value=22.0,
                max_value=31.0,
                sum_value=409.0,
            ),
            categorical_aggregate=None,
            missing_values=MissingValues(row_count=0),
        ),
    ],
)
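The values passed to NumericalAggregate above are summary statistics of a baseline sample. As a rough illustration of where such numbers can come from, the following sketch computes them with the Python standard library. The helper name `summarize_numerical_column` is hypothetical and not part of the H2O MLOps client. The left-inclusive binning and the use of the sample standard deviation are assumptions, so check them against how your baseline was actually produced.

```python
import bisect
import math
import statistics


def summarize_numerical_column(values, bin_edges):
    """Return summary statistics keyed like the NumericalAggregate arguments.

    Hypothetical helper for illustration only.
    Assumption: bin i covers the half-open interval [bin_edges[i], bin_edges[i + 1]).
    Assumption: standard_deviation is the sample (n - 1) standard deviation.
    """
    bin_count = [0] * (len(bin_edges) - 1)
    for value in values:
        # bisect_right finds the first edge strictly greater than value,
        # so the bin index is one position to the left of it.
        index = bisect.bisect_right(bin_edges, value) - 1
        # Clamp so a value equal to the last edge falls in the final bin.
        index = min(max(index, 0), len(bin_count) - 1)
        bin_count[index] += 1
    return {
        "bin_edges": list(bin_edges),
        "bin_count": bin_count,
        "mean_value": statistics.fmean(values),
        "standard_deviation": statistics.stdev(values),
        "min_value": min(values),
        "max_value": max(values),
        "sum_value": math.fsum(values),
    }


# Made-up sample data, not the data behind the example above.
sample = [22.0, 24.0, 25.0, 30.0, 31.0, 31.0]
edges = [float("-inf"), 22.0, 25.0, 30.0, float("inf")]
aggregate = summarize_numerical_column(sample, edges)
```

Under these assumptions the resulting dictionary could be unpacked into the constructor, for example `NumericalAggregate(**aggregate)`. Whichever way the numbers are produced, `bin_count` should have exactly one entry fewer than `bin_edges`.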
Automatic configuration
You can also configure monitoring automatically. This method calculates the baseline from a data frame using PySpark, so PySpark must be installed for this step.
import h2o_mlops.utils.monitoring
from h2o_mlops.types import ColumnLogicalType
from h2o_mlops.utils.monitoring import prepare_monitoring_options_from_data_frame

session = h2o_mlops.utils.monitoring.get_spark_session()
baseline_data_frame = h2o_mlops.utils.monitoring.read_source(
    spark=session,
    source_data="file:///datasets/categorical_data.csv",
    source_format=h2o_mlops.utils.monitoring.Format.CSV,
)

# You can override the detected logical type of a column, for example an ID column.
logical_type_overrides = {
    "id": ColumnLogicalType.ID,
}

# The experiment argument is optional. When it is provided, the schema in the
# experiment is used to determine the proper logical types for monitoring.
# `experiment` is assumed to have been retrieved earlier.
options = prepare_monitoring_options_from_data_frame(
    data_frame=baseline_data_frame,
    logical_type_overrides=logical_type_overrides,
    experiment=experiment,
)
options.enabled = True
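To show why an override such as the one for the id column can be needed, here is a small, self-contained sketch of type inference with overrides. It does not reproduce the library's detection logic. `LogicalType` is a local stand-in for ColumnLogicalType, and `infer_logical_type` and `resolve_logical_types` are hypothetical names. The only point is that an identifier column full of digits looks numerical until an explicit override takes precedence.

```python
from enum import Enum


class LogicalType(Enum):
    """Local stand-in for h2o_mlops.types.ColumnLogicalType (illustration only)."""

    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"
    ID = "id"


def infer_logical_type(values):
    """Guess NUMERICAL if every non-missing value parses as a float, else CATEGORICAL."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return LogicalType.CATEGORICAL
    try:
        for value in present:
            float(value)
    except (TypeError, ValueError):
        return LogicalType.CATEGORICAL
    return LogicalType.NUMERICAL


def resolve_logical_types(columns, overrides=None):
    """Infer a type per column, letting explicit overrides win over inference."""
    overrides = overrides or {}
    return {
        name: overrides.get(name, infer_logical_type(values))
        for name, values in columns.items()
    }


# Made-up columns: "id" contains only digits, so naive inference calls it numerical.
columns = {
    "id": ["1", "2", "3"],
    "age": ["22", "31", ""],
    "color": ["red", "blue", "red"],
}
without_override = resolve_logical_types(columns)
with_override = resolve_logical_types(columns, {"id": LogicalType.ID})
```

In this sketch, `without_override["id"]` is `LogicalType.NUMERICAL`, while `with_override["id"]` is `LogicalType.ID`. That is the effect the `logical_type_overrides` argument is meant to achieve.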
Step 2 (optional): Kafka integration for raw scoring logs
If Kafka export is enabled in your environment, you can export raw scoring request and response data to Kafka. You can use the global topic or specify a custom topic per deployment:
options.kafka_topic = "test"
Step 3: Edit baseline and columns before deployment
You can modify the automatically detected baseline and monitored columns before deployment if the detection was inaccurate.
To modify the logical type of an existing column:
options.input_columns[0].logical_type = ColumnLogicalType.CATEGORICAL
To replace an entire column definition:
options.input_columns[0] = Column(
    name="width",
    logical_type=ColumnLogicalType.NUMERICAL,
)
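Editing by position, as in `input_columns[0]`, relies on knowing the order in which columns were detected. As a sketch of an alternative, the hypothetical helper below replaces a column by name. It assumes that the monitored columns form an ordinary mutable list and that each column exposes a `name` attribute. `ColumnStub` is a stand-in used only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class ColumnStub:
    """Hypothetical stand-in for a monitored column (illustration only)."""

    name: str
    logical_type: str


def replace_column(columns, name, new_column):
    """Replace the column called `name` in place and return its index.

    Raises KeyError when no column with that name exists, so a typo
    does not silently leave the list unchanged.
    """
    for index, column in enumerate(columns):
        if column.name == name:
            columns[index] = new_column
            return index
    raise KeyError(f"no monitored column named {name!r}")


# Made-up column list standing in for options.input_columns.
input_columns = [
    ColumnStub(name="age", logical_type="NUMERICAL"),
    ColumnStub(name="height", logical_type="NUMERICAL"),
]
replaced_at = replace_column(
    input_columns, "height", ColumnStub(name="width", logical_type="NUMERICAL")
)
```

If the assumptions hold for the real objects, the same helper could be called as `replace_column(options.input_columns, "height", Column(...))`.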
Step 4: Configure monitoring for deployment
You can deploy a model with monitoring enabled, or enable or disable monitoring after deployment.
Deploy with monitoring enabled
To deploy with monitoring enabled:
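# Assumes that project, comp_opts, and sec_opt were created earlier
# and that DeploymentModeType has been imported.
deployment = project.deployments.create(
    name="demo-deployment",
    composition_options=[comp_opts],
    mode=DeploymentModeType.SINGLE_MODEL,
    monitoring_options=options,
    security_options=sec_opt,
)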
Enable or disable monitoring after deployment
You can enable or disable monitoring after deployment as long as the monitored columns were provided. If they weren't, you must first configure them through monitoring_options.
To disable monitoring if it was already configured:
options = deployment.monitoring_options
options.enabled = False
deployment.update(monitoring_options=options)
To enable monitoring when it wasn't configured at deployment time:
First, define the monitored columns using manual or automatic configuration.
For more information, see Step 1: Define input and output columns. Then:
options = deployment.monitoring_options
options.enabled = True
deployment.update(monitoring_options=options)
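Enabling and disabling follow the same three lines with a different flag, so they can be wrapped in one function. The sketch below uses only the `monitoring_options` attribute and the `update(monitoring_options=...)` call shown above. The function name `set_monitoring_enabled` is hypothetical. The guard against a missing configuration assumes that `monitoring_options` is `None` when no monitored columns were provided, which is not confirmed here.

```python
def set_monitoring_enabled(deployment, enabled):
    """Turn monitoring on or off for an existing deployment.

    Hypothetical convenience wrapper around the snippets above.
    Assumption: deployment.monitoring_options is None when monitoring
    was never configured for the deployment.
    """
    options = deployment.monitoring_options
    if options is None:
        raise ValueError(
            "monitoring was not configured; define the monitored columns first"
        )
    options.enabled = bool(enabled)
    # Push the changed options back to the deployment.
    deployment.update(monitoring_options=options)
    return options
```

With a real deployment object this would be called as `set_monitoring_enabled(deployment, False)` to disable monitoring and `set_monitoring_enabled(deployment, True)` to enable it again.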