Version: v0.70.7

Migration guide

From 0.69.x to 0.70.0

Transition from Scoring Client to native batch scoring

Starting from H2O MLOps version 0.70, batch scoring functionality has been natively integrated into H2O MLOps, replacing the H2O MLOps Scoring Client. The native batch scoring implementation is available through the official H2O MLOps Python client.

For added convenience, batch scoring can also be performed through the new H2O MLOps UI.

Workload identity and IAM authentication

Starting with H2O MLOps version 0.70, workload identity and IAM authentication for the mlops-storage, mlops-telemetry, and mlops-deployer components are handled by the github.com/h2oai/go-pkg/database/postgres/v2 library.

Update the connection strings for these components to match the formats shown in the examples below:

Example of the mlops-storage and mlops-telemetry database connection string:

storage_db_connection_string = "postgres://${var.mlops_db_username}@${var.mlops_db_address}:5432/${var.mlops_storage_db}?aws_iam_auth_enabled=true&aws_iam_auth_region=${var.aws_region}&aws_iam_auth_user=${var.mlops_db_username}&aws_iam_auth_endpoint=${var.mlops_db_address}:5432"

Example of the mlops-deployer database connection string:

deployment_db_connection_string = "postgres://${var.mlops_deployment_db_address}/${var.mlops_deployment_db_name}?sslmode=${var.db_connection_ssl_mode}&user=${urlencode(var.mlops_deployment_db_username)}&password=${urlencode(var.mlops_deployment_db_password)}"
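If you build these strings outside Terraform, the following Python sketch assembles both formats. It only mirrors the shape of the examples above. The function names are hypothetical. It assumes that the storage address is given without a port (the port defaults to 5432), and that the deployer address already includes a port if one is needed.

```python
from urllib.parse import quote, urlencode


def storage_iam_dsn(username: str, address: str, database: str, region: str, port: int = 5432) -> str:
    """Build an mlops-storage / mlops-telemetry DSN with the AWS IAM query parameters."""
    endpoint = f"{address}:{port}"
    query = urlencode(
        {
            "aws_iam_auth_enabled": "true",
            "aws_iam_auth_region": region,
            "aws_iam_auth_user": username,
            "aws_iam_auth_endpoint": endpoint,
        },
        safe=":",  # keep host:port unescaped, as in the example above
    )
    # The storage DSN carries no password; IAM auth is requested via the aws_iam_auth_* parameters.
    return f"postgres://{quote(username, safe='')}@{endpoint}/{database}?{query}"


def deployer_dsn(address: str, database: str, sslmode: str, username: str, password: str) -> str:
    """Build an mlops-deployer DSN; user and password go in the query string, URL-encoded."""
    # urlencode percent-encodes each value, playing the role of Terraform's urlencode().
    query = urlencode({"sslmode": sslmode, "user": username, "password": password})
    return f"postgres://{address}/{database}?{query}"
```

Encoding the password matters because characters such as `@` or `/` would otherwise be misread as URL delimiters.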

Removal of mTLS

mTLS is no longer managed by Kubernetes jobs. Environments that require mTLS between MLOps services should now handle it with a service mesh (such as Istio).

Previous versions used SPIFFE for service-to-service authentication. Version 0.70 and later use service account tokens instead.

Migration steps

If your deployment requires mTLS:

Remove any existing Kubernetes job configurations for mTLS.
Implement a service mesh solution to manage mTLS between services.
Configure your service mesh to handle TLS certificate management.

Changes in Helm

Config tls has been removed.
Config storage.tls has been removed.
Config deployer.tls has been removed.
Config ingest.tls has been removed.
Config apiGateway.tls has been removed.
Config monitoringAppBackend.tls has been removed.
Config deployer.telemetry.auth.tlsEnabled has been removed.
Config telemetry.serverSecurityEnabled has been removed.
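Before upgrading, you may want to check your existing Helm values for the removed keys. This sketch walks a parsed values mapping (for example, values.yaml loaded with a YAML library, which is an assumption here) and reports which removed dotted paths are still set. The helper name is hypothetical; the key list is copied from the list above.

```python
from typing import Any, Mapping

# Keys removed in 0.70.0, copied from the list above.
REMOVED_IN_0_70 = [
    "tls",
    "storage.tls",
    "deployer.tls",
    "ingest.tls",
    "apiGateway.tls",
    "monitoringAppBackend.tls",
    "deployer.telemetry.auth.tlsEnabled",
    "telemetry.serverSecurityEnabled",
]


def find_removed_keys(values: Mapping[str, Any], removed: list[str] = REMOVED_IN_0_70) -> list[str]:
    """Return the dotted paths from `removed` that are still present in `values`."""
    found = []
    for path in removed:
        node: Any = values
        for part in path.split("."):
            if not isinstance(node, Mapping) or part not in node:
                break  # path not present, move on to the next one
            node = node[part]
        else:
            found.append(path)  # every segment of the path exists
    return found
```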

Config storage.auth.service has been added:

service:
  # -- Issuer URL for service authentication.
  # In general, this should be set to whatever the "issuer" field is in the cluster's
  # OIDC discovery document.
  # `kubectl get --raw /.well-known/openid-configuration` can be used to retrieve
  # the discovery document.
  issuerURL: "https://kubernetes.default.svc"

  # -- Configures whether to validate issuer URL for service authentication.
  # Some service account token variants use an issuer that differs from
  # the one specified by issuerURL.
  validateIssuer: false

  # -- Configures whether to use the Kubernetes HTTP client (with TLS and the service
  # account token) for issuer discovery and for downloading the signing keys.
  # Disable this when the issuer is not Kubernetes API server.
  useKubernetesHTTPClient: true
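To find the value for issuerURL, you can read the "issuer" field from the output of `kubectl get --raw /.well-known/openid-configuration`. A minimal sketch of that extraction, assuming the discovery document has been captured as JSON text (the function name is hypothetical):

```python
import json


def issuer_from_discovery(document: str) -> str:
    """Return the "issuer" field of an OIDC discovery document given as JSON text."""
    data = json.loads(document)
    issuer = data.get("issuer")
    if not isinstance(issuer, str) or not issuer:
        raise ValueError("discovery document has no 'issuer' field")
    return issuer
```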

Removal of support for older H2O Driverless AI versions

In MLOps version 0.70, support for H2O Driverless AI versions 1.10.6.3 and earlier has been discontinued. This change affects the following versions:

1.10.5-cuda11.2.2
1.10.5.1-cuda11.2.2
1.10.6-cuda11.2.2
1.10.6.1-cuda11.2.2
1.10.6.2-cuda11.2.2
1.10.6.3-cuda11.2.2
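To audit an inventory of deployed models against this cutoff, you could compare version tags numerically. This sketch assumes tags look like the ones listed above (a dotted version, optionally followed by a suffix such as `-cuda11.2.2`); the helper names are hypothetical.

```python
import re


def parse_dai_version(tag: str) -> tuple[int, ...]:
    """Turn a tag such as '1.10.6.3-cuda11.2.2' into (1, 10, 6, 3)."""
    match = re.match(r"^(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"unrecognized Driverless AI version: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported_in_mlops_070(tag: str) -> bool:
    """Versions 1.10.6.3 and earlier are not supported as of MLOps 0.70."""
    # Tuple comparison treats (1, 10, 6) as lower than (1, 10, 6, 3), as intended.
    return parse_dai_version(tag) > (1, 10, 6, 3)
```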

Removal of `Pickle` Runtime

Starting from H2O MLOps version 0.70, the Pickle Runtime has been removed.

Removal of `environment` from Python Client and UI

Starting with MLOps version 0.70.0, environments are no longer exposed to users in the Python client or the UI. The backend is unchanged, and environment-related functionality remains intact.

Changes in the UI

Users no longer need to select an environment (e.g., PROD or DEV) when creating a deployment.
The environment now defaults internally to PROD.
Environment-related details are no longer visible in the UI.

Changes in the Python Client

The environments property of the MLOpsProject instance is no longer available starting from client version 1.3.0.
The environment now defaults internally to PROD.
When you use the updated client, make the following code adjustments. Here, project refers to an instance of MLOpsProject, and client refers to an instance of h2o_mlops.Client:

Code Adjustments:
- Replace project.environments.get(uid).deployments with project.deployments.
- Replace project.environments.get(uid).endpoints with project.endpoints.
- Replace project.environments.get(uid).allowed_affinities with client.allowed_affinities.
- Replace project.environments.get(uid).allowed_tolerations with client.allowed_tolerations.
With these adjustments, your code will remain compatible with both the updated and older versions of the MLOps backend.
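If a script must run against both older clients and client 1.3.0 or later during a transition, a small shim can choose the right access path. This is a hypothetical helper, not part of h2o_mlops. It assumes that the absence of the environments property identifies client 1.3.0 or later, and it only covers deployments; endpoints, allowed_affinities, and allowed_tolerations follow the mappings listed above.

```python
from typing import Any, Optional


def project_deployments(project: Any, environment_uid: Optional[str] = None) -> Any:
    """Return project deployments on clients with or without `project.environments`."""
    environments = getattr(project, "environments", None)
    if environments is None:
        # Client 1.3.0 and later: no environments property, use project.deployments.
        return project.deployments
    if environment_uid is None:
        raise ValueError("older clients need the environment uid")
    # Older clients: go through the environment, as in the pre-0.70 code.
    return environments.get(environment_uid).deployments
```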

From 0.68.x to 0.69.0

MLOps runtimes

All runtime images must be updated to at least version 1.5.3, which was released with H2O MLOps version 0.69.0.

Starting with version 0.69.0, runtime images must always be updated to the versions released with the corresponding H2O MLOps version. For example, H2O MLOps 0.69.1 was released with runtime images v1.5.4, so the images of all deployments must be updated to v1.5.4.
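A quick way to find deployments that still need updating is to compare each runtime image tag with the version released alongside your MLOps version. This sketch assumes every image reference ends in a tag of the form `vX.Y.Z`; the deployment names, image names, and helper functions are hypothetical.

```python
def parse_runtime_version(image: str) -> tuple[int, ...]:
    """Read the version from an image reference such as 'registry/runtime:v1.5.4'."""
    tag = image.rsplit(":", 1)[-1]
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def outdated_deployments(images: dict[str, str], required: str) -> list[str]:
    """Names of deployments whose runtime image version differs from `required`."""
    wanted = parse_runtime_version(required)
    return sorted(
        name for name, image in images.items() if parse_runtime_version(image) != wanted
    )
```

For H2O MLOps 0.69.1, `required` would be `v1.5.4`, following the example above.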

MLOps storage

Starting with MLOps version 0.69.0, only blob storage is supported as the storage backend. Support for other storage options has been discontinued. This change affects the configuration parameters used in the MLOps Helm charts.

Changes to storage configuration parameters

storage:
  persistence:
    # All parameters under this section are no longer supported.
  cloudPersistence:
    # The 'enabled' parameter was removed since cloudPersistence is now the only option supported.
    enabled: 
  pvcMigration:
    # All parameters under this section are no longer supported.

Note

After upgrading to MLOps 0.69.0, you can safely delete your existing PVC that was used as the storage backend prior to the 0.68.0 release. Perform this step manually to prevent any unintended data loss.

PBKDF2 hash support

H2O MLOps v0.69.0 now supports the PBKDF2 passphrase hash algorithm for more secure hashing. Note the following details:

The PBKDF2 hash should follow the format pbkdf2:<hashFunc>:<iterations>$<salt>$<hash>.
The salt and hash should be base64 encoded.
PBKDF2 hashing replaces bcrypt when creating deployments with the Passphrase (Stored hashed) security option.
The Passphrase (Stored hashed) security option is listed in the Create Deployment panel dropdown only if PASSPHRASE_HASH_TYPE_PBKDF2 is included under securityOptions.activated in the values.yaml. Including PASSPHRASE_HASH_TYPE_BCRYPT is neither sufficient nor required.
Older deployments created with bcrypt hashing remain accessible without any additional configuration.
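The following Python sketch produces and checks a string in the documented `pbkdf2:<hashFunc>:<iterations>$<salt>$<hash>` format using only the standard library. The choice of sha256, the iteration count, the 16-byte salt, and standard padded base64 are assumptions; confirm which hash functions and base64 variant your MLOps version accepts before relying on it.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def make_pbkdf2_hash(
    passphrase: str,
    hash_func: str = "sha256",
    iterations: int = 600_000,
    salt: Optional[bytes] = None,
) -> str:
    """Return 'pbkdf2:<hashFunc>:<iterations>$<salt>$<hash>' with base64 salt and hash."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(hash_func, passphrase.encode("utf-8"), salt, iterations)
    return f"pbkdf2:{hash_func}:{iterations}${_b64(salt)}${_b64(digest)}"


def verify_pbkdf2_hash(passphrase: str, stored: str) -> bool:
    """Recompute the hash from the stored parameters and compare in constant time."""
    header, salt_b64, hash_b64 = stored.split("$")
    scheme, hash_func, iterations = header.split(":")
    if scheme != "pbkdf2":
        raise ValueError(f"not a PBKDF2 hash: {stored!r}")
    expected = base64.b64decode(hash_b64)
    actual = hashlib.pbkdf2_hmac(
        hash_func, passphrase.encode("utf-8"), base64.b64decode(salt_b64), int(iterations)
    )
    return hmac.compare_digest(actual, expected)
```

Splitting on `$` and `:` is safe here because neither character appears in the standard base64 alphabet.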

From 0.67.x to 0.68.0

(Optional) Vertical Pod Autoscaler (VPA) support

MLOps version 0.68.0 introduces Vertical Pod Autoscaler (VPA) support for the Deployer. Note that VPA activation is optional and performed upon request. VPA allows dynamic scaling of CPU and memory resources based on application usage, improving resource efficiency and optimizing costs.

Before activating VPA in MLOps, make sure that VPA is supported in the cluster and that the VPA CRDs and controllers are up and running alongside the Metrics Server.

For more information, see the Installation section of the VPA GitHub README and the Metrics Server installation instructions.

Note: For a list of known limitations, see the Known limitations section of the VPA GitHub README.

Key changes

VPA Resource Specifications: Added VPA resource specification logic to the Scoring Apps and App Composer, allowing for the dynamic adjustment of their resource limits based on real-time demand.
API Updates: New API logic has been added for specifying and validating VPA resources.
New VPA Utility Functions: Implemented utility methods for creating and managing VPA resources, including validation and resource quantity handling.
Deprecated Function Removal: Removed the deprecated Fabric8 createOrReplace usage in the Scoring Apps.

Removal of HT runtime based on Python 3.8

The Hydrogen Torch (HT) runtime based on Python 3.8, which was available by default in MLOps version 0.67.x, has been removed as of MLOps version 0.68.0. However, you can still use this runtime by registering it through extra runtimes.

For a runtime registered through extra runtimes to also be visible in the UI, it must meet the following requirements:

The mlflow/flavors/python_function/loader_module must match mlflow.pyfunc.model.
The runtime name must adhere to this pattern: (python-scorer_hydrogen_torch_)(\w*)(38)(\w*).
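Both requirements can be checked before registering the runtime. This sketch assumes the name must match the pattern in full (the text does not say whether partial matches count); the helper name is hypothetical.

```python
import re

HT_RUNTIME_NAME = re.compile(r"(python-scorer_hydrogen_torch_)(\w*)(38)(\w*)")
REQUIRED_LOADER_MODULE = "mlflow.pyfunc.model"


def visible_in_ui(runtime_name: str, loader_module: str) -> bool:
    """True if the runtime name and loader module meet both requirements above."""
    return (
        loader_module == REQUIRED_LOADER_MODULE
        and HT_RUNTIME_NAME.fullmatch(runtime_name) is not None
    )
```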

Configure maximum number of Kubernetes replicas

With H2O MLOps v0.68.0, you can configure the maximum number of Kubernetes replicas that can be specified when creating a new deployment. To do this, update maxDeploymentReplicas in the values.yaml file (charts/mlops/values.yaml). By default, the maxDeploymentReplicas value is set to 5.
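If your own tooling requests replica counts, a pre-flight check against the configured limit avoids a rejected request. This is a hypothetical helper that mirrors the default of 5; it assumes zero replicas (a scaled-down deployment) is allowed.

```python
DEFAULT_MAX_DEPLOYMENT_REPLICAS = 5  # default of maxDeploymentReplicas in values.yaml


def check_replicas(requested: int, max_replicas: int = DEFAULT_MAX_DEPLOYMENT_REPLICAS) -> int:
    """Return `requested` if it is within the limit, otherwise raise ValueError."""
    if requested < 0:
        raise ValueError("replica count cannot be negative")
    if requested > max_replicas:
        raise ValueError(f"{requested} replicas exceeds maxDeploymentReplicas={max_replicas}")
    return requested
```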

Removal of MLflow runtimes based on Python 3.8

MLflow runtimes based on Python 3.8 have been removed in MLOps version 0.68.0. Python 3.8 officially reached end of life on October 7, 2024.

Pickle runtime based on Python 3.12

MLOps version 0.68.0 introduces a pickle runtime using Python 3.12. Choose one of the following options:

Update your models to work with Python 3.12.
If you cannot update your models, the original pickle runtime based on Python 3.8.18 can be configured during MLOps installation by replacing the pickle-3.12.7 image with pickle-3.8.18.

Deployment of MLOps Telemetry as a long-running microservice

In MLOps version 0.67 and earlier, the MLOps telemetry component was configured as a cron job within the MLOps storage component in the Helm configuration. Starting with MLOps version 0.68, the MLOps telemetry component must be deployed as a separate long-running microservice that publishes event data at scheduled intervals.

To migrate from MLOps version 0.67 to 0.68:

Remove the cron job configuration from the MLOps storage component in the Helm configuration.
Configure telemetry as a separate component in the Helm values.

Helm values must be set as follows:

# Telemetry configuration
telemetry = {
  enabled = true
  image = {
    repository = "h2oai-modelstorage-telemetry${local.shared_services_repository_suffix}"
    tag        = local.component_version.mlops_telemetry_version
  }
  replicaCount = 1
  nodeSelector = {
    "hac.h2o.ai/provisioner" = "karpenter"
  }
  tolerations = [
    {
      key      = "type"
      operator = "Equal"
      value    = "cpu-consolidation"
      effect   = "NoSchedule"
    }
  ]
  podSecurityContext = {
    enabled = true
  }
  containerSecurityContext = {
    enabled = true
  }
  serviceAccount = {
    name = "hac-mlops-storage-telemetry-service-account"
  }
  serverAddress = "hac-telemetry-service.telemetry.svc.cluster.local:80"
  config = {
    logLevel = "error"
  }
}

Scheduler routine for MLOps Telemetry

MLOps version 0.68.0 introduces the SCHEDULER_INTERVAL_SECONDS environment variable, which sets the interval of the scheduler routine that now runs inside the application itself instead of as a cron job. As a result, MLOps Telemetry runs as a long-running deployment in the Kubernetes cluster that publishes event data at scheduled intervals. The default value (in seconds) is:

SCHEDULER_INTERVAL_SECONDS=300
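The pattern of an in-process scheduler replacing a cron job can be sketched as follows. This illustrates the idea only and is not the telemetry service's code. `publish` stands in for a hypothetical function that sends event data, and the injectable `sleep` exists so the loop can be exercised without waiting.

```python
import os
import time
from typing import Callable, Optional

DEFAULT_SCHEDULER_INTERVAL_SECONDS = 300


def scheduler_interval() -> int:
    """Read SCHEDULER_INTERVAL_SECONDS, falling back to the documented default."""
    raw = os.environ.get("SCHEDULER_INTERVAL_SECONDS", "").strip()
    interval = int(raw) if raw else DEFAULT_SCHEDULER_INTERVAL_SECONDS
    if interval <= 0:
        raise ValueError("SCHEDULER_INTERVAL_SECONDS must be a positive integer")
    return interval


def run_scheduler(
    publish: Callable[[], None],
    interval: int,
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call publish() every `interval` seconds; loop forever when rounds is None."""
    completed = 0
    while rounds is None or completed < rounds:
        publish()
        completed += 1
        sleep(interval)
```

In a long-running process this would be started once, for example as `run_scheduler(publish_events, scheduler_interval())`, where `publish_events` is hypothetical.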

Restructured environment security options

Environment-related security options are now configured in a different way. Prior to v0.68.0, security options were specified using their corresponding numerical values. For example:

securityOptions: [1,2,3]

From v0.68.0 onwards, activated security options are configured in the values.yaml file (charts/mlops/values.yaml) using the security option name. For example:

securityOptions:
    activated:
        - .......
        - "AUTHORIZATION_PROTOCOL_OIDC"
        - .......

You can also set the default security option in the values.yaml file (charts/mlops/values.yaml) using the security option name. The default option is the security setting applied by default in the UI when a deployment is created, and it must be part of the Activated Security Options List.

securityOptions:
    activated:
        - .......
        - "PASSPHRASE_HASH_TYPE_PLAINTEXT"
        - .......
    default: "PASSPHRASE_HASH_TYPE_PLAINTEXT"

The following security options are supported in v0.68.0:

DISABLED: No security options are activated.
PASSPHRASE_HASH_TYPE_PLAINTEXT: Passphrase hash type is plaintext.
PASSPHRASE_HASH_TYPE_BCRYPT: Passphrase hash type is bcrypt.
AUTHORIZATION_PROTOCOL_OIDC: OIDC authorization protocol is activated.

Notes

The Activated Security Options List cannot be empty.
The default option must be part of the Activated Security Options List.
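The two rules above can be checked before running `helm upgrade`. This sketch takes the securityOptions block as a parsed mapping; the helper name is hypothetical, and the supported set is the v0.68.0 list above (pass a different set for later versions, for example one that includes PASSPHRASE_HASH_TYPE_PBKDF2 from 0.69.0).

```python
SUPPORTED_SECURITY_OPTIONS = {
    "DISABLED",
    "PASSPHRASE_HASH_TYPE_PLAINTEXT",
    "PASSPHRASE_HASH_TYPE_BCRYPT",
    "AUTHORIZATION_PROTOCOL_OIDC",
}


def validate_security_options(config: dict, supported: set = SUPPORTED_SECURITY_OPTIONS) -> list[str]:
    """Return the problems found in a securityOptions block (empty list if valid)."""
    problems = []
    activated = config.get("activated") or []
    default = config.get("default")
    if not activated:
        problems.append("activated list cannot be empty")
    unknown = [option for option in activated if option not in supported]
    if unknown:
        problems.append(f"unsupported options: {', '.join(unknown)}")
    if default is not None and default not in activated:
        problems.append(f"default {default!r} is not in the activated list")
    return problems
```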

From v0.68.0 onwards, the way to create a deployment with No Security via API call also differs from previous versions. This change includes the following modifications to the h2o-mlops Python Client:

security_options is now a required field for the create_single method of the MLOpsScoringDeployments class.
To ensure backward compatibility, v0.68.0 includes a new attribute for the SecurityOptions class, called disabled_security. This attribute allows handling cases with the No Security option by setting it to True, instead of treating None or SecurityOptions() as No Security.
Users of MLOps assembly v0.68.0 or above must set disabled_security=True to use the No Security option. For users on older versions, No Security mode can be accessed by using SecurityOptions with default values.
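Code that must work against assemblies on both sides of v0.68.0 can choose the keyword arguments for SecurityOptions from the assembly version. This is a hypothetical helper; it assumes disabled_security is accepted as a constructor keyword, as the text implies, and it does not import the client because the import path is not shown here.

```python
def no_security_kwargs(assembly_version: str) -> dict:
    """Keyword arguments for SecurityOptions(...) that select the No Security option."""
    major, minor = (int(part) for part in assembly_version.lstrip("v").split(".")[:2])
    # v0.68.0 and later need disabled_security=True; older versions use default values.
    return {"disabled_security": True} if (major, minor) >= (0, 68) else {}
```

Usage would then look like `SecurityOptions(**no_security_kwargs("0.70.7"))`.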

Helm changes

As of version 0.68.0, the ENABLE_USER_EXTERNALID_UPDATE environment variable has been removed from storage, as it is no longer necessary.
deploymentEnvironment.corsOrigin has been removed. Use global.cors.allowedOrigin instead.

Default deployment security option

As of version 0.68.0, the default security option for deployment is PASSPHRASE_HASH_TYPE_PLAINTEXT. Prior to this version, deployments were not secured by default.

Cloud migration information: MLOps storage

Starting with version 0.68.0, H2O MLOps no longer supports PVCs for storage and uses cloud blob storage instead. MLOps storage supports blob storage from the three major cloud providers (AWS, Azure, and GCP) as well as Minio for on-premises installations. Consequently, all existing data must be migrated from PVC to blob storage during the upgrade to MLOps 0.68.0. When MLOps storage is deployed in MIGRATE mode, MLOps performs all data migration steps automatically, and no manual intervention is needed. End users shouldn't experience any downtime or data loss while the migration is in progress.

Installation instructions

Deploy storage in MIGRATE mode

Note: Only follow the instructions in this section if MLOps storage was previously deployed with LOCAL mode using a Kubernetes PVC as the storage.

For AWS environments with S3

IAM authentication is used to access the bucket. The following annotation must be set on the storage service account:

eks.amazonaws.com/role-arn: <iam-role-arn>

Helm values must be set as follows:

storage:
  serviceAccount:
    create: true
    annotations: {
      eks.amazonaws.com/role-arn: <iam-role-arn>
    }
  persistence:
    enabled: true
  cloudPersistence: 
    enabled: true
    url: s3://<bucket-name>?region=<bucket-region>&prefix=<optional-prefix>
  pvcMigration:
    enabled: true
    cloudProvider: s3
    bucketName: <bucket-name>
    region: <bucket-region>
    prefix: <optional-prefix>

For GCP environments with Google Cloud Storage

Workload identity is used to access the bucket. The following annotation must be set on the storage service account:

iam.gke.io/gcp-service-account: <service_account_email>

Helm values must be set as follows:

storage:
  serviceAccount:
    create: true
    annotations: {
      iam.gke.io/gcp-service-account: <service_account_email>
    }
  persistence:
    enabled: true
  cloudPersistence: 
    enabled: true
    url: gs://<bucket-name>
  pvcMigration:
    enabled: true
    cloudProvider: gcs
    bucketName: <bucket-name>
    region: <bucket-region>

For Azure environments with Azure Blob Storage

Workload identity is used to access the bucket. The following annotation must be set on the storage service account:

azure.workload.identity/client-id=<client-id>

The following label must be set on the storage pods (service and migrator job):

azure.workload.identity/use=true

Helm values must be set as follows:

storage:
  serviceAccount:
    create: true
    annotations: {
      azure.workload.identity/client-id: <client-id>
    }
  extraPodLabels: {
    azure.workload.identity/use: "true"
  }
  persistence:
    enabled: true
  cloudPersistence: 
    enabled: true
    url: azblob://<bucket-name>
  pvcMigration:
    enabled: true
    cloudProvider: azureblob
    bucketName: <bucket-name>
    region: <bucket-region>
    accountName: <storage-account-name>

For on-premises environments with Minio

storage:
  persistence:
    enabled: true
  cloudPersistence: 
    enabled: true
    url: s3://<minio-bucket-name>?endpoint=<minio-url>&region=<minio-region>&hostname_immutable=true
    access_key_id: <minio-access-key-id>
    secret_access_key: <minio-secret-access-key>
  pvcMigration:
    enabled: true
    cloudProvider: minio
    bucketName: <bucket-name>
    region: <minio-region>
    endpoint: <minio-url>
    access_key_id: <minio-access-key-id>
    secret_access_key: <minio-secret-access-key>
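The cloudPersistence.url values above follow a small set of shapes. This sketch builds them from the same inputs as the pvcMigration block; the function name is hypothetical, the provider names are the cloudProvider values used above, and it leaves `:` and `/` unescaped so the output matches the examples literally.

```python
from typing import Optional
from urllib.parse import urlencode


def cloud_persistence_url(
    provider: str,
    bucket: str,
    region: Optional[str] = None,
    prefix: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> str:
    """Build storage.cloudPersistence.url for s3, gcs, azureblob, or minio."""
    if provider in ("s3", "minio") and not region:
        raise ValueError(f"{provider} requires a region")
    if provider == "s3":
        params = {"region": region}
        if prefix:
            params["prefix"] = prefix
        return f"s3://{bucket}?{urlencode(params, safe=':/')}"
    if provider == "gcs":
        return f"gs://{bucket}"
    if provider == "azureblob":
        return f"azblob://{bucket}"
    if provider == "minio":
        if not endpoint:
            raise ValueError("minio requires an endpoint")
        # Minio is addressed through the s3:// scheme with an explicit endpoint.
        params = {"endpoint": endpoint, "region": region, "hostname_immutable": "true"}
        return f"s3://{bucket}?{urlencode(params, safe=':/')}"
    raise ValueError(f"unknown provider: {provider!r}")
```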

From 0.66.1 to 0.67.0

Announcement: Upcoming Java MOJO Runtime removal

The Java MOJO Runtime will be removed in the 0.69.0 MLOps release. Version 0.68.0 will be the last release to include the Java MOJO Runtime.

Users are advised to migrate to the C++ MOJO Runtime. It is a 1:1 mapping of the Java runtime and also supports algorithms that Driverless AI may use but the Java runtime does not, including BERT, GrowNet, and TensorFlow models.

Scoring runtimes

MLflow runtime images are now twice as large, so deployments that use these runtimes can take longer to start because the images take longer to pull.
Runtimes for DAI 1.10.4.3 and older are removed as of MLOps version 0.67.0.
MLflow runtimes support Python 3.8 and later starting with MLOps version 0.67.0.

For more information on scoring runtimes in H2O MLOps, see Scoring runtimes.

Python client

Starting with version 0.67.0, the official Python client of H2O MLOps is h2o-mlops. The minimum Python version required for the client is Python 3.9.

Built on top of the legacy Python client, h2o-mlops retains all previous functionalities. You can continue to access the legacy client's features through h2o-mlops as needed. For more information, see Python client tutorials - Backend.

Note that users of the legacy client can switch to the new Python client (h2o-mlops) by importing h2o-mlops before using any features of the legacy client. This switch can be made without needing to modify any existing code or import statements.

Removal of Conda from Wave app

With the removal of Conda as of MLOps version 0.67.0, third-party models can no longer be uploaded to the MLOps frontend using serialized Pickle files. However, you can still upload models from frameworks like scikit-learn, PyTorch, XGBoost, LightGBM, and TensorFlow using MLflow packaged files.

Monitoring data retention

Starting with version 0.67.0, a per-project data retention duration can be set for monitoring data stored in InfluxDB. To enable this feature, set the MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD environment variable on the deployer to a valid duration string. The minimum retention period is 1h and the maximum is INF. If MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD is not set, the retention period defaults to INF.

-monitor_influxdb_per_project_data_retention_period is exposed in the H2O MLOps Helm charts to set MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD for the deployer.
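A pre-deployment check of the value can catch durations below the 1h minimum. This sketch assumes a single `<number><unit>` duration with InfluxDB-style units m, h, d, and w; the exact formats the deployer accepts are not listed here, so treat the unit set and the helper names as assumptions.

```python
import re
from typing import Mapping

ENV_VAR = "MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD"
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}  # assumed unit set
MIN_RETENTION_SECONDS = 3600  # documented minimum: 1h
_DURATION = re.compile(r"(\d+)([mhdw])")


def is_valid_retention(value: str) -> bool:
    """True for INF or a single <number><unit> duration of at least one hour."""
    if value == "INF":
        return True
    match = _DURATION.fullmatch(value)
    if match is None:
        return False
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit] >= MIN_RETENTION_SECONDS


def effective_retention(env: Mapping[str, str]) -> str:
    """Return the configured retention, or INF when the variable is unset."""
    value = env.get(ENV_VAR, "INF")
    if not is_valid_retention(value):
        raise ValueError(f"invalid {ENV_VAR}: {value!r}")
    return value
```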

Emissary

Emissary has been replaced by the Gateway API:

Emissary's CRDs are no longer used.
Deployments are mapped to HTTP with the Gateway API's HTTPRoute CRD.
The Gateway API is implemented with Envoy Gateway.
(Breaking change) The Gateway API doesn't support custom error responses. As a result, when a deployment is scaled down, the following custom error body is no longer displayed: Deployment is scaled down to zero replicas. Please increase the number of replicas to use the deployment. For more information, see Custom error responses.
(Breaking change) When a deployment is scaled down, error code 500 is returned instead of 503.

Other changes

External model registry is removed as of version 0.67.0.

Feedback

Send feedback about H2O MLOps to cloud-feedback@h2o.ai
