Version: v0.66.1

Replica scaling

This page describes how to use the H2O MLOps Python client to manage hardware costs by scaling down a deployment when it's not in use.

Connect to H2O MLOps.

import h2o_mlops
import time


mlops = h2o_mlops.Client()

Retrieve a deployment and view its Kubernetes options.

project = mlops.projects.list()[0]
environment = project.environments.list()[0]
deployment = environment.deployments.list()[0]

print(deployment.kubernetes_options)

Output:

replicas: 1
requests: {'cpu': '1', 'memory': '1Gi'}
limits: {}
affinity: 
toleration: 

The following example demonstrates how to scale down to 0 replicas so that no hardware resources are used by the deployment. When completely scaled down, replicas will always display -1.

deployment.update_kubernetes_options(
   replicas=0
)

print(deployment.kubernetes_options)

Output:

replicas: -1
requests: {'cpu': '1', 'memory': '1Gi'}
limits: {}
affinity: 
toleration: 

Confirm that the update is complete by checking if the deployment is "healthy" (note that a deployment scaled down to 0 replicas is still considered "healthy"). Then for demonstration purposes, try connecting to one of the deployment endpoints using the get_capabilities method. This returns an HTTP error as the deployment is now completely scaled down.

while not deployment.is_healthy():
   deployment.raise_for_failure()
   time.sleep(5)

try:
   deployment.get_capabilities()
except Exception as e:
   print(e)

Output:

Server error '504 Gateway Timeout' for url 'https://model.internal.dedicated.h2o.ai/>1bde3840-8076-4945-b77a-e0844487f2f2/model/capabilities'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/504

Scale back up and confirm that the endpoints are working again by using the get_capabilities method.

Input:

deployment.update_kubernetes_options(
   replicas=1
)

print(deployment.kubernetes_options)

Output:

replicas: 1
requests: {'cpu': '1', 'memory': '1Gi'}
limits: {}
affinity: 
toleration: 

Input:

while not deployment.is_healthy():
   deployment.raise_for_failure()
   time.sleep(5)
   
deployment.get_capabilities()

Output:

['SCORE', 'CONTRIBUTION_ORIGINAL', 'CONTRIBUTION_TRANSFORMED']

Feedback

Submit and view feedback for this page
Send feedback about H2O MLOps to cloud-feedback@h2o.ai