Deploy a model

Note

Experiments that are imported into MLOps must first be registered before being deployed. For more information, see Model registry.

The following steps describe how to deploy a model version in MLOps.

  1. In the main navigation, click Projects.

    Projects page

  2. Select the project that contains the model you want to deploy.

  3. On the project page, click the Deployments tab, then click Create Deployment.

  4. In the Create a new deployment panel, specify the following:

    1. Deployment Type: Choose whether the model scores requests in real time through a REST endpoint or in batches. Note that batch deployments are currently available only through H2O eScorer, and only for MOJO artifacts. H2O eScorer must be installed first, and the eScorer Model Registry is separate from the H2O MLOps Model Registry.

      Deployment type

    2. Enter a name and, optionally, a description for the deployment.

      Deployment name and description

    3. Deployment mode: Click the Select Model button to specify the registered model you want to deploy. Note that currently, only Single Model deployments are available through the UI; A/B and Champion Challenger deployments can be created with the H2O MLOps Python client.

      Deployment mode

    4. Select the Environment for the deployment.

      Deployment environment

    5. Artifact type and runtime: Select an artifact type and runtime for the deployment. Each entry in the Artifact Type drop-down represents an artifact that is linked to the selected experiment and is deployable by MLOps. Note that admins have the option to configure and add additional runtimes.

      The selection of available runtimes is determined by the artifact type that you specify. The following list provides information on the available options when selecting an artifact type and runtime.

      Warning

      Selecting an incorrect runtime causes the deployment to fail.

      | Artifact type | Version | Runtime option | Notes |
      | --- | --- | --- | --- |
      | Driverless AI MOJO pipeline | DAI 1.9.3 and later | DAI MOJO Scorer (Shapley none) | |
      | Driverless AI MOJO pipeline | DAI 1.10.0 and later | DAI MOJO Scorer (Shapley original only) | Requires 2x the memory of the Shapley-none option. |
      | Driverless AI MOJO pipeline | DAI 1.9.3 and later | DAI MOJO Scorer (Shapley transformed only) | Requires 2x the memory of the Shapley-none option. |
      | Driverless AI MOJO pipeline | DAI 1.10.0 and later | DAI MOJO Scorer (Shapley all) | Requires 3x the memory of the Shapley-none option. |
      | Driverless AI MOJO pipeline | DAI 1.10.0 and later | DAI MOJO Scorer (C++ Runtime) | Experiment must be linked through a project. Original Shapley requires DAI 1.10.3 and later; transformed Shapley requires DAI 1.10.2 and later. |
      | Driverless AI Python scoring pipeline | DAI 1.9.3 | Python Pipeline Scorer [DAI 1.9.3] | No longer supported. |
      | Driverless AI Python scoring pipeline | DAI 1.10.0 and later | Python Pipeline Scorer [DAI 1.10.0] and Python Pipeline Scorer [DAI 1.10.4.3] | The Python Pipeline Scorer version must match the DAI version used to build the model (for example, a model built with DAI 1.10.4.2 must use Python Pipeline Scorer [DAI 1.10.4.2]). |
      | H2O-3 MOJO | All versions | H2O-3 MOJO Scorer | |
      | MLflow / .pkl file | | MLflow Model Scorer [Python 3.8] and MLflow Model Scorer [Python 3.9] | The MLflow Model Scorer version must match the Python version used to build the model. For a full list of supported third-party models, see Supported third-party models. |

      Note

      The C++ MOJO2 runtime (DAI MOJO Scorer (C++ Runtime)) supports a wider range of DAI algorithms than the Java runtime, including BERT, GrowNet, and TensorFlow models. To use one of these models, it must be linked from DAI rather than manually uploaded.

      Artifact type and runtime
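
To make the version-matching rule concrete, here is a minimal sketch of how you might select the correct Python Pipeline Scorer runtime programmatically. The `pick_python_scorer` helper and the runtime-name strings are illustrative only (they mirror the format shown in the table above); MLOps itself does not expose this function.

```python
# Hypothetical helper illustrating the rule above: the Python Pipeline
# Scorer version must exactly match the DAI version used to build the
# model. Runtime names follow the "[DAI x.y.z]" format from the table.

def pick_python_scorer(dai_version, available_runtimes):
    """Return the runtime whose bracketed DAI version matches exactly."""
    wanted = f"Python Pipeline Scorer [DAI {dai_version}]"
    for runtime in available_runtimes:
        if runtime == wanted:
            return runtime
    raise ValueError(
        f"No Python Pipeline Scorer matches DAI {dai_version}; "
        "deploying with a mismatched runtime will fail."
    )

runtimes = [
    "Python Pipeline Scorer [DAI 1.10.0]",
    "Python Pipeline Scorer [DAI 1.10.4.3]",
]
print(pick_python_scorer("1.10.4.3", runtimes))
# Python Pipeline Scorer [DAI 1.10.4.3]
```

This mirrors the warning above: an exact match is required, so a near-miss version raises an error rather than silently deploying a runtime that would fail.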

    6. Endpoint security: Select one of the following levels of endpoint security.

      • No security (Default) - To skip configuration of endpoint security, select this option.

      • Passphrase (Stored as plain text) - The provided passphrase is stored in plain text in the database. If this option is selected, the passphrase is visible in the UI after the model is deployed.

      • Passphrase (Stored hashed) - The provided passphrase is stored as a hashed value in the database. If this option is selected, the passphrase is not visible in the UI after the model is deployed.

      Deployment endpoint security
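
The practical difference between the two passphrase options can be illustrated with a short sketch. The exact hashing scheme H2O MLOps uses is not documented here; a salted SHA-256 hash is used below purely to show why a hashed passphrase can still be verified but can no longer be displayed in the UI.

```python
import hashlib
import os

# Illustrative only: the hashing scheme MLOps actually uses is not
# specified in this guide. A salted SHA-256 digest demonstrates the
# key property of the "Stored hashed" option: the stored value can
# verify a passphrase but cannot be reversed to show the original.

def hash_passphrase(passphrase, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.sha256(salt + passphrase.encode()).digest()
    return salt, digest

def verify_passphrase(passphrase, salt, stored):
    return hashlib.sha256(salt + passphrase.encode()).digest() == stored

salt, stored = hash_passphrase("my-endpoint-secret")
print(verify_passphrase("my-endpoint-secret", salt, stored))  # True
print(verify_passphrase("wrong-guess", salt, stored))         # False
```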

  5. Kubernetes options: The following is a list of Kubernetes options that can be configured from the model deployment page. These options are based on various Kubernetes API resources, and can be used to improve the efficiency of the deployment. As a general rule, resources accepted by Kubernetes are also accepted in the MLOps configuration.

    • Replicas (optional): Specify the number of static replicas of the model you want to create. You can specify up to five replicas. Specifying multiple replicas is useful for load balancing and achieving high availability for scoring tasks.

      Note

      Each of the following Kubernetes options (resource requests, limits, node affinity, and toleration) are applied on a per-replica basis.

      Specify number of static replicas of the model

    • Resource requests and limits (optional): You can control the amount of resources an MLOps deployment uses with resource requests and limits.

      Resource requests determine the amount of resources the deployment asks Kubernetes to provide it. For example, if a replica requests 256Mi of memory, Kubernetes schedules the deployment on any node that has enough memory available to satisfy the stated requirement.

      Resource limits determine the maximum amount of resources that are available to a deployment. If a deployment exceeds a limit on the amount of memory allocated to it, the deployment is restarted. Note that the deployment does not restart if it exceeds a CPU limit.

      To add fields for any custom resources that an admin has set up in your Kubernetes cluster, click the Add row button.

      Note

      • The resource requests and limits fields must be defined using the quantity suffixes used in Kubernetes. The default values for Memory and CPU requests are 256Mi and 100m respectively. For more information, see Resource units in Kubernetes.

      • By default, resources are not limited.

      • When specifying custom resources, if Kubernetes is not able to satisfy the stated custom value(s), then the pod cannot be scheduled.

      • For more information on resource requests and limits in Kubernetes, see Resource Management for Pods and Containers.

      Resource requests and limits
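
The quantity suffixes mentioned in the note can be sketched as a small parser. Kubernetes accepts more suffixes than shown here (this covers only the common binary, decimal, and milli forms), so treat this as an illustration of the units, not a complete implementation.

```python
# Minimal parser for the Kubernetes quantity suffixes mentioned above,
# e.g. memory "256Mi" and CPU "100m". Kubernetes accepts additional
# suffixes; this sketch covers only the common cases.

SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,   # binary (memory)
    "K": 10**3, "M": 10**6, "G": 10**9,         # decimal
}

def parse_quantity(value):
    """Return the quantity in base units (bytes for memory, cores for CPU)."""
    if value.endswith("m"):               # milli-units, e.g. "100m" = 0.1 core
        return float(value[:-1]) / 1000
    for suffix, factor in SUFFIXES.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)

# The defaults above: 256Mi of memory and 100m (0.1 core) of CPU.
print(parse_quantity("256Mi"))   # 268435456.0 bytes
print(parse_quantity("100m"))    # 0.1 core

# Requests apply per replica, so 5 replicas at 256Mi reserve 5x that.
print(5 * parse_quantity("256Mi"))
```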

    • Node affinity (optional): Specify a node affinity for the deployment. Note that admins have the ability to customize how node affinity is set up. For more information on node affinity, see Configuring node affinity and toleration and the official Kubernetes documentation on node affinity.

    • Toleration (optional): Specify a toleration for the scorer (pod). Note that admins have the ability to customize how toleration is set up. For more information on toleration, see Configuring node affinity and toleration and the official Kubernetes documentation on taints and tolerations.

      Node affinity and toleration
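
For reference, these two options correspond to standard Kubernetes constructs. The fragment below shows their conventional YAML form; the exact fields exposed in the MLOps UI depend on how your admin has configured them, and the key/value names here (`dedicated`, `node-pool`, `scoring`) are examples only.

```yaml
# Standard Kubernetes forms for the two options above. Key and value
# names are illustrative; use the labels and taints in your cluster.
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "scoring"
    effect: "NoSchedule"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "node-pool"
              operator: "In"
              values: ["scoring"]
```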

  6. Click Create Deployment.

Deploying NLP models

H2O MLOps supports the deployment of Driverless AI and MLflow natural language processing (NLP) models.

Driverless AI

To deploy a Driverless AI NLP model, refer to the following steps:

  1. In Driverless AI, pick a text dataset such as the Amazon Fine Food Reviews dataset or Airline Twitter Sentiment dataset.

  2. Train a DAI model following the NLP-specific steps described in the Driverless AI documentation on NLP experiments (https://docs.h2o.ai/driverless-ai/1-10-lts/docs/userguide/nlp.html).

  3. Link the NLP experiment to an MLOps project. For more information, see the Driverless AI documentation on linking experiments (https://docs.h2o.ai/driverless-ai/1-10-lts/docs/userguide/projects.html#link-experiments).

  4. Deploy the model as either a MOJO or a Python scoring pipeline. Note that models that use PyTorch or TensorFlow may not be supported as MOJOs.

  5. The scorer is available at the endpoint URL provided by MLOps. You can use curl to test the endpoint. To see how this might look, refer to the following example request:

    ubuntu@ubuntu:~$ curl -X POST -H "Content-Type: application/json" -d @- DEPLOYMENT_ENDPOINT_URL << EOF
    > {
    >   "fields": [
    >     "Description",
    >     "Summary"
    >   ],
    >   "rows": [
    >     [
    >       "text",
    >       "text"
    >     ]
    >   ]
    > }
    > EOF
    {"fields":["PositiveReview.0","PositiveReview.1"],"id":"1c2ec2f0-74c7-11ec-ad8e-3ee53b9e28aa","score":[["0.24813014","0.75186986"]]}nick@NPng-P5550:~/h2oworkspace/mlops-byom-images/python-scorer$
    
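    The request body can also be built from Python. The payload shape (a `fields` list of column names plus `rows` of string values) is taken from the curl example above; the helper function below is a sketch, and actually posting the payload to `DEPLOYMENT_ENDPOINT_URL` is left as a comment so it stays self-contained.

```python
import json

# Builds the JSON body used by the curl example above: a "fields"
# list of column names plus "rows" of string values to score.

def build_score_request(fields, rows):
    return json.dumps({"fields": fields, "rows": rows})

payload = build_score_request(["Description", "Summary"], [["text", "text"]])
print(payload)

# To send it, POST the payload as JSON to your deployment's endpoint,
# e.g. with the requests library or urllib.request:
#   requests.post(DEPLOYMENT_ENDPOINT_URL, data=payload,
#                 headers={"Content-Type": "application/json"})
```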

MLflow

To deploy an MLflow NLP model, refer to the following steps:

  1. Train your model.

  2. Wrap the model in an MLflow model. For more information, see the example on uploading an MLflow custom Python model.

  3. Upload and deploy the model using MLOps. For more information, see Adding experiments and Deploying a model.
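
Step 2 can be sketched as follows. With mlflow installed, the wrapper would subclass `mlflow.pyfunc.PythonModel` and be saved with `mlflow.pyfunc.save_model`; the class below mirrors that shape without the dependency, and `SimpleSentimentModel` is a hypothetical stand-in for your trained NLP model.

```python
# Sketch of wrapping a trained model in an MLflow-style pyfunc class.
# With mlflow installed, ModelWrapper would subclass
# mlflow.pyfunc.PythonModel and be saved via
#   mlflow.pyfunc.save_model(path, python_model=ModelWrapper(model))
# SimpleSentimentModel is a hypothetical stand-in for a real model.

class SimpleSentimentModel:
    """Placeholder for your trained NLP model."""
    def predict(self, texts):
        return ["positive" if "good" in t.lower() else "negative" for t in texts]

class ModelWrapper:  # with mlflow: class ModelWrapper(mlflow.pyfunc.PythonModel)
    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        # context carries MLflow artifacts/config; unused in this sketch.
        return self.model.predict(model_input)

wrapper = ModelWrapper(SimpleSentimentModel())
print(wrapper.predict(None, ["Good product", "Arrived broken"]))
# ['positive', 'negative']
```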