
Deploying a model with Kubernetes resource and replica requests

This example demonstrates how you can upload a model to MLOps and deploy it within a Kubernetes cluster by defining the resources allocated to that model and the number of replicas you need to deploy.

  • Resources: Resource requests specify the amount of CPU or memory a model is expected to use, while resource limits define the point beyond which usage is considered excessive. The underlying Kubernetes scheduler uses these values when placing pods and when deciding whether a pod should be evicted or restarted for overconsuming resources.

  • Replicas: The number of concurrent pods associated with the model. This specifies how many concurrent instances of the same model you want to deploy. Multiple replicas can decrease scoring times by balancing simultaneous scoring requests across the model replicas.


For more information about Kubernetes resources and replicas, see Resource Management for Pods and Containers and Deployments in Kubernetes.
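As a plain-Python illustration of how requests and limits relate (this is not the MLOps client API): for a spec to make sense, each limit must be at least as large as the corresponding request.

```python
# Illustrative only: plain dicts mirroring Kubernetes resource requests/limits.
requests = {"cpu": 0.2, "memory": 256 * 1024**2}   # 200m CPU, 256Mi memory
limits = {"cpu": 0.3, "memory": 512 * 1024**2}     # 300m CPU, 512Mi memory

def limits_cover_requests(requests, limits):
    """Return True if every requested resource has a limit >= the request."""
    return all(limits.get(key, 0) >= value for key, value in requests.items())
```

A pod whose limit is below its request could never be scheduled satisfactorily, so a check of this shape is a useful sanity test before submitting a deployment.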

Before you begin

  • You will need a model zip file to upload and deploy in MLOps. This can be any type of model supported by MLOps. For this example, you can use this model.

  • You will need the values for the following constants to carry out this task. Contact your administrator to obtain the deployment-specific values.

Constant | Value | Description
MLOPS_API_URL |  | Defines the URL for the MLOps Gateway component. You can verify the correct URL by navigating to the API URL in your browser; it should display a page with a list of available routes.
TOKEN_ENDPOINT_URL | https://mlops.keycloak.domain/auth/realms/[fill-in-realm-name]/protocol/openid-connect/token | Defines the token endpoint URL of the Identity Provider. This example uses Keycloak as the Identity Provider, so the Keycloak realm must be provided.
REFRESH_TOKEN | <your-refresh-token> | Defines the user's refresh token.
CLIENT_ID | <your-client-id> | Sets the client ID for authentication. This is the client you use to connect to MLOps.
MODEL_FILE_PATH | <path-to-your-model-zip-file> | Defines the model zip file that will be uploaded and deployed. This can be any type of model supported by MLOps.
MODEL_DISPLAY_NAME | DeployWithKubernetesResourcesAndReplicas | Defines the model's display name.
PROJECT_NAME | UploadAndDeployWithResources | Defines the project that the script creates for the model.
DEPLOYMENT_ENVIRONMENT | DEV | Defines the target deployment environment.
REFRESH_STATUS_INTERVAL | 1.0 | Defines the refresh interval, in seconds, for the deployment health check.
MAX_WAIT_TIME | 300 | Defines the maximum time, in seconds, to wait for the deployment to become healthy.

The following steps demonstrate how you can use the MLOps Python client to upload a model to MLOps and deploy it within a Kubernetes cluster by defining resources and replicas associated with it.

  1. Download the file.

  2. Change the values of the following constants in your file as given in the preceding table.
        ### Constants
        MLOPS_API_URL = ""
        TOKEN_ENDPOINT_URL = "https://mlops.keycloak.domain/auth/realms/[fill-in-realm-name]/protocol/openid-connect/token"
        REFRESH_TOKEN = "<your-refresh-token>"
        CLIENT_ID = "<your-client-id>"
        MODEL_FILE_PATH = "<local-path-to-your-model-zip-file>"
        MODEL_DISPLAY_NAME = "DeployWithKubernetesResourcesAndReplicas"
        PROJECT_NAME = "UploadAndDeployWithResources"
        DEPLOYMENT_ENVIRONMENT = "DEV"
        REFRESH_STATUS_INTERVAL = 1.0
        MAX_WAIT_TIME = 300
  3. Run the file.

    Deployment has become healthy 
  4. Finally, navigate to MLOps and click the project name UploadAndDeployWithResources under Projects to view the deployed model.

    Deploy with Kubernetes resources and replicas example


    For more information about model deployments in MLOps, see Understanding deployments in MLOps.

Example walkthrough

This section provides a walkthrough of each of the sections in the file.

  1. Include the helper function, which waits for the deployment to become healthy.
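    Such a helper can be sketched as a simple polling loop. Here, `get_status` is a hypothetical stand-in for the client's deployment-status lookup, and the interval and timeout correspond to the REFRESH_STATUS_INTERVAL and MAX_WAIT_TIME constants:

    ```python
    import time

    def wait_for_healthy(get_status, refresh_interval=1.0, max_wait_time=300):
        """Poll get_status() until it reports "HEALTHY" or max_wait_time elapses.

        get_status is any zero-argument callable returning a status string;
        in the real script it would wrap the MLOps client's status call.
        Returns True if the deployment became healthy before the deadline.
        """
        deadline = time.monotonic() + max_wait_time
        while time.monotonic() < deadline:
            if get_status() == "HEALTHY":
                return True
            time.sleep(refresh_interval)
        return False
    ```

    Using a deadline based on `time.monotonic()` keeps the timeout correct even if the system clock is adjusted while waiting.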

  2. Convert the extracted metadata into storage-compatible value objects.

  3. Set up the token provider using an existing refresh token.
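    Under the hood, this is a standard OIDC refresh-token grant against TOKEN_ENDPOINT_URL. The MLOps client ships its own token provider; the following standard-library sketch only illustrates the request such a provider makes:

    ```python
    import json
    import urllib.request
    from urllib.parse import urlencode

    def build_refresh_request(token_endpoint_url, client_id, refresh_token):
        """Build the POST request for an OIDC refresh-token grant."""
        body = urlencode({
            "grant_type": "refresh_token",
            "client_id": client_id,
            "refresh_token": refresh_token,
        }).encode()
        return urllib.request.Request(token_endpoint_url, data=body, method="POST")

    def fetch_access_token(token_endpoint_url, client_id, refresh_token):
        """Exchange the refresh token for a fresh access token."""
        req = build_refresh_request(token_endpoint_url, client_id, refresh_token)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["access_token"]
    ```

    The response JSON also carries an `expires_in` field, which a real token provider uses to refresh the access token before it expires.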

  4. Set up the MLOps client.

  5. Create a project in MLOps and create an artifact in MLOps storage.

  6. Upload the zip file and analyze it.
    # Uploading the zip file.
    with open(MODEL_FILE_PATH, "rb") as mojo_file:
        ...  # upload mojo_file to the artifact created in the previous step
    # Analyzing the zip file.
    ingestion: mlops.IngestMetadata = mlops_client.ingest.model.create_model_ingestion(
        ...  # ingestion request referencing the uploaded artifact
    )
    model_metadata = convert_metadata(ingestion.model_metadata)
    model_params = mlops.StorageExperimentParameters(
        ...  # storage-compatible parameters derived from the ingestion
    )
  7. Create an experiment from the zip file and link the artifact to the experiment.
    # Creating an experiment from the zip file.
    experiment: mlops.StorageExperiment = (
        ...  # create the experiment from model_metadata and model_params
    )
    # Linking the artifact to the experiment.
    artifact.entity_id = experiment.id
    artifact.type = ingestion.artifact_type
    mlops_client.storage.artifact.update_artifact(
        artifact=artifact, update_mask="type,entityId"
    )
  8. Fetch the available deployment environments and search for the ID of the selected deployment environment.
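    The lookup itself is just a search by display name over the environments the client returns. A minimal sketch, with attribute names (`display_name`, `id`) assumed to mirror the API's fields:

    ```python
    def find_environment_id(environments, name):
        """Return the id of the first environment whose display_name matches."""
        for env in environments:
            if env.display_name == name:
                return env.id
        raise ValueError(f"deployment environment {name!r} not found")

    # Tiny demo with stand-in objects (the real client returns richer objects):
    from types import SimpleNamespace
    _envs = [
        SimpleNamespace(display_name="DEV", id="env-dev"),
        SimpleNamespace(display_name="PROD", id="env-prod"),
    ]
    dev_id = find_environment_id(_envs, "DEV")
    ```

    Raising when the environment is missing fails fast, rather than deferring the error to the deployment call.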

  9. Customize the composition of the deployment.
    composition = mlops.DeployDeploymentComposition(
        deployable_artifact_type_name=...,  # e.g. "dai/mojo_pipeline"
        artifact_processor_name=...,        # e.g. "unzip_processor"
        runtime_name=...,                   # depends on the artifact type
    )


    Ensure the parameters here are correct, depending on the type of model you want to deploy.

    • deployable_artifact_type_name - This can be dai/scoring_pipeline, dai/mojo_pipeline, python/, etc.
    • artifact_processor_name - This depends on the deployable_artifact_type_name. unzip_processor is the standard processor, but some types, such as dai/scoring_pipeline, have their own unique processor.
    • runtime_name - Runtimes are dependent on the type of artifact being deployed. Dependencies vary for different models.

    For more information, see Artifact type and runtime section in Deploying a model.

  10. Define the Kubernetes resource requirements and the number of replicas you need to deploy.
    to_deploy = mlops.DeployDeployment(
        ...,  # project, environment, and composition fields elided
        # Nested spec object names below are assumed from the Deploy* naming
        # convention used by the client.
        kubernetes_resource_spec=mlops.DeployKubernetesResourceSpec(
            kubernetes_resource_requirement=mlops.DeployKubernetesResourceRequirement(
                requests={"cpu": "200m", "memory": "256Mi", "nvidia.com/gpu": "1"},
                limits={"cpu": "300m", "memory": "512Mi", "nvidia.com/gpu": "1"},
            ),
            replicas=2,
        ),
    )


    kubernetes_resource_spec and its nested objects define the resource requests, the corresponding limits, and the number of replicas.

    This example requests a deployment with 2 pods, each with a minimum resource allocation of 200m CPU, 256Mi of memory, and 1 NVIDIA GPU. If a pod's resource consumption exceeds 300m CPU, 512Mi of memory, or 1 NVIDIA GPU, that pod risks being restarted.

    For more information, see Resource Management for Pods and Containers. The NVIDIA GPU entry above is an example of an arbitrary resource request: MLOps only expects cpu and memory to be defined, but will accept any other string key if provided. If the Kubernetes scheduler is unaware of the requested resource, the model deployment will fail.
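    For intuition about the cpu and memory quantity strings, here is an illustrative parser for the small subset of Kubernetes quantity syntax used above (this is not part of the MLOps client, and real Kubernetes supports many more suffixes):

    ```python
    def parse_quantity(quantity):
        """Parse a small subset of Kubernetes quantity strings.

        Supports the milli suffix for CPU ("200m" -> 0.2 cores) and binary
        suffixes for memory ("256Mi" -> 268435456 bytes). Illustrative only.
        """
        binary_suffixes = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
        for suffix, multiplier in binary_suffixes.items():
            if quantity.endswith(suffix):
                return float(quantity[: -len(suffix)]) * multiplier
        if quantity.endswith("m"):
            return float(quantity[:-1]) / 1000  # milli-units, e.g. millicores
        return float(quantity)
    ```

    For example, the "200m" CPU request above means a fifth of one core, while "256Mi" is 256 * 1024 * 1024 bytes.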

  11. Finally, create the deployment and wait for the deployment to become healthy. This analyzes and sets the metadata and parameters of the model, and then deploys it to the DEV environment.