
Batch scoring

Batch scoring is the process of making predictions on a large dataset all at once, rather than one record at a time in real time. Batch scoring is available through both the H2O MLOps UI and the H2O MLOps Python client.

Batch scoring jobs in H2O MLOps create a dedicated Kubernetes runtime that reads data from an input source and stores the predicted results in an output location.

To run a batch scoring job, you must define the source of the input data and the location (sink) for the scored output.

H2O MLOps supports the following source and sink types:

  • Azure Blob Storage
  • Amazon S3
  • Google Cloud Storage (GCS)
  • MinIO
  • JDBC
note
  • JDBC tables, CSV files (without a header row), and JSON Lines files are supported as input.
  • Output can be written in CSV format (without a header row), in JSON Lines format, or directly to a JDBC table.
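
For example, a headerless CSV input and the corresponding JSON Lines output might look like the following. The columns and the shape of each prediction record are purely illustrative and depend on the model being scored:

  Input (CSV, no header row):

    5000,36,10.5
    12000,60,14.2

  Output (JSON Lines, one prediction record per input row):

    {"prediction": 0.12}
    {"prediction": 0.87}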

Batch scoring with the UI

This section describes how to start a batch scoring job using the H2O MLOps UI.

To batch score a model using the UI, follow these steps:

  1. From Manage projects, select the project that contains the model you want to batch score.

  2. In the left navigation bar, click Batch scoring jobs.

  3. Click Start new job.

  4. On the Start new job page, enter a name for the batch scoring job in the Job name field.

  5. Select the model from the Model drop-down menu.

  6. Choose the artifact type and runtime from the Artifact type and runtime drop-down menu.

  7. Under Advanced settings, configure the batch size and Kubernetes options, such as the number of replicas and resource requests and limits.


  8. Specify the source and sink configuration.

    Select the appropriate spec type (for example, S3 Spec) from the Source spec drop-down menu and fill out the configuration fields.


    note

    The MinIO specification uses the same configuration fields as the S3 specification. To use MinIO as the source, select S3 Spec from the Source spec drop-down menu.

    For S3 Spec, provide the following details:

    • accessKeyID (required): The access key ID used for AWS authentication. Not needed for public S3 buckets.
    • secretAccessKey (required): The secret access key used for AWS authentication. Not needed for public S3 buckets.
    • sessionToken: A temporary security token that grants time-limited access to AWS resources.
    • pathStyle: Select this option to enable path-style URL construction for the S3 bucket.
    • region (required): The AWS region where the bucket or service is located.
    • endpoint: A custom URL that overrides the default AWS service endpoint (needed, for example, when using an S3-compatible service such as MinIO).
    • partSize: The size, in bytes, of each part used when reading the input data.
    • Source MIME type (required): The MIME type (media type) of the input data. Select an appropriate option from the drop-down menu.
    • Source location (required): The path to the input data source.

    Now, select the appropriate spec type (for example, S3 Spec) from the Sink spec drop-down menu and fill out the configuration fields.


    note

    The MinIO specification uses the same configuration fields as the S3 specification. To use MinIO as the sink, select S3 Spec from the Sink spec drop-down menu.

    For S3 Spec, provide the following details:

    • accessKeyID (required): The access key ID used for AWS authentication.
    • secretAccessKey (required): The secret access key used for AWS authentication.
    • sessionToken: A temporary security token that grants time-limited access to AWS resources.
    • pathStyle: Select this option to enable path-style URL construction for the S3 bucket.
    • region (required): The AWS region where the bucket or service is located.
    • endpoint: A custom URL that overrides the default AWS service endpoint (needed, for example, when using an S3-compatible service such as MinIO).
    • writeConcurrency: The number of concurrent write operations used when writing the output.
    • Sink MIME type (required): The MIME type (media type) of the output data. Select an appropriate option from the drop-down menu.
    • Sink location (required): The destination path where the output data will be written.
  9. After filling out the configuration fields, click Start job to start the batch scoring job. (A purely illustrative example of filled-in source and sink values is shown below.)
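
For reference, a filled-in S3 source and sink configuration might look like the following. Every value here is a placeholder: the bucket names, paths, credentials, region, and the s3:// location format are illustrative assumptions, and the MIME types should be selected from the corresponding drop-down menus.

  Source spec: S3 Spec
    accessKeyID:       <your-access-key-id>
    secretAccessKey:   <your-secret-access-key>
    region:            us-east-1
    Source MIME type:  text/csv
    Source location:   s3://example-bucket/input/loans.csv

  Sink spec: S3 Spec
    accessKeyID:       <your-access-key-id>
    secretAccessKey:   <your-secret-access-key>
    region:            us-east-1
    Sink MIME type:    application/jsonlines
    Sink location:     s3://example-bucket/output/loans-scored.jsonl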

Batch scoring with the Python client

To learn how to perform batch scoring using the H2O MLOps Python client, see the Batch scoring tutorial in the Python client tutorials section.
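
As a rough orientation, the flow from the Python client mirrors the UI steps above: connect to MLOps, pick the project and model, describe a source and a sink, and start the job. The snippet below is a minimal sketch under that assumption only; the class, method, and field names shown (for example, batch_scoring_jobs.create) are hypothetical placeholders rather than the documented client API, so refer to the Batch scoring tutorial for the actual calls.

  # Minimal sketch only -- NOT the authoritative H2O MLOps client API.
  # All method and field names below are hypothetical placeholders; see the
  # Batch scoring tutorial for the real calls and spec objects.
  import h2o_mlops

  client = h2o_mlops.Client()  # connect to the MLOps environment

  # Hypothetical project lookup.
  project = client.projects.get(name="my-project")

  # Hypothetical source/sink descriptions mirroring the S3 fields above.
  source = {
      "spec": "s3",
      "region": "us-east-1",
      "mime_type": "text/csv",
      "location": "s3://example-bucket/input/loans.csv",
  }
  sink = {
      "spec": "s3",
      "region": "us-east-1",
      "mime_type": "application/jsonlines",
      "location": "s3://example-bucket/output/loans-scored.jsonl",
  }

  # Hypothetical call that creates and starts the batch scoring job.
  job = project.batch_scoring_jobs.create(
      name="loans-batch-score",
      source=source,
      sink=sink,
  )
  job.wait()  # hypothetical helper that blocks until the job finishes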

