Batch scoring
Batch scoring is the process of making predictions on a large set of data all at once, instead of one record at a time in real time. Batch scoring is available through both the H2O MLOps UI and the H2O MLOps Python client.
Batch scoring jobs in H2O MLOps create a dedicated Kubernetes runtime that reads data from an input source and stores the predicted results in an output location.
To run a batch scoring job, you must define the source of the input data and the location (sink) for the scored output (a minimal client sketch follows the list below).
H2O MLOps supports the following source and sink types:
- Azure Blob Storage
- Amazon S3
- Google Cloud Storage (GCS)
- MinIO
- JDBC
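For orientation, the sketch below shows roughly how a source, a sink, and a batch scoring job might be put together with the Python client. It is a minimal illustration under assumptions, not the documented API: the method names (for example `client.batch.create_job`), argument names, bucket paths, deployment ID, and auth helper are all placeholders, so consult the H2O MLOps Python client reference for the exact calls.

```python
# Minimal sketch only: method names, argument names, paths, and IDs below are
# placeholders, not the documented h2o_mlops API.
import h2o_mlops

client = h2o_mlops.Client(
    gateway_url="https://mlops-api.example.com",  # placeholder URL
    token_provider=lambda: "example-token",       # placeholder auth helper
)

# Where the batch job reads input rows from (here: a headerless CSV in S3).
source = {
    "type": "s3",
    "location": "s3://example-bucket/input/rows.csv",  # placeholder path
    "format": "csv",
}

# Where the scored output is written (here: JSON Lines in the same bucket).
sink = {
    "type": "s3",
    "location": "s3://example-bucket/output/scores.jsonl",  # placeholder path
    "format": "jsonl",
}

# Submit the batch scoring job against an existing deployment and wait for it.
job = client.batch.create_job(          # placeholder method name
    deployment_id="my-deployment-id",   # placeholder deployment ID
    source=source,
    sink=sink,
)
job.wait()           # placeholder: block until the job finishes
print(job.status)    # placeholder: inspect the final job state
```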
note
- JDBC tables, CSV files (without a header row), and JSON Lines files are supported as input.
- Output can be stored in CSV format (without a header row) or JSON Lines format, or written directly to a JDBC table.
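Because input CSV files are read without a header row, the column order presumably has to match the feature order the deployed model expects; verify this against your deployment. The short pandas sketch below (file names are illustrative) shows one way to write headerless input and load JSON Lines output back for inspection.

```python
import pandas as pd

# Start from data that has a header; batch scoring expects CSV input without
# a header row, so write it out headerless (keep the columns in the order the
# model expects).
df = pd.read_csv("raw_input.csv")                      # illustrative file name
df.to_csv("batch_input.csv", header=False, index=False)

# After the job finishes, JSON Lines output can be loaded back with pandas.
scores = pd.read_json("batch_scores.jsonl", lines=True)  # illustrative file name
print(scores.head())
```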