Version: v0.66.1

Scoring Pandas Data Frames

Score a data frame against an MLOps deployment in parallel, in mini-batches.

Prerequisites

Setup

Install the h2o_mlops_scoring_client with pip.

Example Usage

  1. Import libraries:
import h2o_mlops_scoring_client
import pandas
  2. Choose the MLOps scoring endpoint:
MLOPS_ENDPOINT_URL = "https://model.internal.dedicated.h2o.ai/d4d36117-c94a-4182-8b75-5f5abbd1c28b/model/score"
  3. Load the data frame to score and name the unique ID column used to identify each score:
DATA_FRAME = pandas.read_csv("/Users/jgranados/datasets/BNPParibas.csv")
ID_COLUMN = "ID"
  4. Score the data frame:

Description of arguments for scoring:

  • mlops_endpoint_url: MLOps deployment scoring endpoint URL.
  • id_column: Name of the column in the data frame that identifies each row. The column must contain unique row identifiers.
  • data_frame: Pandas or Spark data frame.
  • cpus: Number of CPU cores to use when scoring Pandas data frames. For best performance with Java MOJO deployments, set cpus to four times the number of deployment replicas. For any type of deployment, setting cpus to more than four times the number of replicas is unlikely to bring additional benefit unless model monitoring is disabled. For slow-scoring deployments such as the DAI Python scoring pipeline, a value below four times the number of replicas may increase throughput.
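Because id_column must contain unique row identifiers, it can help to check the data frame before calling the scorer. The following is a minimal sketch, not part of the scoring client: check_id_column is a hypothetical helper, and rejecting missing IDs is an added assumption beyond the uniqueness requirement stated above.

```python
import pandas


def check_id_column(data_frame: pandas.DataFrame, id_column: str) -> None:
    """Hypothetical pre-flight check: raise if the ID column is unusable."""
    if id_column not in data_frame.columns:
        raise KeyError(f"Column {id_column!r} not found in data frame")
    ids = data_frame[id_column]
    # Assumption: a missing ID cannot identify a score, so treat it as an error.
    if ids.isna().any():
        raise ValueError(f"Column {id_column!r} contains missing values")
    if not ids.is_unique:
        # Show a few of the repeated values to make the problem easy to find.
        examples = ids[ids.duplicated()].unique()[:5].tolist()
        raise ValueError(
            f"Column {id_column!r} has duplicate values, e.g. {examples}"
        )


# Toy frame for illustration only; not taken from the example dataset.
toy = pandas.DataFrame({"ID": [3, 4, 5], "v1": [1.3, None, 0.9]})
check_id_column(toy, "ID")  # returns None when the column is usable
```

In the example on this page, the equivalent call would be check_id_column(DATA_FRAME, ID_COLUMN), run before scoring.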
pandas_df = h2o_mlops_scoring_client.score_data_frame(
mlops_endpoint_url=MLOPS_ENDPOINT_URL,
id_column=ID_COLUMN,
data_frame=DATA_FRAME,
)

23/08/21 14:23:58 INFO h2o_mlops_scoring_client: Connecting to H2O.ai MLOps scorer at 'https://model.internal.dedicated.h2o.ai/d4d36117-c94a-4182-8b75-5f5abbd1c28b/model/score'
23/08/21 14:23:59 INFO h2o_mlops_scoring_client: Starting scoring data frame
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Scoring complete
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Total run time: 0:01:13
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Scoring run time: 0:01:12
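The call above does not set cpus. As a rough sketch of the "four times the number of replicas" guidance, the value could be derived as follows. Both suggested_cpus and score_with_suggested_cpus are hypothetical helpers, and capping the result at the local core count is an assumption, not documented client behavior. The replica count must come from your own deployment configuration.

```python
import os
from typing import Optional


def suggested_cpus(
    replicas: int, per_replica: int = 4, available: Optional[int] = None
) -> int:
    """Hypothetical helper: per_replica workers for each deployment replica.

    Assumption: the result is capped at the number of local CPU cores.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(replicas * per_replica, available))


def score_with_suggested_cpus(mlops_endpoint_url, id_column, data_frame, replicas):
    """Hypothetical wrapper that passes the derived value as cpus."""
    # Imported here so suggested_cpus can be used without the client installed.
    import h2o_mlops_scoring_client

    return h2o_mlops_scoring_client.score_data_frame(
        mlops_endpoint_url=mlops_endpoint_url,
        id_column=id_column,
        data_frame=data_frame,
        cpus=suggested_cpus(replicas),
    )
```

For slow-scoring deployments, a smaller per_replica value may be worth trying, in line with the cpus description above.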

  5. Optionally, merge the scores into the original data frame:
DATA_FRAME.merge(pandas_df, on=ID_COLUMN)
            ID  target        v1        v2  v3        v4         v5        v6        v7        v8  ...      v124  v125      v126      v127      v128  v129      v130      v131  target.0  target.1
     0       3       1  1.335739  8.727474   C  3.921026   7.915266  2.599278  3.176895  0.012941  ...  0.035754    AU  1.804126  3.113719  2.024285     0  0.636365  2.857144  0.116770  0.883230
     1       4       1       NaN       NaN   C       NaN   9.191265       NaN       NaN  2.301630  ...  0.598896    AF       NaN       NaN  1.957825     0       NaN       NaN  0.298435  0.701565
     2       5       1  0.943877  5.310079   C  4.410969   5.326159  3.979592  3.928571  0.019645  ...  0.013452    AE  1.773709  3.922193  1.120468     2  0.883118  1.176472  0.154390  0.845610
     3       6       1  0.797415  8.304757   C  4.225930  11.627438  2.097700  1.987549  0.171947  ...  0.002267    CJ  1.415230  2.954381  1.990847     1  1.677108  1.034483  0.042505  0.957495
     4       8       1       NaN       NaN   C       NaN        NaN       NaN       NaN       NaN  ...       NaN     Z       NaN       NaN       NaN     0       NaN       NaN  0.057625  0.942375
   ...     ...     ...       ...       ... ...       ...        ...       ...       ...       ...  ...       ...   ...       ...       ...       ...   ...       ...       ...       ...       ...
114316  228708       1       NaN       NaN   C       NaN        NaN       NaN       NaN       NaN  ...       NaN    AL       NaN       NaN       NaN     0       NaN       NaN  0.108165  0.891835
114317  228710       1       NaN       NaN   C       NaN        NaN       NaN       NaN       NaN  ...       NaN     E       NaN       NaN       NaN     1       NaN       NaN  0.038374  0.961626
114318  228711       1       NaN       NaN   C       NaN  10.069277       NaN       NaN  0.323324  ...  0.156764     Q       NaN       NaN  2.417606     2       NaN       NaN  0.053958  0.946042
114319  228712       1       NaN       NaN   C       NaN  10.106144       NaN       NaN  0.309226  ...  0.490658    BW       NaN       NaN  3.526650     0       NaN       NaN  0.220766  0.779234
114320  228713       1  1.619763  7.932978   C  4.640085   8.473141  2.351470  2.826766  3.479754  ...  3.135205     V  1.943149  4.385553  1.604493     0  1.787610  1.386138  0.129088  0.870912

114321 rows × 135 columns
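The merge step can be made a little stricter with standard pandas options. The sketch below uses small toy frames with made-up values (not output from any deployment) and the target.0 / target.1 column names seen in the output above. Using how="left" keeps every original row, and validate="one_to_one" makes pandas raise if the ID column is duplicated on either side.

```python
import pandas

# Toy stand-ins for DATA_FRAME and the scored frame; values are illustrative only.
original = pandas.DataFrame({"ID": [3, 4, 5], "v1": [1.3, None, 0.9]})
scores = pandas.DataFrame(
    {
        "ID": [5, 3, 4],  # scores may arrive in a different row order
        "target.0": [0.15, 0.12, 0.30],
        "target.1": [0.85, 0.88, 0.70],
    }
)

# A left merge keeps all original rows in their original order;
# validate raises pandas.errors.MergeError if IDs repeat in either frame.
merged = original.merge(scores, on="ID", how="left", validate="one_to_one")

# Rows without a matching score would show up as NaN in the score columns.
unscored = int(merged["target.1"].isna().sum())
print(merged)
print("rows without a score:", unscored)
```

Unlike the bare merge shown in the step above, which defaults to an inner join and silently drops unmatched rows, this variant makes missing scores visible.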

