Version: Next

Scoring Pandas Data Frames

Score a data frame against an MLOps deployment in parallelized mini-batches.

Prerequisites

Either the h2o_mlops_scoring_client-*-py3-none-any.whl file or access to the Python Package Index (PyPI).

Setup

Install h2o_mlops_scoring_client with pip, either from the wheel file or from PyPI.

Example Usage

Import libraries:

import h2o_mlops_scoring_client
import pandas

Choose the MLOps scoring endpoint:

MLOPS_ENDPOINT_URL = "https://model.internal.dedicated.h2o.ai/d4d36117-c94a-4182-8b75-5f5abbd1c28b/model/score"

Load the data frame to score, and name the column that uniquely identifies each row so that each score can be matched back to its row.

DATA_FRAME = pandas.read_csv("/Users/jgranados/datasets/BNPParibas.csv")
ID_COLUMN = "ID"
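Because the ID column must contain unique row identifiers, it can help to check this locally before sending any requests. The sketch below uses plain pandas; `check_id_column` and `add_row_id` are hypothetical helpers, not part of h2o_mlops_scoring_client, and adding a row-number column is only one possible option when the data has no usable ID.

```python
import pandas as pd


def check_id_column(df: pd.DataFrame, id_column: str) -> None:
    """Hypothetical helper: raise if id_column is missing, has nulls, or repeats a value."""
    if id_column not in df.columns:
        raise KeyError(f"ID column {id_column!r} not found in data frame")
    ids = df[id_column]
    if ids.isna().any():
        raise ValueError(f"ID column {id_column!r} contains missing values")
    if not ids.is_unique:
        dupes = ids[ids.duplicated()].unique().tolist()
        raise ValueError(f"ID column {id_column!r} has duplicate values, e.g. {dupes[:5]}")


def add_row_id(df: pd.DataFrame, id_column: str = "row_id") -> pd.DataFrame:
    """Hypothetical helper: return a copy with a 0-based integer ID column first."""
    out = df.copy()
    out.insert(0, id_column, list(range(len(out))))
    return out


# Small stand-in frames, not the BNP Paribas data.
good = pd.DataFrame({"ID": [3, 4, 5], "v1": [1.34, None, 0.94]})
check_id_column(good, "ID")  # passes: IDs are present and unique

bad = pd.DataFrame({"ID": [1, 1, 2], "v1": [0.1, 0.2, 0.3]})
try:
    check_id_column(bad, "ID")
except ValueError as err:
    message = str(err)
    print(message)

# If no usable ID column exists, add one before scoring.
with_ids = add_row_id(bad.drop(columns="ID"))
check_id_column(with_ids, "row_id")
```

Missing values in feature columns (such as `v1` above) are fine; only the ID column is checked.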

Now score the data frame.

The scoring arguments are:

mlops_endpoint_url: MLOps deployment scoring endpoint URL.
id_column: Name of the column in the data frame that identifies each row. The column must contain unique values.
data_frame: Pandas or Spark data frame.
cpus: Number of CPU cores to use for scoring Pandas data frames. For best performance with Java MOJO deployments, set cpus to four times the number of deployment replicas. For any type of deployment, setting cpus higher than four times the number of replicas is unlikely to bring additional benefit unless model monitoring is disabled. For slow-scoring deployments such as the DAI Python scoring pipeline, setting cpus to less than four times the number of replicas may increase throughput.
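The cpus guidance can be turned into a starting value with a tiny helper. This is a sketch: `suggest_cpus` is hypothetical, you supply the replica count from your deployment's configuration, and any `per_replica` value other than 4 is something to tune by measuring throughput, not a documented setting.

```python
def suggest_cpus(replicas: int, per_replica: int = 4) -> int:
    """Hypothetical helper: a starting value for the cpus argument.

    per_replica=4 follows the guidance for Java MOJO deployments. For slow
    scoring deployments such as the DAI Python scoring pipeline, try a smaller
    per_replica; a larger one is only likely to help if model monitoring is
    disabled.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if per_replica < 1:
        raise ValueError("per_replica must be at least 1")
    return replicas * per_replica


print(suggest_cpus(2))                 # MOJO deployment with 2 replicas -> 8
print(suggest_cpus(2, per_replica=2))  # slower pipeline, tuned down -> 4
```

The result would then be passed as cpus=suggest_cpus(...) alongside the other arguments to score_data_frame.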

pandas_df = h2o_mlops_scoring_client.score_data_frame(
    mlops_endpoint_url=MLOPS_ENDPOINT_URL,
    id_column=ID_COLUMN,
    data_frame=DATA_FRAME,
)

23/08/21 14:23:58 INFO h2o_mlops_scoring_client: Connecting to H2O.ai MLOps scorer at 'https://model.internal.dedicated.h2o.ai/d4d36117-c94a-4182-8b75-5f5abbd1c28b/model/score'
23/08/21 14:23:59 INFO h2o_mlops_scoring_client: Starting scoring data frame
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Scoring complete
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Total run time: 0:01:13
23/08/21 14:25:11 INFO h2o_mlops_scoring_client: Scoring run time: 0:01:12
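For intuition only, the sketch below shows the general idea of parallelized mini-batch scoring: split the frame into row batches, score the batches concurrently, and concatenate the results keyed by the ID column. It is not the client's implementation, which this page does not describe. `fake_score` is a hypothetical stand-in for a request to the deployment, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fake_score(batch: pd.DataFrame, id_column: str) -> pd.DataFrame:
    """Hypothetical stand-in for one request to the scoring endpoint.

    Returns the batch IDs plus made-up two-class probabilities.
    """
    ids = batch[id_column].to_numpy()
    p1 = (ids % 10) / 10.0
    return pd.DataFrame({id_column: ids, "target.0": 1.0 - p1, "target.1": p1})


def score_in_batches(
    df: pd.DataFrame, id_column: str, batch_size: int = 1000, workers: int = 4
) -> pd.DataFrame:
    """Split df into row batches, score them concurrently, and concatenate the results."""
    batches = [df.iloc[start:start + batch_size] for start in range(0, len(df), batch_size)]
    if not batches:
        return pd.DataFrame(columns=[id_column, "target.0", "target.1"])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so row order is preserved.
        results = list(pool.map(lambda batch: fake_score(batch, id_column), batches))
    return pd.concat(results, ignore_index=True)


frame = pd.DataFrame({"ID": range(1, 2501), "v1": 0.0})
scores = score_in_batches(frame, "ID", batch_size=1000, workers=4)
print(scores.shape)  # (2500, 3)
```

Threads suit this sketch because each real request would spend most of its time waiting on the network; keying the output by ID is what makes the later merge possible regardless of batch order.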

Optionally, merge the scores back into the original data frame on the ID column.

DATA_FRAME.merge(pandas_df, on=ID_COLUMN)

	ID	target	v1	v2	v3	v4	v5	v6	v7	v8	...	v124	v125	v126	v127	v128	v129	v130	v131	target.0	target.1
0	3	1	1.335739	8.727474	C	3.921026	7.915266	2.599278	3.176895	0.012941	...	0.035754	AU	1.804126	3.113719	2.024285	0	0.636365	2.857144	0.116770	0.883230
1	4	1	NaN	NaN	C	NaN	9.191265	NaN	NaN	2.301630	...	0.598896	AF	NaN	NaN	1.957825	0	NaN	NaN	0.298435	0.701565
2	5	1	0.943877	5.310079	C	4.410969	5.326159	3.979592	3.928571	0.019645	...	0.013452	AE	1.773709	3.922193	1.120468	2	0.883118	1.176472	0.154390	0.845610
3	6	1	0.797415	8.304757	C	4.225930	11.627438	2.097700	1.987549	0.171947	...	0.002267	CJ	1.415230	2.954381	1.990847	1	1.677108	1.034483	0.042505	0.957495
4	8	1	NaN	NaN	C	NaN	NaN	NaN	NaN	NaN	...	NaN	Z	NaN	NaN	NaN	0	NaN	NaN	0.057625	0.942375
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
114316	228708	1	NaN	NaN	C	NaN	NaN	NaN	NaN	NaN	...	NaN	AL	NaN	NaN	NaN	0	NaN	NaN	0.108165	0.891835
114317	228710	1	NaN	NaN	C	NaN	NaN	NaN	NaN	NaN	...	NaN	E	NaN	NaN	NaN	1	NaN	NaN	0.038374	0.961626
114318	228711	1	NaN	NaN	C	NaN	10.069277	NaN	NaN	0.323324	...	0.156764	Q	NaN	NaN	2.417606	2	NaN	NaN	0.053958	0.946042
114319	228712	1	NaN	NaN	C	NaN	10.106144	NaN	NaN	0.309226	...	0.490658	BW	NaN	NaN	3.526650	0	NaN	NaN	0.220766	0.779234
114320	228713	1	1.619763	7.932978	C	4.640085	8.473141	2.351470	2.826766	3.479754	...	3.135205	V	1.943149	4.385553	1.604493	0	1.787610	1.386138	0.129088	0.870912

114321 rows × 135 columns
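The merge above is an inner join, so any row without a score would silently drop out. If you want pandas to keep such rows and confirm that IDs match one-to-one, `DataFrame.merge` accepts `how="left"`, `validate="one_to_one"`, and `indicator=True`. A small sketch with stand-in frames; the score column names mirror the output above, but the values are made up.

```python
import pandas as pd

# Stand-in frames; values are illustrative only.
data = pd.DataFrame({"ID": [3, 4, 5], "v1": [1.34, None, 0.94]})
scores = pd.DataFrame({"ID": [3, 4], "target.0": [0.12, 0.30], "target.1": [0.88, 0.70]})

merged = data.merge(
    scores,
    on="ID",
    how="left",             # keep every input row, even without a score
    validate="one_to_one",  # raise MergeError if either side repeats an ID
    indicator=True,         # add a _merge column: "both" or "left_only"
)
unscored = merged.loc[merged["_merge"] == "left_only", "ID"].tolist()
print(unscored)  # [5]
```

An empty `unscored` list means every input row received a score.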
