Version: v0.64.0

MLflow custom Python model example

This example demonstrates how you can upload and deploy an MLflow custom Python model using the MLOps Python client. It uploads an MLflow custom Python model to MLOps and analyzes it. It then sets its metadata and parameters, and deploys it to the dev environment in MLOps.

This approach is useful for model types that take non-standard data shapes, such as large vectors, tensors, or XGBoost matrices, or that otherwise require some form of preprocessing prior to scoring. Text data, which the example below uses, is a good case: it often needs to be vectorized and preprocessed before it is passed to the model for scoring.

The following is an example of creating a sentiment model using a RandomForestClassifier that accepts a vector as the model input. Therefore, the incoming data, which is likely to be a single text column, needs to be vectorized prior to actual scoring.
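
The idea of preprocessing inside the model wrapper can be illustrated without any ML libraries. The sketch below is a toy, not the example's actual code: KeywordSentimentModel, TextWrapper, and the two-word vocabulary are hypothetical stand-ins for the fitted RandomForestClassifier, the custom MLflow wrapper, and the TF-IDF step. It only shows that the wrapper turns the "Description" text into a vector before the underlying model is called.

```python
class KeywordSentimentModel:
    """Toy stand-in for a model that only accepts numeric vectors."""

    def predict(self, vectors):
        # Label 1 (positive) when the first count exceeds the second.
        return [1 if vec[0] > vec[1] else 0 for vec in vectors]


class TextWrapper:
    """Hypothetical wrapper: vectorizes raw text, then delegates scoring."""

    VOCABULARY = ("good", "bad")  # illustrative only

    def __init__(self, model):
        self.model = model

    def vectorize(self, texts):
        # Count how often each vocabulary word occurs in each text.
        return [
            [text.lower().split().count(word) for word in self.VOCABULARY]
            for text in texts
        ]

    def predict(self, rows):
        texts = [row["Description"] for row in rows]
        return self.model.predict(self.vectorize(texts))


wrapper = TextWrapper(KeywordSentimentModel())
predictions = wrapper.predict([
    {"Description": "good good snack"},
    {"Description": "bad taste"},
])
print(predictions)  # [1, 0]
```

Callers send raw text and never see the vector representation, which is the same contract the MLflow wrapper in the walkthrough provides.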

Before you begin
  • Install MLflow
  • Install scikit-learn
  • You will need the values for the following constants to carry out the task. Contact your administrator to obtain deployment-specific values.
| Constant | Value | Description |
|---|---|---|
| MLOPS_API_URL | Usually: https://api.mlops.my.domain | Defines the URL for the MLOps Gateway component. You can verify the correct URL by navigating to the API URL in your browser. It should provide a page with a list of available routes. |
| TOKEN_ENDPOINT_URL | https://mlops.keycloak.domain/auth/realms/[fill-in-realm-name]/protocol/openid-connect/token | Defines the token endpoint URL of the Identity Provider. This uses Keycloak as the Identity Provider. Keycloak Realm should be provided. |
| REFRESH_TOKEN | <your-refresh-token> | Defines the user's refresh token. |
| CLIENT_ID | <your-client-id> | Sets the client id for authentication. This is the client you will be using to connect to MLOps. |
| PROJECT_NAME | MLflow+Custom-Python Upload And Deploy Example | Defines a project that the script will create for the MLflow model. |
| EXPERIMENT_NAME | custom-mlflow-model | Defines the experiment display name. |
| DEPLOYMENT_ENVIRONMENT | DEV | Defines the target deployment environment. |
| REFRESH_STATUS_INTERVAL | 1.0 | Defines a refresh interval for the deployment health check. |
| MAX_WAIT_TIME | 300 | Defines maximum waiting time for the deployment to become healthy. |
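
To show how TOKEN_ENDPOINT_URL, REFRESH_TOKEN, and CLIENT_ID fit together, the sketch below builds (but does not send) a standard OAuth 2.0 refresh-token grant request of the kind an OpenID Connect token endpoint accepts. This is an illustration only: the example script passes these values to the MLOps client's token provider, and the helper name build_refresh_request is hypothetical. The placeholder values are the ones from the table.

```python
import urllib.parse
import urllib.request

# Placeholder values copied from the constants table.
TOKEN_ENDPOINT_URL = (
    "https://mlops.keycloak.domain/auth/realms/"
    "[fill-in-realm-name]/protocol/openid-connect/token"
)
REFRESH_TOKEN = "<your-refresh-token>"
CLIENT_ID = "<your-client-id>"


def build_refresh_request(token_url, refresh_token, client_id):
    """Build (without sending) an OAuth 2.0 refresh-token grant request."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_refresh_request(TOKEN_ENDPOINT_URL, REFRESH_TOKEN, CLIENT_ID)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

With real values, sending this request would normally return a JSON document containing a short-lived access token. In the example script, the token provider takes care of that exchange.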

The following steps demonstrate how you can use the MLOps Python client to upload and deploy an MLflow custom Python model in MLOps.

  1. Download the MLflowCustomPythonModelExample.py file.

  2. Change the values of the following constants in your MLflowCustomPythonModelExample.py file as given in the preceding data table.

    MLflowCustomPythonModelExample.py
    ### Constants
    MLOPS_API_URL = <MLOPS_API_URL>
    TOKEN_ENDPOINT_URL = <TOKEN_ENDPOINT_URL>
    REFRESH_TOKEN = <REFRESH_TOKEN>
    CLIENT_ID = <CLIENT_ID>
    PROJECT_NAME = <PROJECT_NAME>
    EXPERIMENT_NAME = <EXPERIMENT_NAME>
    DEPLOYMENT_ENVIRONMENT = <DEPLOYMENT_ENVIRONMENT>
    REFRESH_STATUS_INTERVAL = <REFRESH_STATUS_INTERVAL>
    MAX_WAIT_TIME = <MAX_WAIT_TIME>

    MLflowCustomPythonModelExample.py (sample values)
    ### Constants
    MLOPS_API_URL = "https://api.mlops.my.domain"
    TOKEN_ENDPOINT_URL = "https://mlops.keycloak.domain/auth/realms/[fill-in-realm-name]/protocol/openid-connect/token"
    REFRESH_TOKEN = "<your-refresh-token>"
    CLIENT_ID = "<your-mlops-client>"
    PROJECT_NAME = "MLflow+Custom-Python Upload And Deploy Example"
    EXPERIMENT_NAME = "custom-mlflow-model"
    DEPLOYMENT_ENVIRONMENT = "DEV"
    REFRESH_STATUS_INTERVAL = 1.0
    MAX_WAIT_TIME = 300
  3. Run the MLflowCustomPythonModelExample.py file.

    python3 MLflowCustomPythonModelExample.py
    Deployment has become healthy  
  4. Finally, navigate to MLOps and click the project name MLflow+Custom-Python Upload And Deploy Example under Projects to view the deployed model.

    [Image: MLflow custom Python model]

    Note

    For more information about model deployments in MLOps, see Understand deployments in MLOps.
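
The REFRESH_STATUS_INTERVAL and MAX_WAIT_TIME constants describe a polling loop: the script checks the deployment status at a fixed interval and gives up after a maximum waiting time. A minimal sketch of such a loop follows. The names wait_until_healthy and get_state are hypothetical, the "HEALTHY" status string is illustrative, and both constants are assumed to be in seconds. The real script asks the MLOps client for the deployment status instead of using the fake status source shown here.

```python
import time

REFRESH_STATUS_INTERVAL = 1.0  # assumed: seconds between health checks
MAX_WAIT_TIME = 300            # assumed: seconds before giving up


def wait_until_healthy(get_state, interval=REFRESH_STATUS_INTERVAL,
                       max_wait=MAX_WAIT_TIME, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll get_state() until it returns "HEALTHY" or max_wait elapses.

    get_state is a hypothetical zero-argument callable standing in for
    the client call that fetches the deployment status. Returns True if
    the deployment became healthy, False on timeout.
    """
    deadline = clock() + max_wait
    while True:
        if get_state() == "HEALTHY":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake status source that reports healthy on the third check.
states = iter(["PENDING", "LAUNCHING", "HEALTHY"])
if wait_until_healthy(lambda: next(states), interval=0.01, max_wait=5):
    print("Deployment has become healthy")
else:
    print("Deployment did not become healthy in time")
```

Passing sleep and clock as parameters keeps the loop easy to test without real delays.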

Example walkthrough

This section provides a walkthrough of each of the sections in the MLflowCustomPythonModelExample.py file.

  1. Include the helper function, which waits for the deployment to become healthy.

  2. Convert the extracted metadata into storage compatible value objects.

  3. Include the custom model wrapper.

    As long as the class inherits from mlflow.pyfunc.PythonModel and implements the expected methods, it can define additional methods (for example, get_input_column).

    MLflowCustomPythonModelExample.py
    class RandomForestWithVectorizor(mlflow.pyfunc.PythonModel):

        def load_context(self, context):
            import pickle
            with open(context.artifacts["vectorizor"], "rb") as f:
                self.vectorizor = pickle.load(f)
            with open(context.artifacts["svd"], "rb") as f:
                self.svd = pickle.load(f)
            with open(context.artifacts["rf"], "rb") as f:
                self.rf = pickle.load(f)

        def predict(self, context, model_input):
            input_vec_tfidf = self.vectorizor.transform(
                self.get_input_column(model_input)
            )
            input_vec = self.svd.transform(input_vec_tfidf)
            return self.rf.predict(input_vec)

        def get_input_column(self, model_input):
            return model_input["Description"]
  4. Set up the token provider using an existing refresh token.

  5. Set up the MLOps client.

  6. Train/Fit the necessary components for the model.

    MLflowCustomPythonModelExample.py
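    # Imports this snippet relies on (assumed to sit at the top of the file)
    import pandas as pd
    from sklearn import decomposition, ensemble
    from sklearn.feature_extraction import text

    # Load the training data (data_url already ends with a slash)
    data_url = "https://h2o-public-test-data.s3.amazonaws.com/smalldata/amazon-food-review/"
    train_data = pd.read_csv(f"{data_url}AmazonFineFoodReviews-train-26k.csv")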

    # Fit data transformers: TfidfVectorizer and TruncatedSVD
    vectorizor = text.TfidfVectorizer(stop_words="english")
    train_tfidf_vector = vectorizor.fit_transform(train_data["Description"])

    svd = decomposition.TruncatedSVD(n_components=300)
    train_vector = svd.fit_transform(train_tfidf_vector)

    # Train RandomForestClassifier that consumes a vector
    rf = ensemble.RandomForestClassifier(n_estimators=50)
    rf.fit(train_vector, train_data["PositiveReview"])
  7. Create and set the model signature.

    The model signature is created manually because the model inputs and outputs are expected to be a single column each. A model signature is mandatory for any model that the server needs to load. Only ColSpec inputs and outputs are supported in the model signature.

    MLflowCustomPythonModelExample.py
    input_schema = mlflow.types.Schema([
        mlflow.types.ColSpec(name="Description", type=mlflow.types.DataType.string)
    ])

    output_schema = mlflow.types.Schema([
        mlflow.types.ColSpec(name="PositiveReview", type=mlflow.types.DataType.integer)
    ])

    model_signature = mlflow.models.signature.ModelSignature(
        inputs=input_schema,
        outputs=output_schema,
    )
  8. Create a project in MLOps and create an artifact in MLOps storage.

  9. Store, zip, and upload the model and necessary artifacts.

    MLflowCustomPythonModelExample.py
    model_tmp = tempfile.TemporaryDirectory()
    try:
        model_dir_path = os.path.join(model_tmp.name, "sentiment_model")
        vectorizor_path = os.path.join(model_tmp.name, "vectorizor.pkl")
        svd_path = os.path.join(model_tmp.name, "svd.pkl")
        rf_path = os.path.join(model_tmp.name, "rf.pkl")

        with open(vectorizor_path, "wb") as f:
            pickle.dump(vectorizor, f)

        with open(svd_path, "wb") as f:
            pickle.dump(svd, f)

        with open(rf_path, "wb") as f:
            pickle.dump(rf, f)

        # Create a dictionary to tell MLflow where the necessary artifacts are
        artifacts = {
            "vectorizor": vectorizor_path,
            "svd": svd_path,
            "rf": rf_path,
        }
        # Use above defined Custom Model Wrapper
        mlflow.pyfunc.save_model(
            path=model_dir_path,
            python_model=RandomForestWithVectorizor(),
            artifacts=artifacts,
            signature=model_signature
        )
        zip_path = shutil.make_archive(
            os.path.join(model_tmp.name, "artifact"), "zip", model_dir_path
        )
        with open(zip_path, mode="rb") as zipped:
            mlops_client.storage.artifact.upload_artifact(
                file=zipped, artifact_id=artifact.id
            )
    finally:
        model_tmp.cleanup()
  10. Analyze the MLflow zip file and create an experiment from it. Then link the artifact to the experiment.

  11. Fetch the available deployment environments and search for the ID of the selected deployment environment.

  12. Customize the composition of the deployment and specify the deployment as a single deployment.

  13. Finally, create the deployment and wait for the deployment to become healthy. This analyzes and sets the metadata and parameters of the model, and deploys it to the DEV environment.
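
Step 9 above pickles the fitted components, zips the saved model directory with shutil.make_archive, and cleans up the temporary directory afterwards. The sketch below reproduces only the standard-library part of that flow so the archive layout can be inspected locally. It leaves out mlflow.pyfunc.save_model and the MLOps upload, and the component dictionary is a hypothetical stand-in for a fitted object.

```python
import os
import pickle
import shutil
import tempfile
import zipfile

# Hypothetical stand-in for a fitted component such as the vectorizer.
component = {"vocabulary": ["good", "bad"]}

# The context manager removes the directory even if an error occurs,
# which mirrors the try/finally with cleanup() used in the example.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "sentiment_model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "component.pkl"), "wb") as f:
        pickle.dump(component, f)

    # make_archive appends ".zip" to the base name and archives the
    # contents of model_dir, so entries sit at the archive root.
    zip_path = shutil.make_archive(
        os.path.join(tmp, "artifact"), "zip", model_dir
    )

    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        restored = pickle.loads(archive.read("component.pkl"))

print(os.path.basename(zip_path), names, restored == component)
```

Because the archive is built from the contents of the model directory, the files land at the top level of the zip rather than under a sentiment_model folder.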

