Version: v2.4.3

Supported derived transformation

A transformation changes raw data so that a model can use it.

Spark pipeline

You can create a feature set from a Spark pipeline. The Spark pipeline generates the data from an existing feature set that you pass as input to the pipeline. Feature Store then uploads the Spark pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

User API:

Parameters:

  • pipeline_local_location: String or Pipeline object - the local path to the pipeline, or the pipeline object itself. Once the feature set is registered, this parameter contains the path to the uploaded Spark pipeline in the Feature Store artifacts cache.

import h2o_featurestore.core.transformations as t
spark_pipeline_transformation = t.SparkPipeline("...")

Driverless AI MOJO

You can create a feature set from a Driverless AI MOJO. The MOJO pipeline generates the data from an existing feature set that you pass as input to the pipeline. Feature Store then uploads the MOJO pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

note

Only features created from Driverless AI with the make_mojo_scoring_pipeline_for_features_only setting are supported in Feature Store.

User API:

Parameters:

  • mojo_local_location: String - the local path to the pipeline. Once the feature set is registered, this parameter contains the path to the uploaded MOJO pipeline in the Feature Store artifacts cache.
  • shapley_value_type: ShapleyValueType - the Shapley value computation to perform. Defaults to ShapleyValueType.NONE.

ShapleyValueType

The following values are available:

  • ShapleyValueType.NONE—Feature Store does not compute Shapley values. This is the default.
  • ShapleyValueType.ORIGINAL—Feature Store computes Shapley values for the original input features. Use this to understand which raw inputs drive the model's prediction.
  • ShapleyValueType.TRANSFORMED—Feature Store computes Shapley values for the transformed features. Use this to understand the model's internally engineered representation.
import h2o_featurestore.core.transformations as t
from h2o_featurestore.core.transformations import ShapleyValueType

transformation = t.DriverlessAIMOJO("...", shapley_value_type=ShapleyValueType.TRANSFORMED)

# To use original input feature contributions instead:
# transformation = t.DriverlessAIMOJO("...", shapley_value_type=ShapleyValueType.ORIGINAL)

Performance impact

When you enable Shapley values, the MOJO pipeline runs twice: once for base predictions and once for Shapley contributions. This can significantly increase ingestion time, roughly doubling it in many cases.

  • The MOJO must support Shapley contributions for the requested type; otherwise, Feature Store raises an error during ingestion.
  • Feature Store appends Shapley contribution columns after the base prediction columns in the output. The Driverless AI MOJO pipeline determines column names, which vary by model. Shapley columns are included when you retrieve the feature set.
  • The default value is ShapleyValueType.NONE, so existing MOJO transformations are unaffected.
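To make the term "Shapley contribution" concrete, the following is a minimal, self-contained sketch that computes exact Shapley values for a tiny hand-written model by averaging each feature's marginal effect over every feature ordering. It is an illustration only: `toy_model`, the feature names, and the all-zero baseline are hypothetical, and this brute-force approach is not how the Driverless AI MOJO runtime computes contributions.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, row, baseline):
    """Exact Shapley contribution of each feature in `row`, relative to `baseline`.

    Brute force over all feature orderings, so only practical for a handful
    of features. Illustrative only; not the MOJO runtime's algorithm.
    """
    names = list(row)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)           # start from the baseline row
        previous = predict(current)
        for name in order:
            current[name] = row[name]      # switch one feature to its real value
            value = predict(current)
            totals[name] += value - previous  # marginal effect in this ordering
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * features["age"] + features["income"] * features["has_loan"]


row = {"age": 3.0, "income": 10.0, "has_loan": 1.0}
baseline = {"age": 0.0, "income": 0.0, "has_loan": 0.0}

contributions = shapley_values(toy_model, row, baseline)
print(contributions)  # {'age': 6.0, 'income': 5.0, 'has_loan': 5.0}

# The contributions add up to the gap between the prediction and the baseline.
print(sum(contributions.values()), toy_model(row) - toy_model(baseline))  # 16.0 16.0
```

In this sketch the prediction (16.0) plays the role of a base prediction column, and each entry of `contributions` plays the role of one Shapley contribution column that would be appended after it.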

JoinFeatureSets

You can create a new feature set by joining two feature sets.

User API:

Parameters:

  • left_key: String - the join key, which must be present in the left feature set
  • right_key: String - the join key, which must be present in the right feature set
  • join_type: JoinFeatureSetsType - the join type. Defaults to JoinFeatureSetsType.INNER.

JoinFeatureSetsType

The following values are available:

  • JoinFeatureSetsType.INNER—The inner join is the default join in Spark SQL. It selects rows that have matching values in both relations.
  • JoinFeatureSetsType.LEFT—A left join returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match.
  • JoinFeatureSetsType.RIGHT—A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match.
  • JoinFeatureSetsType.FULL—A full join returns all values from both relations, appending NULL values on the side that does not have a match.
  • JoinFeatureSetsType.CROSS—A cross join returns the Cartesian product of two relations.
import h2o_featurestore.core.transformations as t

transformation = t.JoinFeatureSets(left_key=..., right_key=..., join_type=...)
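
The join types listed above differ only in which unmatched rows they keep. The sketch below mimics those semantics in plain Python on two small lists of rows, so the effect of each option is easy to compare. It is an illustration under assumptions: `join_rows`, the sample rows, and the column names are hypothetical, and the code does not use Feature Store or Spark.

```python
from itertools import product


def join_rows(left, right, left_key, right_key, how="inner"):
    """Join two lists of dict rows, mimicking SQL join semantics.

    `how` is one of "inner", "left", "right", "full", or "cross".
    Missing values are represented as None (SQL NULL).
    """
    if how == "cross":
        # Cartesian product: every left row paired with every right row.
        return [{**l, **r} for l, r in product(left, right)]
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unsupported join type: {how}")

    # All-None rows used to pad the side that has no match.
    null_left = {col: None for row in left for col in row}
    null_right = {col: None for row in right for col in row}

    result = []
    matched_right = set()
    for l in left:
        # As in SQL, a NULL key never matches anything.
        hits = [
            i for i, r in enumerate(right)
            if l[left_key] is not None and r[right_key] == l[left_key]
        ]
        for i in hits:
            result.append({**l, **right[i]})
        matched_right.update(hits)
        if not hits and how in {"left", "full"}:
            result.append({**l, **null_right})   # keep unmatched left row
    if how in {"right", "full"}:
        for i, r in enumerate(right):
            if i not in matched_right:
                result.append({**null_left, **r})  # keep unmatched right row
    return result


left_rows = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
right_rows = [{"cust": 2, "balance": 120.0}, {"cust": 3, "balance": 75.5}]

for how in ("inner", "left", "right", "full", "cross"):
    rows = join_rows(left_rows, right_rows, "customer_id", "cust", how)
    print(how, len(rows))
```

With this sample data only customer 2 appears on both sides, so the inner join yields one row, the left and right joins two rows each, the full join three rows, and the cross join four rows.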

note

By default, Feature Store performs an inner join during join transformations.

