Supported derived transformation
A transformation changes the raw data and makes it usable by a model.
Spark pipeline
You can create a feature set from a Spark pipeline. The Spark pipeline generates the data from an existing feature set that you pass in as an input to the pipeline. Feature Store then uploads the Spark pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.
User API:
- Python
Parameters:
- pipeline_local_location: String or Pipeline object. Pass either the local path to the pipeline or the pipeline object itself. Once the feature set is registered, this parameter contains the path to the uploaded Spark pipeline in the Feature Store artifacts cache.
import h2o_featurestore.core.transformations as t
spark_pipeline_transformation = t.SparkPipeline("...")
Driverless AI MOJO
You can create a feature set from a Driverless AI MOJO. The MOJO pipeline generates the data from an existing feature set that you pass in as an input to the pipeline. Feature Store then uploads the MOJO pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.
Only features created from Driverless AI with the make_mojo_scoring_pipeline_for_features_only setting are supported in Feature Store.
User API:
- Python
Parameters:
- mojo_local_location: String. Pass the local path to the pipeline. Once the feature set is registered, this parameter contains the path to the uploaded MOJO pipeline in the Feature Store artifacts cache.
- shapley_value_type: ShapleyValueType. The Shapley value computation to perform. Defaults to ShapleyValueType.NONE.
ShapleyValueType
The following values are available:
- ShapleyValueType.NONE: Feature Store does not compute Shapley values. This is the default.
- ShapleyValueType.ORIGINAL: Feature Store computes Shapley values for the original input features. Use this to understand which raw inputs drive the model's prediction.
- ShapleyValueType.TRANSFORMED: Feature Store computes Shapley values for the transformed features. Use this to understand the model's internally engineered representation.
import h2o_featurestore.core.transformations as t
from h2o_featurestore.core.transformations import ShapleyValueType
transformation = t.DriverlessAIMOJO("...", shapley_value_type=ShapleyValueType.TRANSFORMED)
# To use original input feature contributions instead:
# transformation = t.DriverlessAIMOJO("...", shapley_value_type=ShapleyValueType.ORIGINAL)
When you enable Shapley values, the MOJO pipeline runs twice—once for base predictions and once for Shapley contributions. This may significantly increase the time to generate predictions during ingestion (roughly doubling in many cases).
- The MOJO must support Shapley contributions for the requested type; otherwise, Feature Store raises an error during ingestion.
- Feature Store appends Shapley contribution columns after the base prediction columns in the output. The Driverless AI MOJO pipeline determines column names, which vary by model. Shapley columns are included when you retrieve the feature set.
- The default value is ShapleyValueType.NONE, so existing MOJO transformations are unaffected.
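The behavior described in the notes above (a second pass when Shapley values are enabled, and contribution columns appended after the base prediction columns) can be modelled with a small stand-alone sketch. It does not call Feature Store or a real MOJO: the ShapleyValueType enum below is a local stand-in, and score_rows, predict, contributions and the column names are hypothetical names chosen for illustration only.

```python
from enum import Enum


class ShapleyValueType(Enum):
    """Local stand-in for illustration; not the Feature Store class."""
    NONE = "none"
    ORIGINAL = "original"
    TRANSFORMED = "transformed"


def score_rows(rows, predict, contributions,
               shapley_value_type=ShapleyValueType.NONE):
    """Hypothetical helper: base prediction columns first, Shapley columns after."""
    # First pass: base predictions for every row.
    output = [dict(predict(row)) for row in rows]
    if shapley_value_type is ShapleyValueType.NONE:
        return output
    # Second pass: contributions, appended after the prediction columns.
    for out, row in zip(output, rows):
        out.update(contributions(row, shapley_value_type))
    return output


# Hypothetical scoring functions; a real MOJO decides its own column names.
def predict(row):
    return {"prediction": row["age"] * 2}


def contributions(row, shapley_value_type):
    return {"contrib_age": row["age"] * 2}


rows = [{"age": 30}]
base_only = score_rows(rows, predict, contributions)
with_shapley = score_rows(rows, predict, contributions,
                          ShapleyValueType.ORIGINAL)
print(list(base_only[0]))
print(list(with_shapley[0]))
```

In this sketch the second loop is what accounts for the extra scoring time: every row is visited twice whenever the type is not NONE.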
JoinFeatureSets
You can create a new feature set by joining two existing feature sets.
User API:
- Python
Parameters:
- left_key: String. The joining key, which must be present in the left feature set.
- right_key: String. The joining key, which must be present in the right feature set.
- join_type: JoinFeatureSetsType. The join type. Defaults to JoinFeatureSetsType.INNER.
JoinFeatureSetsType
The following values are available:
- JoinFeatureSetsType.INNER: The inner join is the default join in Spark SQL. It selects rows that have matching values in both relations.
- JoinFeatureSetsType.LEFT: A left join returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match.
- JoinFeatureSetsType.RIGHT: A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match.
- JoinFeatureSetsType.FULL: A full join returns all values from both relations, appending NULL values on the side that does not have a match.
- JoinFeatureSetsType.CROSS: A cross join returns the Cartesian product of two relations.
import h2o_featurestore.core.transformations as t
transformation = t.JoinFeatureSets(left_key=..., right_key=..., join_type=...)
By default, Feature Store performs inner joins during join transformations.
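To make the difference between the join types concrete, here is a minimal plain-Python sketch of the five join semantics on two tiny in-memory tables. It is not how Feature Store or Spark implements joins; join_rows, the sample rows and the column names are hypothetical, and None stands in for NULL. Unlike SQL, the sketch treats two None keys as equal, so it assumes key values are never missing.

```python
from itertools import product


def join_rows(left, right, left_key, right_key, how="inner"):
    """Join two lists of dict rows; the unmatched side is filled with None."""
    if how == "cross":
        # Cartesian product: every left row paired with every right row.
        return [{**l, **r} for l, r in product(left, right)]

    # Rows of all-None values, used to pad the side that has no match.
    null_left = {col: None for row in left for col in row}
    null_right = {col: None for row in right for col in row}

    joined = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[right_key] == l[left_key]]
        matched_right.update(hits)
        for i in hits:
            joined.append({**l, **right[i]})
        if not hits and how in ("left", "full"):
            joined.append({**l, **null_right})
    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                joined.append({**null_left, **r})
    return joined


# Hypothetical sample data: only customer 2 appears on both sides.
customers = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
orders = [{"cust": 2, "total": 10}, {"cust": 3, "total": 4}]

for how in ("inner", "left", "right", "full", "cross"):
    result = join_rows(customers, orders, "customer_id", "cust", how)
    print(how, len(result))
```

With this sample data the inner join keeps one row, the left and right joins keep two rows each, the full join keeps three, and the cross join produces four.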