Feature view API

Creating a feature view

To create a feature view, you first build a query by selecting features from feature sets, joining feature sets together, and applying filters. A feature view query can also apply transformations to the selected features. The following transformations are supported:

min_max_scaler
standard_scaler
robust_scaler
string_indexer
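
The source does not define these transformations. As a rough illustration only, the sketch below assumes that each name follows its conventional definition in common ML libraries: min-max scaling to the range [0, 1], z-score standardization, scaling by median and interquartile range, and frequency-ordered string indexing. The helper functions are hypothetical plain Python and are not the Feature Store implementation, whose details may differ.

```python
from collections import Counter
from statistics import mean, median, pstdev, quantiles


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    return [0.0 if iqr == 0 else (v - med) / iqr for v in values]


def string_index(values):
    """Map each distinct string to an integer index, most frequent label first."""
    counts = Counter(values)
    # Ties in frequency are broken alphabetically (an assumption of this sketch).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    index = {label: i for i, label in enumerate(ordered)}
    return [index[v] for v in values]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_min_max = min_max_scale(x)
scaled_standard = standard_scale(x)
scaled_robust = robust_scale(x)
indexed = string_index(["b", "a", "b", "c"])
```

Unlike min-max and standard scaling, the median and IQR used by robust scaling are barely affected by a few extreme values, which is the usual reason to prefer it for features with outliers.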

Note

When joining feature sets, Feature Store performs point-in-time inner or left joins.
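
To illustrate the idea of a point-in-time join, the sketch below assumes the common definition: each left-hand row is matched with the most recent right-hand row for the same key whose timestamp is not later than the left row's timestamp. An inner join drops left rows without a match, while a left join keeps them with a missing value. The function name and the record layout are hypothetical, and the exact matching rules used by Feature Store may differ.

```python
def point_in_time_join(left, right, how="inner"):
    """Join (key, timestamp, value) rows on key, taking the latest
    right-hand row at or before each left-hand timestamp."""
    joined = []
    for key, ts, left_value in left:
        # Candidate right rows: same key, observed no later than the left row.
        candidates = [(r_ts, r_val) for r_key, r_ts, r_val in right
                      if r_key == key and r_ts <= ts]
        if candidates:
            _, right_value = max(candidates, key=lambda c: c[0])
            joined.append((key, ts, left_value, right_value))
        elif how == "left":
            # Left join: keep the unmatched row with a missing right value.
            joined.append((key, ts, left_value, None))
    return joined


# Hypothetical sample data: (UserId, timestamp, value).
labels = [("u1", 10, 1), ("u1", 20, 0), ("u2", 5, 1)]
features = [("u1", 8, 0.3), ("u1", 15, 0.7), ("u2", 9, 0.5)]

inner_rows = point_in_time_join(labels, features, how="inner")
left_rows = point_in_time_join(labels, features, how="left")
```

In this sample, the row for "u2" at time 5 has no earlier feature value, so it appears only in the left join result.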

To create a query with an inner join, execute:

Python

from featurestore.core.entities.query import Query

min_max = client.transformation_functions.get("min_max_scaler")

# Select two features from feature_set1 and a min-max scaled feature from feature_set2
query = Query.select([feature_set1.features["UserId"], feature_set1.features["Label"], min_max.apply(feature_set2.features["X"])]) \
        .from_feature_set(feature_set1, "alias1") \
        .join(feature_set2, "alias2").on(feature_set1.features["UserId"], feature_set2.features["UserId"]) \
        .end()

Scala

import ai.h2o.featurestore.core.entities.Query

val minMax = client.transformationFunctions.get("min_max_scaler")

// Select two features from featureSet1 and a min-max scaled feature from featureSet2
val query = Query.select(Seq(featureSet1.features("UserId"), featureSet1.features("Label"), minMax(featureSet2.features("X"))))
         .from(featureSet1, "alias1")
         .join(featureSet2, "alias2").on(featureSet1.features("UserId"), featureSet2.features("UserId"))
         .end()

To create a query with a left join, execute:

Python

from featurestore.core.entities.query import Query

min_max = client.transformation_functions.get("min_max_scaler")

# Same selection as the inner join, but rows of feature_set1 without a match are kept
query = Query.select([feature_set1.features["UserId"], feature_set1.features["Label"], min_max.apply(feature_set2.features["X"])]) \
        .from_feature_set(feature_set1, "alias1") \
        .left_join(feature_set2, "alias2").on(feature_set1.features["UserId"], feature_set2.features["UserId"]) \
        .end()

Scala

import ai.h2o.featurestore.core.entities.Query

val minMax = client.transformationFunctions.get("min_max_scaler")

// Same selection as the inner join, but rows of featureSet1 without a match are kept
val query = Query.select(Seq(featureSet1.features("UserId"), featureSet1.features("Label"), minMax(featureSet2.features("X"))))
         .from(featureSet1, "alias1")
         .leftJoin(featureSet2, "alias2").on(featureSet1.features("UserId"), featureSet2.features("UserId"))
         .end()

To create the feature view, execute:

Python

feature_view = project.feature_views.create(name="test", description="", query=query)

Scala

val featureView = project.featureViews.create(name = "test", description = "", query = query)

Listing feature views within a project

Python

project.feature_views.list()

Scala

project.featureViews.list()

Obtaining a feature view

Python

feature_view = project.feature_views.get("feature_view_name", version=None)

Scala

// Latest version
val featureView = project.featureViews.get("feature_view_name")

// Specific version
val featureViewV1 = project.featureViews.get("feature_view_name", 1)

If the version is not specified, the latest version of the feature view is returned.

Deleting feature views

Python

fv = project.feature_views.get("name")
fv.delete()

Scala

val fv = project.featureViews.get("name")
fv.delete()

Updating feature view fields

To update a field, assign a new value through that field's setter:

Python

fv = project.feature_views.get("name")
fv.description = "description"

Scala

val fv = project.featureViews.get("name")
fv.description = "description"

Creating a new feature view version

The query for a feature view cannot be updated directly. To change the query, you need to create a new version of the feature view with the updated query.

To create a new version, call the create_new_version method on the feature view object and pass the updated query as its parameter. The new version of the feature view then retrieves its data according to the updated query.

Python

from featurestore.core.entities.query import Query

fv = project.feature_views.get("name")
# Define the updated query for the new version
query = Query.select([fs_1.features["abc"], fs_1.features["xyz"]]) \
        .from_feature_set(fs_1, "alias1") \
        .join(fs_2, "alias2").on(fs_1.features["pqr"], fs_2.features["mno"]) \
        .end()
fv.create_new_version(query)

Scala

val fv = project.featureViews.get("name")
// Define the updated query for the new version
val query = Query.select(Seq(fs_1.features("abc"), fs_1.features("xyz")))
         .from(fs_1, "alias1")
         .join(fs_2, "alias2").on(fs_1.features("pqr"), fs_2.features("mno"))
         .end()
fv.createNewVersion(query)
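
The versioning behavior described here, together with the rule that omitting the version returns the latest one, can be pictured with a small in-memory model. This is a sketch only: the VersionedView class is hypothetical, queries are represented by placeholder strings, and integer versions starting at 1 are an assumption rather than documented Feature Store behavior.

```python
class VersionedView:
    """Hypothetical model of a feature view whose query is fixed per version."""

    def __init__(self, name, query):
        self.name = name
        # Version number -> query; numbering from 1 is an assumption.
        self._queries = {1: query}

    def create_new_version(self, query):
        """Store the updated query under the next version number and return it."""
        version = max(self._queries) + 1
        self._queries[version] = query
        return version

    def get_query(self, version=None):
        """Return the query of the given version, or of the latest one if None."""
        if version is None:
            version = max(self._queries)
        return self._queries[version]


# Placeholder query strings stand in for real Query objects.
view = VersionedView("test", "SELECT abc FROM fs_1")
new_version = view.create_new_version("SELECT abc, xyz FROM fs_1")

latest_query = view.get_query()
first_query = view.get_query(version=1)
```

Because existing versions are never modified in this model, anything that depends on an earlier version keeps seeing the query it was built with.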

Obtaining data as a Spark Frame

You can read the data directly as a Spark Frame:

Python

data_frame = my_feature_view.as_spark_frame(spark_session, start_at=None, end_at=None)

Scala

val dataFrame = myFeatureView.asSparkFrame(sparkSession, startAt=None, endAt=None)

Downloading the files from Feature Store

To download the data to your local machine, execute:

Python

download_dir = my_feature_view.download(start_at=None, end_at=None)

Scala

val downloadDir = myFeatureView.download(startAt=None, endAt=None)

Creating a machine learning dataset

Creating a machine learning (ML) dataset materializes a feature view into the Feature Store. To create an ML dataset, call the create method on the ml_datasets object of the feature view. You must provide a name for the ML dataset and, optionally, the time period whose data you want to include in it.

Python

ml_dataset = my_feature_view.ml_datasets.create("name", start_date_time=None, end_date_time=None)

Scala

val mlDataset = myFeatureView.mlDatasets.create("name", startDateTime=None, endDateTime=None)

Parameters Explanation:

Python

If start_date_time and end_date_time are not specified, all ingested data are fetched. Otherwise, these parameters restrict retrieval to a specific range of the ingested data. For example, if the ingested data cover the time range from T1 to T2 (T1 <= T2), start_date_time can be any value T3 and end_date_time can be any value T4 such that T1 <= T3 <= T4 <= T2.

Scala

If startDateTime and endDateTime are not specified, all ingested data are fetched. Otherwise, these parameters restrict retrieval to a specific range of the ingested data. For example, if the ingested data cover the time range from T1 to T2 (T1 <= T2), startDateTime can be any value T3 and endDateTime can be any value T4 such that T1 <= T3 <= T4 <= T2.
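
The range semantics above can be sketched in plain Python. The select_range helper and the sample records are hypothetical, and treating both bounds as inclusive is an assumption, since the text does not state whether the end bound is inclusive.

```python
from datetime import datetime


def select_range(records, start_date_time=None, end_date_time=None):
    """Return (timestamp, value) records within the optional bounds.

    Both bounds are treated as inclusive here, which is an assumption.
    A bound left as None does not restrict the result.
    """
    if start_date_time is not None and end_date_time is not None:
        if start_date_time > end_date_time:
            raise ValueError("start_date_time must not be later than end_date_time")
    selected = []
    for timestamp, value in records:
        if start_date_time is not None and timestamp < start_date_time:
            continue
        if end_date_time is not None and timestamp > end_date_time:
            continue
        selected.append((timestamp, value))
    return selected


# Ingested data covering T1 = 2023-01-01 to T2 = 2023-01-04.
ingested = [(datetime(2023, 1, day), day * 10) for day in range(1, 5)]

# No bounds: all ingested data are returned.
everything = select_range(ingested)

# T3 = 2023-01-02 and T4 = 2023-01-03, so T1 <= T3 <= T4 <= T2.
window = select_range(ingested, datetime(2023, 1, 2), datetime(2023, 1, 3))
```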

Obtaining data as a Spark Frame from the ML dataset

Python

ml_dataset = my_feature_view.ml_datasets.get("name")
data_frame = ml_dataset.as_spark_frame(spark_session)

Scala

val mlDataset = myFeatureView.mlDatasets.get("name")
val dataFrame = mlDataset.asSparkFrame(sparkSession)

Downloading the files from Feature Store from the ML dataset

To download the data to your local machine, execute:

Python

ml_dataset = my_feature_view.ml_datasets.get("name")
download_dir = ml_dataset.download()

Scala

val mlDataset = myFeatureView.mlDatasets.get("name")
val downloadDir = mlDataset.download()

Retrieving data from online feature store

Once the ML dataset has been created and its job has finished, you can retrieve the latest feature values from the online store. To retrieve these values, you must provide all primary keys of the feature sets. All transformations defined in the query are applied during retrieval by a pipeline that is created together with the ML dataset.

Python

ml_dataset = my_feature_view.ml_datasets.get("name")
ml_dataset.retrieve_online(1)

Scala

val mlDataset = myFeatureView.mlDatasets.get("name")
mlDataset.retrieveOnline(1)
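
Conceptually, online retrieval is a key lookup followed by the transformations defined in the query. The sketch below models this with a dictionary as a stand-in for the online store and a fixed min-max scaling step as a stand-in for the pipeline. Every name in it is hypothetical, and it does not call the Feature Store client.

```python
# Stand-in online store: primary key (UserId) -> latest raw feature values.
online_store = {
    1: {"UserId": 1, "X": 40.0},
    2: {"UserId": 2, "X": 100.0},
}

# Stand-in for the pipeline created with the ML dataset:
# min-max scaling of "X" with bounds assumed to be fixed at that time.
X_MIN, X_MAX = 0.0, 200.0


def retrieve_online(primary_key):
    """Look up the latest row for the key and apply the stored transformation."""
    raw = online_store.get(primary_key)
    if raw is None:
        raise KeyError(f"no online record for primary key {primary_key!r}")
    transformed = dict(raw)  # copy so the stored raw values stay untouched
    transformed["X"] = (raw["X"] - X_MIN) / (X_MAX - X_MIN)
    return transformed


row = retrieve_online(1)
```

Reusing the parameters fixed at dataset creation, rather than recomputing them per request, keeps online values on the same scale as the values in the materialized ML dataset.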

Feature view and ML dataset permissions

Feature views and ML datasets inherit the permission model of the project and feature sets in which they are created.

In other words, any permissions that apply to a project and its feature sets also apply to the feature views and ML datasets created within them. For more information, see Permissions.
