Version: v2.1.0

Concepts

This page explains the main concepts of Feature Store.

Projects

A project is a repository that contains feature sets, which in turn consist of features. Projects can be used to separate work by department (e.g., engineering and accounting).

Project Access Modifiers

Access to projects can be further modified by access modifiers. For more information, please see Projects access modifiers.

Features

Features are columns of highly curated data. Because they are measurable properties of the data, features are used to improve the performance of ML models. You can view the features of a feature set by printing its schema; each entry is shown as <column title> <data type>. For example:

category STRING, jobtitle STRING
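
To make the printout format concrete, here is a minimal sketch that splits a schema string of this shape into name and type pairs. The `parse_schema` helper is hypothetical and is not part of the Feature Store client; it assumes a flat, comma-separated schema with no nested types.

```python
def parse_schema(schema: str) -> list[tuple[str, str]]:
    """Split a flat schema printout into (column title, data type) pairs."""
    pairs = []
    for entry in schema.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Each entry reads "<column title> <data type>"; split on the last space.
        name, _, dtype = entry.rpartition(" ")
        if not name:
            raise ValueError(f"malformed schema entry: {entry!r}")
        pairs.append((name.strip(), dtype))
    return pairs


features = parse_schema("category STRING, jobtitle STRING")
for name, dtype in features:
    print(f"{name}: {dtype}")
```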

Feature sets

A feature set is a collection of features. Registering a feature set means creating a new feature set from a feature set schema. The schema is extracted from a raw data source that you ingest into Feature Store.

The data sources for ingestion are available on the Supported data sources page.

Derived feature sets

Feature Store can create derived feature sets. A derived feature set is created by applying transformations to a parent feature set. When data is ingested into the parent feature set, or an ingestion is reverted, the corresponding ingest or revert is automatically triggered for its derived feature sets.
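
The propagation from parent to derived feature set can be pictured with a small in-memory model. This is only a sketch of the behaviour described above, not the Feature Store implementation: the `FeatureSetSketch` class, its method names, and the per-row transformation function are all assumptions made for illustration.

```python
class FeatureSetSketch:
    """Toy in-memory stand-in for a feature set (hypothetical, not the real client)."""

    def __init__(self, name, transform=None):
        self.name = name
        self.transform = transform  # per-row function; None for a parent set
        self.batches = []           # one list of rows per ingest
        self.derived = []           # feature sets derived from this one

    def derive(self, name, transform):
        child = FeatureSetSketch(name, transform)
        self.derived.append(child)
        return child

    def ingest(self, rows):
        batch = list(rows)
        self.batches.append(batch)
        # An ingest into the parent triggers an ingest into each derived set.
        for child in self.derived:
            child.ingest([child.transform(row) for row in batch])

    def revert_last_ingest(self):
        self.batches.pop()
        # A revert on the parent triggers the same revert on each derived set.
        for child in self.derived:
            child.revert_last_ingest()

    def rows(self):
        return [row for batch in self.batches for row in batch]


parent = FeatureSetSketch("jobs")
upper = parent.derive(
    "jobs_upper", lambda row: {**row, "jobtitle": row["jobtitle"].upper()}
)
parent.ingest([{"category": "eng", "jobtitle": "developer"}])
parent.ingest([{"category": "acct", "jobtitle": "auditor"}])
parent.revert_last_ingest()  # also removes the auditor row from jobs_upper
```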

The supported ways of transformation are:

Keys

A feature in a feature set can be marked as a primary key. The primary key can be used to look up a specific item in your data, so its values must be unique (e.g., a social security number). When you combine data from multiple feature sets, these keys are used to join them.
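
The role of primary keys in a join can be sketched in plain Python. The helpers `index_by_key` and `join_on_key` are hypothetical, the sample rows are invented for illustration, and an inner join is assumed; the join semantics Feature Store actually applies are not specified on this page.

```python
def index_by_key(rows, key):
    """Map each primary key value to its row, rejecting duplicate keys."""
    index = {}
    for row in rows:
        value = row[key]
        if value in index:
            raise ValueError(f"duplicate primary key value: {value!r}")
        index[value] = row
    return index


def join_on_key(left_rows, right_rows, key):
    """Inner-join two collections of rows on a shared primary key."""
    right_index = index_by_key(right_rows, key)
    joined = []
    for value, left in index_by_key(left_rows, key).items():
        if value in right_index:
            joined.append({**left, **right_index[value]})
    return joined


employees = [{"id": 1, "jobtitle": "developer"}, {"id": 2, "jobtitle": "auditor"}]
departments = [{"id": 1, "category": "engineering"}, {"id": 3, "category": "sales"}]
joined = join_on_key(employees, departments, "id")
```

Only the row with `id` 1 appears in both inputs, so it is the only row in `joined`. The uniqueness check is what makes the lookup unambiguous: with a repeated key value there would be more than one candidate row to join against.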

Types of Feature Store users

Feature Store has several types of user permissions. For more information, please see Permissions.

Storage

Feature Store writes output data to a data store. You can download the data using a pre-signed URL.

Storage backend

Multiple storage backends are supported:

Any system exposing the S3 API (e.g., AWS, Google Cloud, MinIO)
Azure Data Lake Gen 2

Storage file format

Files are written in delta format.

Output data

Output data results from the materialization of the features. The data can then be used inside any ML platform.

Incremental ingest

Incremental ingestion ingests data in batches over time. Instead of ingesting all the data at once, it ingests only new data at intervals (e.g., every five hours or every day). This can be done through scheduled ingestion.

Feature Store maintains one entry in storage for each major version of a feature set. New data are appended to storage during each new data ingest. Only unique values are appended.
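
The append-only-unique behaviour can be illustrated with a short sketch. It assumes that "unique" means a whole row has not been stored before, which is an interpretation rather than something this page states; `append_unique` is a hypothetical helper and the rows are invented.

```python
def append_unique(stored, new_rows):
    """Append rows not already present in `stored`; return how many were added.

    Rows are dicts with hashable values. A row counts as a duplicate when
    every column matches an already stored row (an assumption).
    """
    seen = {tuple(sorted(row.items())) for row in stored}
    added = 0
    for row in new_rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored.append(row)
            added += 1
    return added


storage = []
first = append_unique(storage, [{"id": 1, "category": "a"}, {"id": 2, "category": "b"}])
# The second ingest repeats the row with id 2, so only the row with id 3 is appended.
second = append_unique(storage, [{"id": 2, "category": "b"}, {"id": 3, "category": "c"}])
```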

