Concepts
This page explains the main concepts of Feature Store.
Projects
A project is a repository that contains feature sets, which in turn consist of features. Projects can be used to separate work by department (e.g., engineering and accounting).
Project Access Modifiers
Access to projects can be further modified by access modifiers. For more information, please see Projects access modifiers.
Features
Features are columns of highly curated data. Because features are measurable data, they are used to improve the performance of ML models. Features are listed when you call the schema, and the printout shows each one in the order <column title> <feature>. For example:
category STRING, jobtitle STRING
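To make the printout format concrete, the following minimal sketch splits a schema string like the one above into pairs. It is not part of the Feature Store client; `parse_schema` is a hypothetical helper, and it assumes a flat schema in which entries are separated by commas and the second token (STRING in the example) is the data type.

```python
def parse_schema(schema: str) -> list[tuple[str, str]]:
    """Split a flat schema printout into (column name, data type) pairs."""
    pairs = []
    for entry in schema.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # The column name comes first; the rest of the entry is read as the type.
        name, _, dtype = entry.partition(" ")
        pairs.append((name, dtype.strip()))
    return pairs


columns = parse_schema("category STRING, jobtitle STRING")
print(columns)  # [('category', 'STRING'), ('jobtitle', 'STRING')]
```

Nested types that contain commas of their own would need a real parser; this sketch only covers the simple case shown in the example.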
Feature sets
A feature set is a collection of features. You create a feature set by registering it from a feature set schema; registering simply means creating a new feature set. The schema is one that you have extracted from a raw data source that you ingest into Feature Store.
The data sources for ingestion are available on the Supported data sources page.
Derived feature sets
Feature Store can create derived feature sets. A derived feature set is created from a parent feature set by applying transformations. When data is ingested into the parent feature set, or an ingestion is reverted, the same ingest or revert is automatically triggered for its derived feature sets.
The supported ways of transformation are:
Feature views
A feature view lets you retrieve features from different feature sets within a project. You select the relevant features by joining two or more feature sets and applying filters. The result is an ML dataset (also called a training dataset).
Creating the ML dataset materializes the feature view into your storage for a given start and end time.
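The idea of joining feature sets, filtering, and restricting to a time window can be sketched in plain Python. This is only an illustration of the concept, not the Feature Store API: the `build_dataset` function, the sample rows, the `ts` timestamp column, and the half-open interval (start included, end excluded) are all assumptions made for the example.

```python
from datetime import datetime

# Two hypothetical feature sets, each a list of rows sharing "customer_id".
profiles = [
    {"customer_id": 1, "category": "retail"},
    {"customer_id": 2, "category": "finance"},
]
events = [
    {"customer_id": 1, "amount": 40.0, "ts": datetime(2023, 1, 5)},
    {"customer_id": 2, "amount": 15.0, "ts": datetime(2023, 1, 9)},
    {"customer_id": 1, "amount": 99.0, "ts": datetime(2023, 3, 1)},
]


def build_dataset(left, right, key, start, end, row_filter=lambda row: True):
    """Inner-join two feature sets on `key`, keeping rows with
    start <= ts < end that also pass `row_filter`."""
    left_by_key = {row[key]: row for row in left}
    dataset = []
    for row in right:
        if not (start <= row["ts"] < end):
            continue  # outside the requested time window
        match = left_by_key.get(row[key])
        if match is None:
            continue  # inner join: skip rows without a partner
        joined = {**match, **row}
        if row_filter(joined):
            dataset.append(joined)
    return dataset


dataset = build_dataset(
    profiles,
    events,
    key="customer_id",
    start=datetime(2023, 1, 1),
    end=datetime(2023, 2, 1),
    row_filter=lambda row: row["amount"] > 20,
)
print(len(dataset))  # 1
```

Only the January event for customer 1 survives: the March event falls outside the window, and the event for customer 2 fails the amount filter.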
Keys
A feature in a feature set can be marked as a primary key. A primary key can be used to look up a specific item in your data, so its values must be unique (e.g., a social security number). When you create data from multiple feature sets, these keys are used for the join.
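Before marking a column as a primary key, it can help to confirm that its values really are unique. The check below is a minimal sketch in plain Python; `duplicate_keys`, the `employee_id` column, and the sample rows are hypothetical and are not part of Feature Store.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the values of `key` that occur more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [
    {"employee_id": "E1", "jobtitle": "engineer"},
    {"employee_id": "E2", "jobtitle": "accountant"},
    {"employee_id": "E1", "jobtitle": "manager"},
]
print(duplicate_keys(rows, "employee_id"))  # ['E1']
```

An empty result means the column is a candidate for a primary key in this sample of the data.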
Tags
Tags can be attached to feature sets for filtering purposes.
Types of Feature Store users
There are several types of user permissions in Feature Store. For more information, see Permissions.
Storage
Feature Store uploads output data to a data store. You can obtain the data by downloading it from the pre-signed URL.
Storage backend
Multiple storage backends are supported:
- Any system exposing the S3 API (AWS, Google Cloud, MinIO)
- Azure Data Lake Gen 2
Storage file format
Files are written in Delta format.
Output data
Output data results from the materialization of the features. The data can then be used inside any ML platform.
Incremental ingest
Incremental ingestion adds data in regular batches over time. Instead of ingesting all the data at once, it ingests new data at intervals (e.g., every five hours or every day). This can be done through scheduled ingestion.
Feature Store maintains one entry in storage for each major version of a feature set. New data are appended to storage during each new data ingest. Only unique values are appended.
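The append-only behaviour described above can be illustrated with a small sketch. It is not how Feature Store is implemented: `append_unique` is a hypothetical helper, and it interprets "unique" as whole-row equality, which is an assumption. Row values must be hashable.

```python
def append_unique(stored, batch):
    """Append rows from `batch` that are not already in `stored`.

    Returns the number of rows that were added.
    """
    # Fingerprint each row by its sorted (column, value) pairs.
    seen = {tuple(sorted(row.items())) for row in stored}
    added = 0
    for row in batch:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored.append(row)
            added += 1
    return added


stored = []
first = append_unique(stored, [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
second = append_unique(stored, [{"id": 2, "v": "b"}, {"id": 3, "v": "c"}])
print(first, second, len(stored))  # 2 1 3
```

The second batch repeats one row from the first, so only the new row is appended and storage ends up with three rows.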