Ingest API
The Feature Store ensures that the data for each feature set contains no duplicates. Only rows that are unique within the feature set cache are ingested as part of the ingest operation; rows that would lead to duplicates are skipped.
Ingest can be run on an instance of a feature set representing any minor version. The data are always ingested on top of the latest storage stage.
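The deduplication semantics can be sketched in plain Python. This is an illustration only, not the Feature Store implementation: rows already present in the cache, and duplicates within the incoming batch itself, are skipped.

```python
def rows_to_ingest(existing_rows, incoming_rows):
    """Illustrative sketch: return only rows not already present."""
    seen = {tuple(sorted(r.items())) for r in existing_rows}
    unique = []
    for row in incoming_rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)  # also skips duplicates within the incoming batch
            unique.append(row)
    return unique

cache = [{"country": "US", "gdp": 21.4}]
batch = [
    {"country": "US", "gdp": 21.4},  # already in the cache, skipped
    {"country": "DE", "gdp": 3.8},   # unique, ingested
]
print(rows_to_ingest(cache, batch))  # [{'country': 'DE', 'gdp': 3.8}]
```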
Offline ingestion
To ingest data into the Feature Store, run:
Blocking approach:
- Python
- Scala
fs = project.feature_sets.get("gdp")
fs.ingest(source)
fs.ingest(source, credentials=credentials)
val fs = project.featureSets.get("gdp")
fs.ingest(source)
fs.ingest(source, credentials=credentials)
This method is not allowed for derived feature sets.
Non-Blocking approach:
- Python
- Scala
fs = project.feature_sets.get("gdp")
future = fs.ingest_async(source)
future = fs.ingest_async(source, credentials=credentials)
val fs = project.feature_sets.get("gdp")
val future = fs.ingestAsync(source, startDateTime="", endDateTime="")
val future = fs.ingestAsync(source, startDateTime="", endDateTime="", credentials=credentials)
This method is not allowed for derived feature sets.
More information about asynchronous methods is available at Asynchronous methods.
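The non-blocking call returns a future that you can wait on later. The general pattern is the same as Python's standard futures; the sketch below uses `concurrent.futures` with a stand-in `ingest` function, not the Feature Store client itself:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def ingest(source):
    # Stand-in for a long-running ingest job.
    time.sleep(0.1)
    return {"source": source, "status": "done"}

with ThreadPoolExecutor() as pool:
    future = pool.submit(ingest, "gdp.csv")  # returns immediately
    # ... do other work while ingestion runs ...
    result = future.result()  # block only when the result is needed
    print(result["status"])  # done
```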
Parameters explanation:
- Python
- Scala
source
is the data source that the Feature Store will ingest from.
credentials
are the credentials for the data source. If not provided, the client tries to read them from environment variables. For more information about passing credentials as a parameter or via environment variables, see Credentials configuration.
source
is the data source that the Feature Store will ingest from.
credentials
are the credentials for the data source. If not provided, the client tries to read them from environment variables. For more information about passing credentials as a parameter or via environment variables, see Credentials configuration.
To ingest data into the Feature Store from sources that change periodically, run:
- Python
- Scala
fs = project.feature_sets.get("gdp")
new_schema = client.extract_schema_from_source(ingest_source)
if not fs.schema.is_compatible_with(new_schema, compare_data_types=True):
    patched_schema = fs.schema.patch_from(new_schema, compare_data_types=True)
    new_feature_set = fs.create_new_version(schema=patched_schema, reason="schema changed before ingest")
    new_feature_set.ingest(ingest_source)
else:
    fs.ingest(ingest_source)
val fs = project.featureSets.get("gdp")
val newSchema = client.extractSchemaFromSource(ingestSource)
if (!fs.schema().isCompatibleWith(newSchema, compareDataTypes=true)) {
  val patchedSchema = fs.schema().patchFrom(newSchema, compareDataTypes=true)
  val newFeatureSet = fs.createNewVersion(schema=patchedSchema, reason="schema changed before ingest")
  newFeatureSet.ingest(ingestSource)
} else {
  fs.ingest(ingestSource)
}
This call materializes the data and stores it in the Feature Store storage.
Feature Store does not allow two features whose names differ only in case. During ingestion, however, feature names are matched case-insensitively. For example, when ingesting into a feature set with a single feature named city, the data are ingested correctly regardless of the case of the column name in the provided data source: a column named CITY, CiTy, or city is matched and ingested into the city feature.
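The case-insensitive matching described above can be sketched as follows. This is an illustrative helper, not part of the Feature Store client:

```python
def match_columns(feature_names, source_columns):
    """Illustrative: map source columns to feature names, ignoring case."""
    features = {name.lower(): name for name in feature_names}
    mapping = {}
    for col in source_columns:
        feature = features.get(col.lower())
        if feature is not None:
            mapping[col] = feature
    return mapping

print(match_columns(["city"], ["CITY"]))  # {'CITY': 'city'}
print(match_columns(["city"], ["CiTy"]))  # {'CiTy': 'city'}
```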
Online ingestion
To ingest data into the online Feature Store, run:
- Python
- Scala
feature_set.ingest_online(row/s)
featureSet.ingestOnline(row/s)
The row/s
argument is either a single JSON row or an array of JSON strings to ingest into the online Feature Store.
Feature set must have a primary key defined in order to ingest and retrieve from the online Feature Store.
This method is not allowed for derived feature sets.
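Constructing the row/s argument can look like the following sketch. The feature names and values here are made up for illustration, and the commented-out calls assume a live client:

```python
import json

# A single JSON row ...
single_row = json.dumps({"city": "Prague", "population": 1300000})

# ... or an array of JSON strings for batch ingestion.
rows = [
    json.dumps({"city": "Prague", "population": 1300000}),
    json.dumps({"city": "Brno", "population": 380000}),
]

# feature_set.ingest_online(single_row)  # requires a live Feature Store client
# feature_set.ingest_online(rows)
print(rows[0])
```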
Lazy ingestion
Lazy ingestion can be used to migrate feature sets from other systems without ingesting them immediately. With lazy ingestion, the ingestion process starts the first time data from the feature set is retrieved.
A corresponding scheduled task is created when lazy ingest is defined. For more information, see Obtaining a lazy ingest task. Only one lazy ingest task can be defined per feature set major version.
When you ingest a feature set directly using feature_set.ingest(source)
, the lazy ingest task associated with the feature set is deleted from the Feature Store.
To ingest data lazily, run:
- Python
- Scala
fs.ingest_lazy(source)
fs.ingest_lazy(source, credentials=credentials)
fs.ingestLazy(source)
fs.ingestLazy(source, credentials=credentials)
This method runs the ingest the first time the feature set is retrieved. See the Feature set schedule API for information on how to delete or edit the scheduled task.
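The ingest-on-first-retrieval behavior is essentially lazy initialization. A minimal sketch of the pattern, using an illustrative class rather than the Feature Store client:

```python
class LazyFeatureSet:
    """Illustrative sketch: ingestion runs on the first retrieval only."""

    def __init__(self, source):
        self.source = source
        self.ingested = False
        self._data = None

    def retrieve(self):
        if not self.ingested:
            self._data = list(self.source)  # stand-in for the real ingest
            self.ingested = True
        return self._data

fs_lazy = LazyFeatureSet([{"country": "US", "gdp": 21.4}])
print(fs_lazy.ingested)      # False: nothing ingested yet
data = fs_lazy.retrieve()    # first retrieval triggers the ingest
print(fs_lazy.ingested)      # True
```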