Release notes
Version 1.2.0 (25-01-2024)
Fixes
- Preview now correctly shows entries per selected feature set version
- Entry on the My Access tab of a feature set did not show the option to request permission when the user has no permissions
- Fix beamer location bug in UI
- Fix error swallowing on the Create project and Create feature set pages
- Expose missing preview configuration in Helm values
- Fix issue leading to a wrong schema when the feature type was modified in the schema via the Python CLI
- Redis connection in online store is now correctly refreshed
- Fix logging configuration for spark driver and executor pods
- When maximum session length is reached, logout and redirect user to login interface
- Cleanup orphaned records on Redis database
- Switch installer test image to be Alpine based to reduce vulnerabilities
- Fix several online store memory leakage issues
- Fix issue with job error not being propagated in case Linkerd was enabled
- Users can no longer select a day before today when creating personal access tokens
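The expiry-date fix above amounts to a simple lower-bound check. A minimal sketch of such validation (a hypothetical helper, not the actual UI code):

```python
from datetime import date

def validate_token_expiry(expiry, today=None):
    """Reject personal access token expiry dates earlier than today.

    `today` can be injected for testing; defaults to the current date.
    """
    today = today or date.today()
    return expiry >= today
```

In the UI this check would back the date picker, disabling any day before today.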
New features
- New CLI method to print the schema in SQL format
- GPTe integration. When a project or feature set is created or updated, the information is sent as a collection/document to GPTe
- Expose button in UI to manually trigger online to offline sync
- Introduce H2O Drive as a data source
- Ability to use H2O Drive and GPTe integration even when the user is authenticated using a personal access token
- Ability to use GCP as offline and supporting storage
- Ability to remove major feature set version
- Automatically roll Helm deployments when config maps or secrets change
- Show pending permissions on Dashboard in UI
- Publish Conda packages for Python CLI
Version 1.1.2 (30-11-2023)
Fixes
- Fix regression which prevented using private certificate authorities
- Fix CVE-2022-1471 by upgrading to Spring 3.2 and Spark 3.5
Version 1.1.1 (15-11-2023)
Fixes
- Fix several issues when Snowflake is used as storage backend
- Improve parallelism of message processing in the online store
- Fix installer tests
Version 1.1.0 (09-11-2023)
Fixes
- Skip leading and trailing whitespaces in column names when parsing CSV
- Fix error when Azure Gen2 was used as supporting storage and S3 as offline storage
- Fix storage backend naming on helm level
- Fix wrong paths in several endpoints used in the UI
- Fix issue where a feature set could be created twice while requesting higher permissions
- Update lazy ingest documentation and explain the motivation better
- Improve classifier and recommendation api documentation
- User is now able to ingest in the UI without selecting the cloud provider
- Fix issue leading to no owner being displayed in the UI on feature set tab
- Fix bug leading to Unexpected end of JSON input when inspecting a feature set preview in the UI
- Fix all fixable CVEs to the date of the release
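The CSV whitespace fix in the list above is conceptually a header-cleaning pass. A self-contained sketch of the idea (illustrative only — the real parsing happens in Spark):

```python
import csv
import io

def parse_csv_with_clean_headers(text):
    """Parse CSV text, stripping leading/trailing whitespace from column names."""
    reader = csv.reader(io.StringIO(text))
    headers = [h.strip() for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]
```

Without the strip, a header like `" id "` would silently become a distinct column name from `"id"`.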
New features
- Ability to use Snowflake as the offline feature set storage
- Introduce access tab on the UI where owner can see all users with access to the feature set or project
- Add support for requesting higher permission in UI if user already has lower permissions
- Show list of derived and parent feature sets on feature set page in the UI
- Ability to use AWS SSO credentials for S3 data sources
- Ability to revert ingest in the UI
- Mark feature set as derived if it is derived in the UI
- Introduce method and API to perform Z-ordering on a feature set
- Feature Store deployment no longer requires any cluster roles
- Add support for Azure MSI authentication for offline store
- Feature Store now respects the limitations specified by the tier
- Add ability to withdraw pending permission request
- If a user requested more than one permission and the higher one is approved, the lower ones are also automatically approved
- Ability to ingest and retrieve from online feature store in UI
- Ability to use key-pair authentication for Snowflake offline store and Snowflake data source
- Expose method in CLI on feature set level to get specific version of that feature set
- Add ability to register derived feature sets in the UI
- Run database migration script as part of init script instead of directly in the pods. This way database migrations are not affected by k8s startup probes and can finish successfully even if they take longer time.
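Z-ordering, introduced in the feature list above, clusters related rows by interleaving the bits of the ordering columns (a Morton code). A minimal pure-Python illustration of the idea — not the actual Delta/Spark implementation:

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of x and y so nearby (x, y) pairs sort close together."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # bit i of x goes to even position
        code |= ((y >> i) & 1) << (2 * i + 1)   # bit i of y goes to odd position
    return code
```

Sorting rows by their Morton code co-locates rows with similar values in either column, which is what makes data skipping effective after the reorder.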
Version 1.0.0 (27-09-2023)
New features
CLI
- Ability to get the latest minor version for a specific major version of a feature set
- Ability to see all feature set pending and manageable reviews in scope of a specific project
- Expose method to open website with specific feature set or project from client
- Ability to change time travel column during creating new major version
- Ability to use key-pair authentication for Snowflake data source
- Ability to extract schema and ingest data source from http/https locations
- Secured connection is now used by default
UI
- Feature Store version is shown in the UI
- Show number of pending or manageable reviews next to the title in the left bar
- Ability to list and see versions of feature sets
- Ability to create new major version of feature set
- Ability to schedule data ingests
- Ability to download list of features in CSV format
- Share link to Feature Store documentation in UI left bar
- Ability to extract schema and ingest data source from http/https locations
- User is redirected to the feature set when clicking the View button on a feature set in review in states Created and Approved
Backend and others
- Ability to restrict access to Feature Store based on the presence of specific JWT roles
- Improve performance of online Feature Store
- Ability to read from public S3 buckets even when credentials (valid or invalid) are provided
- Introduce API to display parent and child derived feature sets
- Introduce ability to test basic functionality of Feature Store directly using helm test
- Alpha support for using Snowflake as storage for offline Feature Sets
- Simplify changing logging configuration for whole Feature Store stack
- Spark jobs no longer require a cluster role
- Expand API for getting user permissions to return permissions only for specific resource
- Document better time to live and meaning of fields marking feature sets as sensitive
Fixes
- Resolve all fixable vulnerabilities to the date of the release
- Fix bug in Spark Operator caused by parallel updates of spark resources
- During schema creation in the UI, data type and feature type enums were not populated in certain browsers
- Fix UI bug that user was able to request same permission more than once
- Ability to use both s3 and s3a protocols when working with data sources in S3
- About item was not visible in the UI
- Other users were able to see unapproved feature sets before ingestion; this is now fixed. Only the owner can see feature sets prior to their approval
- Select all files to download by default when retrieving from UI
- Fix bug where a user was able to schedule ingestions on feature sets without access to them
- Do not show button to delete artifact in UI if user does not have editor permissions
- Scope is optional field when retrieving from UI now
- Fix bug when UI was not showing the error in case job failed
- Use fonts in UI from ui-kit to allow air-gapped UI usage
- Expiration for working with feature sets using user spark sessions was hard-coded to 1 hour. Now it is configurable
- Fix bug when UI was not respecting links with specific paths
- Fix bug in UI where page was not persisted after refreshing
- Internal column (ingest_id) was leaking to retrieved files which is now fixed
- In the retrieve example notebook, the dependencies are now pointing to their maven locations
- Allow unlimited expiration for personal access tokens
- Fix bug that UI did not refresh after updating feature or feature set
- Fix bug in UI caused when opening list of versions or list of ingestions on not yet reviewed feature set
- Toggle Use Time Travel Column as Partition was incorrectly placed in the UI
This version introduced breaking changes and is not compatible with older CLIs.
Version 0.19.3 (21-08-2023)
Fixes
- After the helm changes in 0.19.2, the IAM connection was not recreated correctly which is now fixed
Version 0.19.2 (17-08-2023)
New features
- Ability to pass affinity specification to Feature Store pods via Helm
- Ability to obtain JDBC connection string for core PostgreSQL database from existing secret
- Ability for namespace override in Helm
- Add telemetry.cloud.h2o.ai/include: true annotation to Spark drivers and executors
- Add ownership attribution labels/annotations to Feature Store resources
Fixes
- All fixable vulnerabilities at the time of the release have been addressed
- Fix incorrect mapping of feature set flow field in Python CLI from its internal representation
Version 0.19.1 (24-07-2023)
Fixes
- Read OAuth token from correct field after upgrade to latest Fabric8 Kubernetes library
- Fix issue with removing artifacts when using Azure as storage backend
Version 0.19.0 (20-07-2023)
Fixes
- All fixable vulnerabilities at the time of the release have been addressed
- Better handling of features containing a dot in their name
- Fix bug where a record was never stored to the online store when PostgreSQL was used as backend
- Fix several UX issues when displaying UI on small screens
- Fix non-deterministic output of versionChange flag on feature set and feature entities during updates
- Fix auth problems when using folder data sources
- Fix issue where a user could not create a personal access token with the same name another user had used
- Fix navigation bar to show all available cloud components
- Fix handling public data sources in UI
- Fix issue where files on Gen2 azure store were not accessible using SAS token
- Improve error message handling for out of memory issues
- Prevent generating pre-signed urls to Spark temporary files
- Fix issue with displaying job ids containing the x character in the UI
- Fix issue where the Canceled state wasn't properly displayed in the jobs list in the UI
- Fix several spelling issues in the UI
- Add missing time travel column to the feature set page on the UI
- Fix issue where the backend tried to delete a project before deleting the feature sets inside it
- Fix issue with ingest history not displaying correctly in UI for derived feature sets
- Ensure consistency between data in the storage and the information in the database
- Ensure documentation for log configuration is up-to-date
- Fix problem where spark properties passed as extra spark options to operator contained space characters
New features
- Implement Notifications in the UI
- Ability to create, list and revoke personal access tokens in the UI
- Ability to download pre-generated retrieve notebook via CLI and UI
- Implement review process in the UI
- Ability to ingest and retrieve from UI
- Expose ingest history in the UI
- Ability for Feature Store administrator to specify maximum duration of a personal access token
- Ability to filter jobs based on their types in the UI
- Use stable API for HPA in Feature Store Helm charts
- Introduce expiration date on feature set drafts
Version 0.18.1 (14-06-2023)
Fixes
- Fix telemetry error causing pod restart after a successfully sent message
- Fix failure when user credentials already exist during a job
- Share more logs in case sending message to telemetry service is not successful
- Fix job scheduling in case of multiple parallel ingest jobs
- Fix migration related to uploaded artifacts
Version 0.18.0 (01-06-2023)
Fixes
- Fix scheduling of ingest and revert jobs when there is more than one job in the queue
- Fix bug leading to error during extract schema in UI
- Change spark app status to cancelled directly when there is no pod for that job
- Use string instead of UUID for project history
- Fix SQL constraint violation when deleting job related to feature set draft
- Strip extra spaces in URL in Python and Scala CLI
- Fix position of search bar in UI on feature set pages
- Housekeeping of uploaded artifacts
New features
- Ability to list jobs in the UI
- Ability to see progress of jobs in the UI
- Expose updated by field on project and feature set CLI entities and APIs
- Expose number of retrievals on popular feature sets in UI
Version 0.17.0 (25-05-2023)
Fixes
- Improved health check for Redis
- Several improved validations for the register feature set UI flow
- Handle case where the Spark driver is deleted by something other than the operator
- Fix feature set permission promotion when higher or equal project permission is created
- Fix issue with jobs failing due to having large inputs
- Generate GetFeatureSet even when obtaining a listable feature set
- Fix issue with UI global search being extremely slow with a high number of feature sets
- Fix dashboard computation being slow when a high number of feature sets exists
- Fix feature view deletion bug
- Fix issue with incorrect pooling of PostgreSQL connections in online store
- Fix issue where incremental statistics were not computed for features containing dot in their names
- Fix trace id propagation on internal exception
- Do not compute Spark telemetry details on a closed Spark session
- Prevent storing internal columns into the feature set preview
- Fix SQL constraint violation during deleting derived feature sets
- Fix SQL constraint violation when deleting parent job
New features
- Azure Gen2 Jar is now published to maven central
- Introduce feature set flow configuration - user can configure synchronization between online and offline stores
- Implement recently visited projects and feature sets
- Implement popular feature sets
- Integrate with H2O AI Cloud logging service
- Introduce PostgreSQL and remove Mongo as online backend database
- IAM support for Redis
- Helm charts provide more granular control whether IAM should be used or not
- Expose method in CLI to open feature store web
- Implement pinned feature sets
- Implement UI home page
- Expose ingested records count in the ingest history api
- Support for passing security context for containers
- Expose button to trigger online materialization on UI
- Allow specifying join type in derived feature sets
- Allow to select join type in feature views
- Expose filter on feature sets to be reviewed
- Expose data source, time of ingestion, scope and user who performed the ingestion on the ingest history api
Version 0.16.0 (26-04-2023)
Fixes
- Do not create a new version of a feature set or feature in case nothing has changed during an update call
- Share warning message if join hasn't joined any data during derived feature set transformation
- Improve credentials and permission sections of documentation to be more explicit
- Improve cleanup of unhealthy k8s resources
- Implement transitive deletion of derived feature sets
- Remove left-overs from documentation regarding MongoDB
- Improve lazy ingest message to be more explicit
- Improve telemetry health-checks
- Improve Kafka health-checks
- Fix bug in Python CLI schema extraction logic regarding nested data types
- Remove transitive dependencies from Azure Gen2 dependencies jar
- Update the dependencies section of the documentation to contain valid versions
- Project in UI should not be locked and secret by default
- Fix typo in helm charts affecting notifications configuration
- Fix handling dates prior year 1900
- Fix bug in the online store in case the data type of feature is Timestamp, and that feature is also a time travel column
- Improve error handling in UI
New features
- Ability to create feature sets in UI
- Ability to order project, feature sets or features based on specific fields in UI
- Introduce API to cancel a job and improve handing of cancelled jobs
- Introduce API to download a pre-generated notebook demonstrating retrieve flow
- Introduce API to upload and download artifacts to a specific feature set
- Support for deleting of major feature set versions
- Introduce approval process in CLIs and backend
- Introduce support for LinkerD
- Expose API to mark/unmark feature as target variable
- Display number of ingested records on CLI entities and in the UI
- Introduce API for popular feature sets
- Introduce API for recent projects and recent feature sets
- Introduce configuration for dead letter in Kafka
- Improve schema representation on Python and Scala CLI
- Expose monitoring and custom data on feature schema
Version 0.15.0 (21-03-2023)
Fixes
- Throw user-friendly exception if CLIs are trying to call non-existent API
- Dashboard API returning wrong number of features
- Documentation now clearly states what type of join is used in Feature Store
- Follow Spark logic for parsing timestamps to allow more generic inputs for online ingestion
- Provide stronger validation for DeltaTable data source filters
- Schedule interval is now human-readable on CLIs
- Fix redirection message in browser after login
- Fix data back-fill in case the original data had no explicit time travel column
- Feature Store now allows the auth flow for users without a name and e-mail
- Fix deletion of historical feature view when feature view was deleted
- Fix deletion of jobs related to project ids
- Provide user-friendly error in case connection to API service failed from Python and Scala CLI
- Handle internal failure during online-offline sync when the feature set was deleted in the meantime
New features
- Internal database used to store meta-data was changed from Mongo to Postgres
- Introduction of project history
- Integration with H2O AI cloud discovery service
- MongoDB collection data source introduced
- Add possibility to change partition columns when creating a new feature set version
- Expose number of ingested records on Feature Set entity in CLIs
- Introduce Viewer permission. See Permissions for more details.
- Send notification after PAT login
- Docusaurus is used as documentation tooling
- Introduce API to pause and resume scheduled ingest task
- A scheduled ingest task is paused automatically if it repeatedly fails, based on a user-defined boundary
Version 0.14.4 (28-02-2023)
Fixes
- Migration fixes to ensure compatibility with Driverless AI
- Time travel column, partition columns and primary keys are case-insensitive during their specification
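Case-insensitive specification, as described above, can be sketched as resolving the user-supplied name against the actual schema. This is a hypothetical helper, not the real implementation:

```python
def resolve_column(name, schema_columns):
    """Match a user-specified column name against schema columns, ignoring case."""
    lookup = {c.lower(): c for c in schema_columns}
    try:
        return lookup[name.lower()]
    except KeyError:
        raise ValueError(f"Column {name!r} not found in schema")
```

The returned name is the canonical one from the schema, so downstream code never sees the user's casing.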
New features
- Lookup for features in CLI is now case-insensitive
Version 0.14.3 (28-02-2023)
Fixes
- JWT token no longer requires expiration date to ensure consistent experience in H2O AI cloud
Version 0.14.2 (27-02-2023)
Fixes
- Sensitive consumer permission is no longer granted if the user is a regular consumer
Version 0.14.1 (20-02-2023)
Fixes
- Fix online materialization on feature sets with features containing dot in their names
Version 0.14.0 (30-01-2023)
Fixes
- Provide error if timezone is incorrect in scheduler API
- Fix "None.get" bug during subsequent update of a feature
- Fix online materialization on timestamp column with data representing date only
- Fix online retrieval where primary key is of type timestamp with data representing date only
- Replace prints with logger in Python CLI
- Fix and re-introduce disable-api.deletion under a new, more generic API
New features
- Add tooltips to secret and locked in UI
- Add docstrings to all Python CLI methods
- Show values of auto generated time travel column in human readable format
Please see Migration guide for changes and deprecations.
Version 0.13.0 (05-01-2023)
Fixes
- Avoid page reload every time access token expires
- Fix OOM error in core service while deleting feature sets with more than one million files
Version 0.12.2 (14-12-2022)
Fixes
- Disable Locked projects in the Feature Store website
New features
- Add Google Tag Manager (GTM) support into Feature Store website
- Add custom string representation for all entities used in CLI
Version 0.12.1 (06-12-2022)
New features
- Feature Store UI as integral part of Cloud design
Version 0.12.0 (25-11-2022)
Fixes
- Unable to read data from S3 folder data source with path ending with slash
- Publish Java GRPC API with Java 11 instead of Java 17
- Fix rare bug in operator caused by its restart/redeployment leading to hanging jobs
- Fix bug caused by improper handling of trailing slash in S3 data source path
- Handle expired logging session more gracefully in CLIs
- Properly handle different schema exception in case of spark data frame ingestion
New features
- Expose access control in documentation and Python & Scala clients
- Ability to create a new major version of feature set with data back-filled from older version
- Display Navigation bar in UI
- Support for custom certificate authorities in all Feature store components
Version 0.11.0 (09-11-2022)
Fixes
- Handle missing region in AWS credentials
- Retrieve correct version of feature set after lazy ingest
- Fix sample classifier documentation
- Improve documentation for statistics computation
- Start respecting consumer and sensitive consumer permission from projects on feature sets
- Wait for MLDataset materialization
- Display feature set and project owners in UI
- Allow reverting only ingests created after derived feature set creation
- Fix 404 error when clicking a feature from the Search All list
- Correctly display empty statistics on feature set in UI
- Fix statistics re-computation after revert
- Fix preview to return the preview instead of printing
- Rename TrainingDataset to MLDataset
- Enforce order of parent feature sets information
- Fix permission check while getting feature from get feature endpoint
- Start respecting minor versions of feature set
New features
- Ability to specify reason during approval/rejection/revocation of permission on UI
- Ability to edit project, feature set and feature meta-data in UI
- Introduce online MLDatasets
- New endpoints for updating project, feature set and features
- Automatically detect categorical variables during statistics computation
- Add transformations functions to feature view and MLDatasets
- Expose API to get current permission of the project or feature set
- Ability to lazy ingest into a feature set
Please see Migration guide for changes and deprecations.
Version 0.10.0 (06-10-2022)
Fixes
- Fix bad computation of time travel scope
- Better message during create new version in case version already exists
- Remove default partitioning based on time travel column (the parameter time_travel_column_as_partition is still respected)
- Run ingest job on all available executors
- Fix issue when nested schema elements are not updated
- Document what formats are valid for time travel column format
- Fix running a MOJO derived feature set in case the MOJO results in same output column as is the input
- Sanitize user emails to support emails with special characters
- Return empty response in case no classifiers are defined on a feature set
- Fix problem of CLI failing in case empty AWS region is provided
- Fix converting SampleClassifiers to internal proto representation
- Fix ingest scope computation in case previous feature set time travel scope is overlapping
- Fix empty last-update fields on projects and feature sets after creation
- Preserve order of columns in joined dataframe to fix joined derived feature set random ingestion errors
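Sanitizing user emails, as mentioned in the fixes above, typically means replacing characters outside a safe set before the email is used as an identifier. The exact rules used by Feature Store are not documented here, so this is a hypothetical sketch only:

```python
import re

def sanitize_email(email):
    """Replace characters outside a conservative safe set with underscores.

    The safe set here (letters, digits, @ . _ -) is an assumption for
    illustration, not the product's actual policy.
    """
    return re.sub(r"[^A-Za-z0-9@._-]", "_", email)
```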
New features
- Alpha release of UI
- Capability to schedule ingests
- Feature view and training dataframe capabilities
- GRPC api exposing permissions and approval process
- Re-implement feature set preview and make sure it is available immediately without running a job
- Expand notifications to more methods (see Events for more information)
- Add md5 checks to validate integrity of uploaded pipelines to Feature store
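The MD5 integrity check above boils down to comparing digests computed before and after upload. A self-contained sketch using the standard library (illustrative, not the actual upload code):

```python
import hashlib

def md5_digest(data):
    """Return the hex MD5 digest used to verify an uploaded artifact."""
    return hashlib.md5(data).hexdigest()

def verify_upload(data, expected_md5):
    """Check that the received bytes match the digest computed before upload."""
    return md5_digest(data) == expected_md5
```

The sender computes `md5_digest` over the pipeline bytes, the receiver recomputes it and rejects the upload on mismatch.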
Version 0.9.0 (07-09-2022)
Fixes
- Fix ingest of data from encrypted S3 buckets
- Ensure that ingest on non-latest major version does not update latest feature sets collection
- HPA support for feature store services
- Add TLS and IAM support to telemetry kafka stream
- Fix python retrieve holder to support calling preview and download in the same retrieve instance
- Add validation for the recommendation percentage specification
- Preview does not respect start_date_time and end_date_time
New features
- Ingest API now ingests only unique rows. Please check the migration guide for more details.
- Expose custom data on feature level
- Add support for compound primary key
- Search API for projects/feature sets and features for UI
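Ingesting only unique rows, as noted above, is essentially a de-duplication pass over incoming records. A minimal pure-Python sketch of the idea (the real implementation runs in Spark):

```python
def unique_rows(rows):
    """Drop exact duplicate rows while preserving first-seen order.

    Each row is a dict; the sorted items tuple serves as a hashable identity key.
    """
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```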
Version 0.8.0 (05-08-2022)
Fixes
- Fix creation of join derived feature sets with space in name
- Transaction in job handler committed instead of rolling back when an exception was thrown
- Rollback transaction when an error occurs during updating job output
- Use file instead of env variable for job input to handle big inputs
- Fix revert on derived feature sets created using aggregation pipelines
- Fix bug preventing ingestion using specific spark pipelines
- Raise error during registration if feature set contains invalid characters
- Fix mojo derived feature set in case column contains a dot
- Fix bug where schema parser behaves differently on CLI and backend
- Support online materialization also on static data (without explicit time travel column)
- Fix retrieval of parent feature set during derived(join) feature set ingestion
- Fix join key validation in join feature sets to be case-insensitive
- Fail extract schema job in case _corrupt_record is computed
New features
- Pagination on projects and feature sets
- Improve notification API to provide more details
- Telemetry implementation
- Expose Dashboard endpoints in GRPC API
- API to delete and update recommendation classifiers
Version 0.7.1 (02-08-2022)
Fixes
- Support feature sets with high number of features
- Fix patch schema method to correctly work on nested structs
Version 0.7.0 (07-07-2022)
New features
- Recommendation engine
- Multi project search
- Validate regex as part of folder data sources before running the job
- Rename (deprecate) the partitionPattern field in CSVFolder/ParquetFolder/JsonFolder/OnlineSource to filterPattern
- Ingestion validation to derived feature set operations
Fixes
- Ingest history when a new major version is created
- Creating Spark pipeline file in Databricks environment
- Migration for historical feature sets
Version 0.6.0 (15-06-2022)
New features
- Removal of deprecated derived data sources
- Timezone independent personal access tokens expiration
- GRPC API is now versioned
- Allow reading folder data sources with an empty filter
Fixes
- Use projection for feature set and project during deletion to avoid obtaining full object from database
Version 0.5.0 (07-06-2022)
New features
- Introduce derived feature sets, please refer to documentation and migration guide for more information
- Introduce concept of admin to be able to manage Feature Store via admin API
- Support for Minio as source of data
Fixes
- Fix bug in statistics job quantiles computation on empty data
- Fix problem with incorrect detection of bad data in time travel column
- Disable version checks
- Fix fullyQualifiedFeatureName migration
- Fix problem with data source having spaces in their names
- Fix hanging of jobs submitted at the same time
- Fix idempotency during deleting online feature set
Version 0.4.0 (24-05-2022)
New features
- Ability to use Mongo as Online Feature Store backend
- Give possibility to define custom log4j property files to Feature Store services
- Support for IAM roles when reading data from S3 data sources
- Support for reading data from public S3 data sources
- Document usage of feature store notifications
- Hide feature set statistics for non-sensitive consumers
- Update CRD automatically during Helm release
Fixes
- Feature set scope wasn't emptied when new major version was created
- Online to offline sync fails because of the schema mismatch
- Fix problem where job finished with state 1
- Prevent executing update on already finished job
- Don't get job output from Mongo if not required
- Prevent retrying job in case schema is different
- Optimise Kafka health checks
- Do not throw error when CLI version is not provided during GRPC call
- Fix missing import in recommendation API on Python CLI
- Fix searching feature sets based on nested names
- Fix bug where the operator crashes when online messaging properties are missing
- Fix bug with missing featureClassifiers field
- Better error reporting when data in the time travel column is in an invalid format during ingest
Version 0.3.0 (12-05-2022)
New features
- Replace the capability of creating new version during ingest by explicit api, please see migration guide
- Add possibility to remotely debug Feature Store application
- Add project id and feature set id as spark job pod labels
- Introduce feature recommender
- Compute stddev and mean incrementally
- Expose TTL on register feature set GRPC api
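Incremental mean and stddev, as mentioned above, are classically computed with Welford's online algorithm, which updates the statistics one value at a time without storing the data. A minimal sketch (illustrative, not the actual statistics job):

```python
import math

class RunningStats:
    """Welford's online algorithm for incremental mean and stddev."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # population standard deviation; returns 0.0 when no values seen
        return math.sqrt(self.m2 / self.n) if self.n else 0.0
```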
Fixes
- Improve health checks for Feature Store services
- Fix error where auth pages lead to a 404 error
- Fix online feature store to work with both root and separate buckets
- Correctly fail in case feature set contains features with same name (case insensitive)
- Improve online feature store idempotency
- Correctly fail in case array is being passed as primary/secondary or time travel column
- Fix unsupported BinaryType error
- Do not put user secrets to Spark config map
- Fix incremental stats assignment in the database
- Create more user friendly error in case user is not logged in Scala and Python CLI
- Queue ingest job in case there are more jobs submitted at the same time and process them one by one on the backend
Please see Migration guide for more information on breaking changes introduced in this version.
Version 0.2.0 (21-04-2022)
New features
- Introduce incremental statistics computation for specific feature statistics
- Provide timing information about specific parts of jobs on job API
- Store child job ids on job itself in CLIs and GRPC API
- Publish events from Spark operator to Kubernetes, making them visible using kubectl describe
- Introduce time to live configuration for entries in jobs collection
- Use JSON format for logging across all Feature Store components
- Significantly lower the size of the operator image by removing spark distribution from it
- Expose description on the schema API
- Introduce validation which prevents modification of time travel column once feature set has been created
- Feature type can now be specified on the schema API during registering feature set or creating a new feature set version.
- Expose metrics endpoint
- Add time to live to Spark application and remove the need for spark jobs cron job
- Spark operator is now resilient towards restarts
Fixes
- Fix intermittent Mongo errors by updating Mongo client library to latest version
- Disable retry for Out Of Memory errors
- Use asynchronous call in job persister
- Fix client retry in Scala client
- Fix progress reporting in Scala client
- Fix wrong bucket name error when using root bucket on AWS deployments
- Fix bug where preview only works after downloading data
Please see Migration guide for more information on breaking changes introduced in this version.
Version 0.1.3 (08-04-2022)
Fixes
- Subsequent update requests fail when version x.10 is reached
Version 0.1.2 (31-03-2022)
New features
- Send notifications about various major events in feature store to notifications topic
- Native support for nested data types on Schema API
- Expose special data information on feature level and automatically propagate to feature set level
- Support for creating a new feature set version by changing a special data information on feature
- Expose auto project owner configuration
- Expose online and custom data fields on GRPC api
- Java GRPC api is now downloadable from feature store documentation
Fixes
- Avoid duplicate unique count computation in statistics job
- Run all job output handlers in transaction to avoid bad database state in case core restarts during job handling
- Handle case where the Spark driver pod is killed by k8s before the container within the pod is initialized
- Prevent running multiple ingest and revert jobs on the same feature set major version
- Ensure Feature Store Core can be restarted at any stage without introducing a bad state
- Fix time to live migration on historical feature sets
- Avoid multiple notifications from online to core about data ready to be ingested
- Fix Online2Offline to work with Redis cluster deployment
- Fix statistics computation
- Fix project delete by stabilizing core during restarts + by introducing migration to remove stale jobs
- Propagate error to client in case job does not exist
- Fix cases that could lead to writing feature set to historical if it already existed
- Ensure jobs on folder resources can work when root folder ends with slash
- Fix various rare database bad states during handling revert and ingest jobs
- Fix problem during fetching user id in online ingestion
- Enable statistics computation by default
- gRPC retry now works correctly in the Python client
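The trailing-slash fix for folder resources above amounts to normalizing the root folder before joining paths. A minimal sketch, using a hypothetical helper rather than the actual implementation:

```python
def join_resource_path(root: str, name: str) -> str:
    """Join a root folder and a resource name without doubling separators."""
    return root.rstrip("/") + "/" + name

# Both spellings of the root yield the same resource path.
assert join_resource_path("s3://bucket/data", "file.parquet") == "s3://bucket/data/file.parquet"
assert join_resource_path("s3://bucket/data/", "file.parquet") == "s3://bucket/data/file.parquet"
```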
Please see the Migration guide for more information on breaking changes introduced in this version.
Version 0.1.1 (17-03-2022)
New features
- Properly refresh properties on project and feature set after updating on CLI
- Expose option for specifying min and max number of Spark executors. For more information refer to deployment section of the documentation.
- Expose configuration which enables/disables notifications logging. For more information refer to deployment section of the documentation.
- Introduce Offline to Online component in online feature store, including automatic sync of offline and online stores.
- More robust project and feature set update API. See the Breaking Changes section below.
Fixes
- Fix time to live migration to enum, which was not executed in version 0.1.0
- Mark job as pending after it has been created
- Refresh functionality now correctly loads only latest minor version for current major feature set version
- Fix validation of online ingestion to accept only valid JSON strings
- Fix and test retry mechanism. Intermittent problems within jobs are now being correctly retried.
- Remove incoming request from notification message as it can contain secure information
- Store confidential data in Kubernetes Secrets instead of in regular configuration on the custom resource
- Fix regression bug causing authentication failure when using Azure service principal
- Provide proper error message if job does not exist when using job api
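The online-ingestion validation fix above can be illustrated with a minimal sketch; the validator below is a hypothetical stand-in, not the Feature Store implementation:

```python
import json

def is_valid_json(payload: str) -> bool:
    """Accept only strings that parse as JSON; reject everything else."""
    try:
        json.loads(payload)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

# Valid JSON passes; malformed payloads are rejected instead of ingested.
assert is_valid_json('{"feature": 1}')
assert not is_valid_json('{"feature": 1')   # missing closing brace
assert not is_valid_json("not json")
```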
Please see the Migration guide for more information on breaking changes introduced in this version.
Version 0.1.0 (10-03-2022)
New features
- Improve Spark operator to use K8s informers instead of regular polling of resources
- Add owner reference from the Spark driver pod to its parent custom resource
- Implement Online Feature Store Ingestion and Retrieve
- Implement Online2Offline feature service
- Integrate Online Feature Store with deployment templates
- Introduce automatic notifications for each observer request from API
- Add possibility to read AWS credentials from ~/.aws/credentials
- Authentication callback endpoint now properly propagates errors
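For reference, `~/.aws/credentials` uses the standard AWS shared credentials file format shown below (all values are placeholders):

```ini
# ~/.aws/credentials — standard AWS shared credentials file
[default]
aws_access_key_id = AKIA...            # placeholder
aws_secret_access_key = wJalr...       # placeholder

[staging]
aws_access_key_id = AKIA...            # named profiles are also supported
aws_secret_access_key = wJalr...
```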
Fixes
- Fix problem where the new version gRPC API switched properties such as marked for masking to their default values
- Validate spaces in feature names during registration
- Remove groups and roles from user collection as the feature store does not use them
- Fix permission problem where a project editor did not get access to feature sets
- Preserve capitalization in project and feature set names
- Handle failed status from operator when the driver pod gets terminated abruptly
- Fix problem which could cause job with long input to fail
Version 0.0.39 (17-02-2022)
New features
- Deployment Helm charts are available for download from Feature Store documentation
Fixes
- Support Mongo 4.2 (Create collections during core startup)
- Fix preview functionality when running on specific ingest
- Fix None.get error in job output handler
- Fix problem with duplicated data ingestion when time travel column is explicitly provided
- Fix retry functionality - store only the result of the most recently retried job
- Fix spark frame retrieval of specific ingest
- Fix wrong ingest id column name in Scala client
Version 0.0.38 (10-02-2022)
New features
- Introduce Spark operator, ensuring the Spark jobs subsystem is scalable and asynchronous
- Use enum for the process interval field on the gRPC feature set registration API
- Expose custom data on the gRPC feature set registration API
- Support for scheduling spark executors and drivers based on matching taints
- Expose configuration to change Spark log level
- Ensure we have only the most permissive policy on the policies collection
- Introduce historical collection for policies
- Document Sparkling Water & Feature Store integration
- Support for masking primitive and nested types (structs and arrays) at any nesting level or combination
- Introduce logging in the spark jobs
Fixes
- Fix bug caused by adding permission to a user that does not exist
- Statistics computation is now correctly started when triggered by asynchronous job
- Fix missing TLS messaging documentation
- Ensure the error message from a Spark job can always fit into the gRPC header
- Avoid reading full container for meta-data in case of using folder resource
- Ingest job now generates a warning when the schema difference is only in types
- Use mounted secrets in spark jobs instead of transferring those in plain text
- Handle job outputs asynchronously
- Project consumer no longer adds feature set consumer permission
- Ensure `from featurestore import *` imports all data sources
- Ensure ingest history gives correct results
- Ensure that unlocked project still requires feature set consumer permissions to retrieve from feature sets
- Fix partitionBy migration
- Fix cache migrations
- Remove extra timestamp column when retrieving data as Spark
- Ensure large access token (up to 16Kb) can be consumed by feature store
- Use default partitioning when the user does not specify the partition by argument in the register feature set API
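The gRPC header fix above concerns metadata size limits (gRPC servers commonly default to 8 KiB of metadata). One way to guarantee an error message fits is to truncate it on a byte budget, sketched here with a hypothetical helper and an assumed limit:

```python
# Assumed limit: 8 KiB default metadata cap, minus headroom for other keys.
MAX_ERROR_BYTES = 8 * 1024 - 256

def fit_error_message(message: str, limit: int = MAX_ERROR_BYTES) -> str:
    """Truncate a message so its UTF-8 encoding fits within the byte budget."""
    encoded = message.encode("utf-8")
    if len(encoded) <= limit:
        return message
    # Cut on the byte budget; drop any character split by the cut.
    return encoded[:limit].decode("utf-8", errors="ignore") + "...(truncated)"

assert fit_error_message("short error") == "short error"
assert fit_error_message("x" * 100_000).endswith("...(truncated)")
```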
Version 0.0.37 (19-01-2022)
New features
- Offline Feature Store helm charts are up-to-date with latest S3 & Gen2 changes
- Support for explicitly specifying credentials during schema extraction and ingest
- Improved login functionality for CLI
- Introduce support for partitioning
- Introduce support for reverting any ingest. This change migrates revert functionality to be based on ingest IDs, which also means the reverted data is now actually deleted
- Use Kafka for communication between Spark job and core. Preparing the ground for Spark operator
- Expose marked for masking on feature level in CLI
- Remove ingest number from ingest history, as ingest IDs are now used
Fixes
- Introduce migration to remove temporary collections created during migrations
- Fix problem with credentials for retrieving and writing spark data frames on S3 and Gen2
- Fix incorrect behaviour of folder ingest when the feature set did not have a time travel column defined
- Fix bug when registering a feature set on a project currently being deleted
- Fix ingest using spark frame when cache is configured to use single root bucket
- Gen2 & S3 support as feature store cache
- Fix file not found problem during delete
- Asynchronous ingest job now correctly starts statistics computation job
- Fix bug where DecimalType was treated as categorical instead of numerical during statistics computation