Spark dependencies
To interact with Feature Store from a Spark session, several dependencies must be added to the Spark classpath. Supported Spark versions are 3.2.x.
Using S3 as the Feature Store storage:
- io.delta:delta-core_2.12:2.4.0
- org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}
Note: HADOOP_VERSION is the Hadoop version your Spark is built for. The version of the delta-core library needs to match your Spark version; version 2.4.0 can be used with Spark 3.4.
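These Maven coordinates can be supplied at launch time with Spark's `--packages` option. A minimal sketch of assembling the launch command; the Hadoop version `3.3.4` is an illustrative assumption, so substitute the version your Spark distribution is built for:

```python
# Assemble the --packages argument for a Spark launch using S3 storage.
hadoop_version = "3.3.4"  # assumption: replace with the Hadoop version your Spark is built for

packages = ",".join([
    "io.delta:delta-core_2.12:2.4.0",                  # must match your Spark version
    f"org.apache.hadoop:hadoop-aws:{hadoop_version}",
])

# The resulting launch command (printed here, not executed):
command = ["spark-shell", "--packages", packages]
print(" ".join(command))
```

The same `--packages` value works with spark-submit and pyspark launchers as well.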
Using Azure Gen2 as the Feature Store storage:
- io.delta:delta-core_2.12:2.4.0
- featurestore-azure-gen2-spark-dependencies.jar
- org.apache.hadoop:hadoop-azure:${HADOOP_VERSION}
The Spark dependencies jar can be downloaded from the Downloads page.
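For Azure Gen2, the Maven coordinates go to `--packages` while the downloaded Feature Store jar is passed via `--jars`. A minimal sketch, again assuming an illustrative Hadoop version of `3.3.4` and that the jar sits in the current directory:

```python
# Assemble a Spark launch command for Azure Gen2 storage.
hadoop_version = "3.3.4"  # assumption: replace with the Hadoop version your Spark is built for

packages = ",".join([
    "io.delta:delta-core_2.12:2.4.0",
    f"org.apache.hadoop:hadoop-azure:{hadoop_version}",
])

command = [
    "spark-shell",
    "--packages", packages,
    # Local jar obtained from the Downloads page; adjust the path as needed.
    "--jars", "featurestore-azure-gen2-spark-dependencies.jar",
]
print(" ".join(command))
```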
General configuration
Spark needs to be started with the following configuration to ensure that the time travel queries are correct:
spark.sql.session.timeZone=UTC
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
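These settings must be in place when the session starts, so they are typically passed as `--conf` options at launch. A minimal sketch turning the three settings into launcher flags:

```python
# The three settings required for correct time travel queries.
spark_conf = {
    "spark.sql.session.timeZone": "UTC",
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

# Expand each setting into a "--conf key=value" pair for spark-shell/spark-submit.
conf_flags = [arg for key, value in spark_conf.items() for arg in ("--conf", f"{key}={value}")]
print(" ".join(["spark-shell", *conf_flags]))
```

Alternatively, the same keys can be placed in `spark-defaults.conf` so that every session picks them up automatically.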
If Apache Spark is not already running, start it with the configuration above.
Feedback
- Send feedback about H2O Feature Store to cloud-feedback@h2o.ai