
Spark dependencies

If you want to interact with Feature Store from a Spark session, several dependencies need to be added to the Spark classpath. The supported Spark versions are 3.5.x.

Using S3 as the Feature Store storage:

  • io.delta:delta-spark_2.12:3.0.0
  • org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}
note

HADOOP_VERSION is the Hadoop version your Spark distribution is built for.

The version of the delta-spark library needs to match your Spark version; version 3.0.0 works with Spark 3.5.
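
A minimal sketch of starting a PySpark session with these S3 dependencies on the classpath. The Hadoop version (3.3.4) and the application name are assumptions; substitute the values matching your environment:

  # Sketch only: PySpark session with the S3 storage dependencies.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("feature-store-s3")
      # Resolve both artifacts from Maven at startup;
      # 3.3.4 stands in for HADOOP_VERSION.
      .config(
          "spark.jars.packages",
          "io.delta:delta-spark_2.12:3.0.0,"
          "org.apache.hadoop:hadoop-aws:3.3.4",
      )
      .getOrCreate()
  )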

Using Azure Gen2 as the Feature Store storage:

  • io.delta:delta-spark_2.12:3.0.0
  • featurestore-spark-dependencies.jar
  • org.apache.hadoop:hadoop-azure:${HADOOP_VERSION}
note

HADOOP_VERSION is the Hadoop version your Spark distribution is built for.

The version of the delta-spark library needs to match your Spark version; version 3.0.0 works with Spark 3.5.

The Spark dependencies jar can be downloaded from the Downloads page.
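
A minimal sketch for the Azure Gen2 case. The Hadoop version (3.3.4) and the local path to the downloaded jar are assumptions; substitute your own values:

  # Sketch only: PySpark session with the Azure Gen2 storage dependencies.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("feature-store-azure")
      # Maven-resolved dependencies; 3.3.4 stands in for HADOOP_VERSION.
      .config(
          "spark.jars.packages",
          "io.delta:delta-spark_2.12:3.0.0,"
          "org.apache.hadoop:hadoop-azure:3.3.4",
      )
      # The jar obtained from the Downloads page; the path is a placeholder.
      .config("spark.jars", "/path/to/featurestore-spark-dependencies.jar")
      .getOrCreate()
  )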

Using Snowflake as the Feature Store storage:

  • net.snowflake:spark-snowflake_${SCALA_VERSION}:2.12.0-spark_3.4
note

SCALA_VERSION is the Scala version your Spark distribution is built with (for example, 2.12).

The version of the spark-snowflake library needs to match your Spark version; version 2.12.0-spark_3.4 works with Spark 3.4.
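
A minimal sketch for the Snowflake case, assuming a Scala 2.12 build of Spark:

  # Sketch only: PySpark session with the Snowflake connector.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("feature-store-snowflake")
      # 2.12 stands in for SCALA_VERSION.
      .config(
          "spark.jars.packages",
          "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4",
      )
      .getOrCreate()
  )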

General configuration

Spark needs to be started with the following configuration to ensure that time-travel queries return correct results (a session example follows the list):

  • spark.sql.session.timeZone=UTC
  • spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
  • spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
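
A minimal sketch of creating a session with this configuration in PySpark; combine it with the dependency settings for your storage backend shown above:

  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .appName("feature-store")
      # UTC session time zone keeps time-travel results consistent.
      .config("spark.sql.session.timeZone", "UTC")
      # Delta Lake SQL extension and catalog implementation.
      .config("spark.sql.extensions",
              "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
  )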

When running Databricks Runtime 11.3 or higher, the following option needs to be set as well:

  • databricks.loki.fileSystemCache.enabled=false
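
On Databricks, such options are typically entered in the cluster's Spark config field as whitespace-separated key-value pairs, one per line, for example:

  databricks.loki.fileSystemCache.enabled false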

If Apache Spark is not already running, start it first.

