Sparkling Water users
Sparkling Water is a Gradle project with the following submodules:
Core: Implementation of H2OContext, H2ORDD, and all technical integration code.
Examples: Application, demos, and examples.
ML: Implementation of MLlib pipelines for H2O-3 algorithms.
Assembly: Creates a “fatJar” composed of all other modules.
py: Implementation of the H2O-3 Python binding to Sparkling Water.
The best way to get started is to modify the core module or create a new module (which extends the project).
Note
Sparkling Water is only supported with the latest version of H2O-3.
Sparkling Water is versioned according to the Spark versioning, so make sure to use the Sparkling Water version that corresponds to your installed version of Spark.
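As a minimal sketch of this version matching, the helper below reduces a full Spark version string to its major.minor part, which is the part that selects the Sparkling Water build. The function names are hypothetical, and the `h2o_pysparkling_<major.minor>` package naming is an assumption about how the PyPI packages are named; check the Downloads page for the exact artifact name.

```python
import re


def spark_major_minor(spark_version: str) -> str:
    """Return the 'major.minor' part of a Spark version string such as '3.5.1'."""
    match = re.match(r"^(\d+)\.(\d+)", spark_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {spark_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def pysparkling_package(spark_version: str) -> str:
    """Build a PyPI package name for the given Spark version.

    Assumption: packages follow the h2o_pysparkling_<major.minor> scheme.
    """
    return f"h2o_pysparkling_{spark_major_minor(spark_version)}"


if __name__ == "__main__":
    # In a real session the version would come from spark.version.
    print(pysparkling_package("3.5.1"))
```

Only the major.minor part is used here because the note above ties the Sparkling Water version to the Spark version line, not to an individual patch release.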
Getting started with Sparkling Water
This section contains links that will help you get started using Sparkling Water.
Download Sparkling Water
Navigate to the Downloads page.
Click Sparkling Water or scroll down to the Sparkling Water section.
Select your installed version of Spark to download the corresponding version of Sparkling Water.
Sparkling Water documentation
The documentation for Sparkling Water is separate from the H2O-3 user guide. Read this documentation to get started with Sparkling Water.
Sparkling Water tutorials
This section contains demos and examples showcasing Sparkling Water.
Sparkling Water K-Means tutorial: This tutorial uses Scala to create a K-Means model.
Sparkling Water GBM tutorial: This tutorial uses Scala to create a GBM model.
Sparkling Water on YARN: This tutorial walks you through how to run Sparkling Water on a YARN cluster.
Building machine learning applications with Sparkling Water: This tutorial describes project building and demonstrates the capabilities of Sparkling Water using Spark Shell to build a Deep Learning model.
Connecting RStudio to Sparkling Water: This illustrated tutorial describes how to use RStudio to connect to Sparkling Water.
Sparkling Water FAQ
The frequently asked questions provide answers to many common questions about Sparkling Water.
Sparkling Water blog posts
PySparkling
PySparkling can be installed by downloading and running the PySparkling shell or by using pip. PySparkling can also be installed from the PyPI repository. Follow the instructions for how to install PySparkling on the Download page for Sparkling Water.
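Once PySparkling is installed, a quick way to confirm that the installation works is to start an H2OContext from a Spark session. The following is a minimal sketch, not a definitive setup: the function name `start_h2o_context` and the application name are hypothetical, and the no-argument form of `H2OContext.getOrCreate()` is assumed to be available in your Sparkling Water release (older releases may expect the Spark session as an argument). The imports are placed inside the function so that the snippet can be loaded even where Spark is not installed.

```python
def start_h2o_context():
    """Start (or reuse) a Spark session and attach an H2OContext to it.

    Requires pyspark and a PySparkling package matching the Spark version.
    """
    # Imported lazily so that defining this function does not require Spark.
    from pyspark.sql import SparkSession
    from pysparkling import H2OContext

    # Hypothetical application name, used only for illustration.
    spark = SparkSession.builder.appName("pysparkling-sketch").getOrCreate()

    # Assumption: this release accepts getOrCreate() without arguments.
    hc = H2OContext.getOrCreate()
    return spark, hc


if __name__ == "__main__":
    spark, hc = start_h2o_context()
    print(hc)
```

If the call fails with a version mismatch, the installed PySparkling package most likely does not correspond to the Spark version in use.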
PySparkling documentation
Documentation for PySparkling is available for the following versions:
RSparkling
The RSparkling R package is an extension package for sparklyr that creates an R front-end for the Sparkling Water package from H2O-3. This provides an interface to H2O-3’s high-performance, distributed machine learning algorithms on Spark using R.
This package implements basic functionality: creating an H2OContext, showing the H2O Flow interface, and converting between Spark DataFrames and H2O Frames. The main purpose of this package is to provide a connector between sparklyr and H2O-3’s machine learning algorithms.
The RSparkling package uses sparklyr for Spark job deployment and initialization of Sparkling Water. After that, you can use the regular H2O R package for modeling.
RSparkling documentation
Documentation for RSparkling is available for the following versions: