.. _rsparkling_azure: Running RSparkling on Databricks Azure Cluster ---------------------------------------------- Sparkling Water, PySparkling and RSparkling can be used on top of Databricks Azure Cluster. This tutorial is the **RSparkling**. For Scala Sparkling Water, please visit :ref:`sw_azure` and for PySparkling, please visit :ref:`pysparkling_azure`. To start Sparkling Water ``H2OContext`` on Databricks Azure, the steps are: 1. Login into Microsoft Azure Portal 2. Create Databricks Azure Environment In order to connect to Databricks from Azure, please make sure you have created the user inside Azure Active Directory and using that user for the Databricks Login. 3. Create the cluster - For Sparkling Water SUBST_SW_VERSION select Spark SUBST_SPARK_VERSION It is advised to always use the latest Sparkling Water and Spark version for the given Spark major version. .. figure:: ../images/databricks_cluster_creation.png :alt: Configured cluster ready to be started 4. Create a R notebook and attach it to the created cluster. To start ``H2OContext``, the init part of the notebook should be: .. code:: R # Install Sparklyr install.packages("sparklyr") # Install H2O SUBST_H2O_VERSION (SUBST_H2O_RELEASE_NAME) install.packages("h2o", type = "source", repos = "http://h2o-release.s3.amazonaws.com/h2o/rel-SUBST_H2O_RELEASE_NAME/SUBST_H2O_BUILD_NUMBER/R") # Install RSparkling SUBST_SW_VERSION install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-SUBST_SPARK_MAJOR_VERSION/SUBST_SW_VERSION/R") # Connect to Spark on Databricks library(rsparkling) library(sparklyr) sc <- spark_connect(method = "databricks") # Start H2O context h2o_context(sc) 6. And voila, we should have ``H2OContext`` running .. figure:: ../images/databricks_rsparkling_h2o_context_running.png :alt: Running H2O Context 7. Flow is accessible via the URL printed out after H2OContext is started. Internally we use open port 9009. If you have an environment where a different port is open on your Azure Databricks cluster, you can configure it via ``spark.ext.h2o.client.web.port``.