Clusters and Notebooks

The Clusters page shows all H2O clusters that Enterprise Steam is connected to along with the state of the cluster, the cluster type, the ID of the user who started the cluster, and the cluster creation date.

Clusters page

Launching a New Cluster

The options for launching a new cluster vary based on the selected cluster type.

Launch a New H2O Cluster

  1. In the Enterprise Steam UI, navigate to the Clusters page and select Launch New Cluster.
  2. Select H2O from the Cluster Type dropdown menu.
  3. Select a Cluster Profile from the dropdown menu to use when setting up the new cluster. Cluster profiles are configured by the Admin on the Configurations page and define the allowed minimum and maximum values for each option.
  4. Specify values for the options below. Once added, other Enterprise Steam users will be able to connect to this cluster.
  • Cluster Name: Specify a name for this cluster.

  • H2O Version: For new H2O clusters, specify the H2O version to use.

  • Number of Nodes: Specify the number of nodes for the cluster.

  • Java Memory per Node [GB]: Specify the amount of memory that should be available on each node.

  • YARN Virtual Cores per Node: Specify the number of virtual cores.

  • H2O Threads per Node: Specify the number of threads (CPUs) to use in the cluster. Leave this blank to use all available threads.

  • Extra Memory: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.

  • Maximum Idle Time [HRS]: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Leave this blank to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.

  • Maximum Uptime [HRS]: Specify the maximum number of hours that the cluster can be running. Leave this blank to turn off this setting and allow the cluster to remain up for an unlimited amount of time.

  • Leader Node ID: Optionally specify whether to connect to a different leader node.

  • YARN Queue: If your cluster contains queues for allocating cluster resources, optionally specify a queue for this cluster. Note that the YARN Queue cannot contain spaces. Leave this empty to use the default YARN queue.

    Launch new cluster
  5. Click the Launch New Cluster button to start the new cluster.

Upon successful completion, the cluster will appear on the Clusters page.
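As a rough illustration of how Number of Nodes, Java Memory per Node, and Extra Memory combine, the sketch below estimates the memory a cluster would request. It assumes the extra memory is simply added on top of the Java heap as a percentage of it; the function names are hypothetical, and any rounding applied by the resource manager is not modeled.

```python
def node_memory_gb(java_memory_gb, extra_memory_percent=10.0):
    """Estimate the memory for one node: Java heap plus extra JVM memory.

    Assumption: extra memory is a percentage of the Java memory per node,
    added on top of the heap. The default of 10 matches the recommended value.
    """
    if java_memory_gb <= 0:
        raise ValueError("Java memory per node must be positive")
    if extra_memory_percent < 0:
        raise ValueError("Extra memory percentage cannot be negative")
    extra_gb = java_memory_gb * extra_memory_percent / 100
    return java_memory_gb + extra_gb


def cluster_memory_gb(number_of_nodes, java_memory_gb, extra_memory_percent=10.0):
    """Estimate the total memory requested by all nodes in the cluster."""
    if number_of_nodes < 1:
        raise ValueError("A cluster needs at least one node")
    return number_of_nodes * node_memory_gb(java_memory_gb, extra_memory_percent)


if __name__ == "__main__":
    # Hypothetical request: 4 nodes with 10 GB of Java memory each.
    print("Per node [GB]:", node_memory_gb(10))
    print("Whole cluster [GB]:", cluster_memory_gb(4, 10))
```

An estimate like this can help when choosing values that fit within the limits of the selected cluster profile.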

Launch a New Sparkling Water - Internal Backend Cluster

  1. In the Enterprise Steam UI, navigate to the Clusters page and select Launch New Cluster.
  2. Select Sparkling Water - Internal Backend from the Cluster Type dropdown menu.
  3. Select a Cluster Profile from the dropdown menu to use when setting up the new cluster. Cluster profiles are configured by the Admin on the Configurations page and define the allowed minimum and maximum values for each option.
  4. Specify values for the options below. Once added, other Enterprise Steam users will be able to connect to this cluster.
  • Cluster Name: Specify a name for this cluster.
  • Sparkling Water Version: For new Sparkling Water clusters, specify the Sparkling Water version to use.
  • Driver Cores: Specify the number of driver cores for the cluster.
  • Driver Memory [GB]: Specify the amount of memory, in GB, to allocate to the driver.
  • Number of Executors: Specify the number of executors for the cluster.
  • Executor Cores: Specify the number of cores per executor.
  • Executor Memory: Specify the amount of executor memory per node (in GB).
  • H2O Threads per Node: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
  • Startup Timeout [SEC]: Specify the startup timeout in seconds. The cluster will terminate if it cannot start within this time.
  • YARN Queue: If your cluster contains queues for allocating cluster resources, optionally specify a queue for this cluster. Note that the YARN Queue cannot contain spaces. Leave this empty to use the default YARN queue.
  • PySpark Python Path: If you have a custom Python environment, you can specify the path here; otherwise, leave empty to use the default Python path.
  • Spark Properties: This shows a list of additional Spark properties for the cluster. This list is maintained by Admins on the Profiles tab.
  5. Click the Launch New Cluster button to start the new cluster.

Upon successful completion, the cluster will appear on the Clusters page.
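For readers more familiar with Spark configuration than with the launch form, the sketch below shows one plausible way the form fields could correspond to standard Spark properties. The property names are regular Spark settings, but the mapping itself is an assumption made for illustration and is not taken from Enterprise Steam; the function name is hypothetical.

```python
def to_spark_properties(driver_cores, driver_memory_gb, number_of_executors,
                        executor_cores, executor_memory_gb, yarn_queue=""):
    """Translate launch-form values into a dict of Spark properties.

    Assumption: each form field corresponds one-to-one to the Spark
    property shown here. Memory values are given in GB.
    """
    properties = {
        "spark.driver.cores": str(driver_cores),
        "spark.driver.memory": f"{driver_memory_gb}g",
        "spark.executor.instances": str(number_of_executors),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{executor_memory_gb}g",
    }
    if yarn_queue:
        # The YARN Queue field cannot contain spaces.
        if " " in yarn_queue:
            raise ValueError("YARN queue name cannot contain spaces")
        properties["spark.yarn.queue"] = yarn_queue
    # An empty queue is left out so that the default YARN queue is used.
    return properties


if __name__ == "__main__":
    # Hypothetical values for a small cluster.
    for key, value in to_spark_properties(1, 2, 3, 2, 4, "analytics").items():
        print(f"{key}={value}")
```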

Launch a New Sparkling Water - External Backend Cluster

  1. In the Enterprise Steam UI, navigate to the Clusters page and select Launch New Cluster.
  2. Select Sparkling Water - External Backend from the Cluster Type dropdown menu.
  3. Select a Cluster Profile from the dropdown menu to use when setting up the new cluster. Cluster profiles are configured by the Admin on the Configurations page and define the allowed minimum and maximum values for each option.
  4. Specify values for the options below. Once added, other Enterprise Steam users will be able to connect to this cluster.
  • Cluster Name: Specify a name for this cluster.
  • Sparkling Water Version: For new Sparkling Water clusters, specify the Sparkling Water version to use.
  • Driver Cores: Specify the number of driver cores for the cluster.
  • Driver Memory [GB]: Specify the amount of memory, in GB, to allocate to the driver.
  • Number of Executors: Specify the number of executors for the cluster.
  • Executor Cores: Specify the number of cores per executor.
  • Executor Memory: Specify the amount of executor memory per node (in GB).
  • H2O Nodes: Specify the number of H2O nodes.
  • H2O Memory Per Node [GB]: Specify the amount of memory in GB to allocate to H2O for each node.
  • H2O Threads per Node: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
  • Startup Timeout [SEC]: Specify the startup timeout in seconds. The cluster will terminate if it cannot start within this time.
  • YARN Queue: If your cluster contains queues for allocating cluster resources, optionally specify a queue for this cluster. Note that the YARN Queue cannot contain spaces. Leave this empty to use the default YARN queue.
  • PySpark Python Path: If you have a custom Python environment, you can specify the path here; otherwise, leave empty to use the default Python path.
  • Spark Properties: This shows a list of additional Spark properties for the cluster. This list is maintained by Admins on the Profiles tab.
  5. Click the Launch New Cluster button to start the new cluster.

Upon successful completion, the cluster will appear on the Clusters page.
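Because a cluster profile limits each option to a minimum and maximum value, it can be useful to check a planned configuration before filling in the form. The sketch below is a minimal illustration of such a check; the option names, the dictionary layout of the profile, and the function name are all hypothetical and do not reflect how Enterprise Steam stores profiles.

```python
from typing import Dict, List, Tuple

# Hypothetical profile layout: option name -> (minimum, maximum).
Profile = Dict[str, Tuple[float, float]]


def validate_options(options: Dict[str, float], profile: Profile) -> List[str]:
    """Return a message for each option that falls outside its allowed range.

    Options that the profile does not mention are skipped.
    """
    errors = []
    for name, value in options.items():
        if name not in profile:
            continue
        low, high = profile[name]
        if not low <= value <= high:
            errors.append(
                f"{name}={value} is outside the allowed range [{low}, {high}]"
            )
    return errors


if __name__ == "__main__":
    # Hypothetical profile and request for an external backend cluster.
    profile = {"h2o_nodes": (1, 4), "h2o_memory_per_node_gb": (2, 16)}
    request = {"h2o_nodes": 6, "h2o_memory_per_node_gb": 8}
    for message in validate_options(request, profile):
        print(message)
```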

Deleting Clusters

Click the Delete icon beside the cluster that you want to delete. A confirmation message will display. Click Confirm to continue. This action stops and then removes the cluster.

Delete clusters

Notebooks

Enterprise Steam allows you to upload and run PySparkling and RSparkling Jupyter Notebooks from within the cluster. More information about Jupyter Notebooks is available here: https://jupyter.org/.

Requirements for RSparkling Jupyter Notebooks

  • The Hadoop cluster must have R installed along with devtools and sparklyr libraries.
  • In Steam, you must have an H2O engine uploaded as well as a Sparkling Water engine that was built for the same H2O version. The minimum supported Sparkling Water versions for this feature are:
    • 2.1.53
    • 2.2.39
    • 2.3.28
    • 2.4.10
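The minimum versions listed above can be checked programmatically. The sketch below assumes that each minimum applies to its own release line (for example, 2.3.28 is the minimum for 2.3.x) and that version strings have the plain three-part form shown in the list; the function names are hypothetical.

```python
# Minimum patch number per release line, taken from the list above.
MINIMUM_PATCH = {
    (2, 1): 53,
    (2, 2): 39,
    (2, 3): 28,
    (2, 4): 10,
}


def parse_version(version):
    """Split a version such as '2.4.10' into a tuple of three integers."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected a version like '2.4.10', got {version!r}")
    return tuple(int(part) for part in parts)


def meets_minimum(version):
    """Return True if the version meets the minimum for its release line.

    Release lines that do not appear in the list raise ValueError,
    because the list says nothing about them.
    """
    major, minor, patch = parse_version(version)
    line = (major, minor)
    if line not in MINIMUM_PATCH:
        raise ValueError(f"Release line {major}.{minor} is not in the list")
    return patch >= MINIMUM_PATCH[line]


if __name__ == "__main__":
    for candidate in ("2.3.27", "2.3.28", "2.4.12"):
        print(candidate, meets_minimum(candidate))
```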

Launching Notebooks

In the Enterprise Steam UI, navigate to the Clusters page and click the My Notebooks button in the upper-right corner to view available notebooks.

Creating New Notebooks

On the Jupyter Notebook Files tab, click the New dropdown and select the type of notebook or other file that you want to create. Then create and save the new notebook.

New Notebook

Adding Notebooks

  1. On the Jupyter Notebook Files tab, click the Upload button.
  2. Browse to the location on your local machine where your notebook is stored.
  3. Click Upload to complete the notebook upload process.
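Before uploading, it can help to confirm that a local file is actually a Jupyter notebook. The sketch below is a minimal local check, assuming the notebook uses the version 4 notebook format, which is a JSON document with top-level `nbformat` and `cells` keys. It is independent of Enterprise Steam, and the function name is hypothetical.

```python
import json
from pathlib import Path


def looks_like_notebook(path):
    """Return True if the file has an .ipynb suffix and notebook-style JSON.

    Assumption: the notebook uses format version 4, where the top-level
    JSON object contains 'nbformat' and a list of 'cells'.
    """
    path = Path(path)
    if path.suffix != ".ipynb":
        return False
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable file, bad encoding, or invalid JSON.
        return False
    return (
        isinstance(data, dict)
        and "nbformat" in data
        and isinstance(data.get("cells"), list)
    )


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(looks_like_notebook("my_analysis.ipynb"))
```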