Products

H2O on Hadoop

The H2O section allows you to enable Enterprise Steam to work with H2O. From this section, you can also add new H2O engines and specify launch parameters to be used when starting H2O.

Products section

External XGBoost cluster

When using H2O version 3.30.1.1 or higher, Enterprise Steam launches an on-demand XGBoost cluster when the original H2O cluster requests one. Because XGBoost would otherwise require the H2O cluster itself to reserve extra memory, this saves compute resources. The XGBoost cluster is brought up and shut down on demand.

Configuration

H2O Configuration allows you to enable Enterprise Steam to work with H2O. When enabled, the default-h2o profiles will be available. (See the Profiles section for more information.)

  1. Click the Configuration option under the H2O section.

  2. Enable H2O and select the Hadoop backend.

  3. (Optional) Turn on Internal Secure Connections. This enables SSL internal security.

  4. (Optional) Turn on External XGBoost cluster support as described in the previous section.

  5. (Optional) Allow insecure XGBoost on clusters with Internal Secure Connections enabled.

  6. (Optional) Specify any additional Hadoop CLASSPATH options. These will be added to HADOOP_CLASSPATH when launching H2O clusters.

  7. (Optional) Specify a custom prefix that will appear in front of the job name in the YARN Resource Manager.

  8. (Optional) Override the default H2O driver output directory. Turning this on enables the setting in the next step.

  9. (Optional) Specify a custom H2O driver output directory. You can use ‘{username}’ as a template that will be replaced with the name of the user launching the cluster. Example: ‘s3a://home/{username}/’ expands to ‘s3a://home/john/’.

  10. Click Save Configuration when you are done.
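The ‘{username}’ template from step 9 can be illustrated with a minimal sketch. The function name `expand_output_dir` is hypothetical and is not part of Enterprise Steam; the sketch only mirrors the substitution described above.

```python
def expand_output_dir(template: str, username: str) -> str:
    """Replace every '{username}' placeholder with the launching user's name.

    Hypothetical helper for illustration only; Enterprise Steam performs
    this substitution internally.
    """
    return template.replace("{username}", username)


# Example taken from the step above.
print(expand_output_dir("s3a://home/{username}/", "john"))  # s3a://home/john/
```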

H2O configuration

Engines

Click the Engines option in the H2O section to view a list of available H2O engines. To install a new H2O engine, download the engine that matches your desired H2O version and Hadoop distribution. Files can be uploaded by clicking the Browse button or imported from the server by providing the absolute path to the engine file and clicking the Import button. You can also delete engines that are no longer needed by clicking on the Remove engine icon next to the engine you want to delete.

Note: The minimum supported version of H2O-3 in Enterprise Steam is 3.10.0.7.
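Before uploading an engine, it can help to compare its version against the thresholds mentioned in this section. The following is a minimal sketch, assuming purely numeric dotted version strings; the helper names are hypothetical and are not part of any Enterprise Steam API.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.30.1.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Thresholds quoted in this section of the documentation.
MIN_H2O_ON_HADOOP = parse_version("3.10.0.7")
MIN_EXTERNAL_XGBOOST = parse_version("3.30.1.1")


def is_supported_on_hadoop(engine_version: str) -> bool:
    """True if the engine meets the minimum supported H2O-3 version."""
    return parse_version(engine_version) >= MIN_H2O_ON_HADOOP


def can_use_external_xgboost(engine_version: str) -> bool:
    """True if the engine is new enough for the on-demand XGBoost cluster."""
    return parse_version(engine_version) >= MIN_EXTERNAL_XGBOOST
```

Tuples compare element by element, so 3.10.0.7 correctly sorts before 3.30.1.1, which a plain string comparison would not guarantee in general.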

H2O engines section

Startup Parameters

The Startup Parameters option in the H2O section allows you to specify startup parameters to be used when starting H2O. A list of parameters that can be specified is available in the H2O-3 User Guide.

  1. Click Add Parameter to add a new startup parameter.

  2. Enter the name of the parameter.

  3. Specify a value for the parameter.

  4. Specify whether this parameter is a default or an override.

  5. Specify whether this is a Hadoop or an H2O Driver parameter.

  6. Click Create when you are done.

H2O startup parameters

H2O on Kubernetes

External XGBoost cluster

When using H2O version 3.30.1.1 or higher, Enterprise Steam launches an on-demand XGBoost cluster when the original H2O cluster requests one. Because XGBoost would otherwise require the H2O cluster itself to reserve extra memory, this saves compute resources. The XGBoost cluster is brought up and shut down on demand.

The following sections describe how to configure an H2O deployment on the Kubernetes backend.

Configuration

  1. Click the Configuration option under the H2O section.

  2. Enable H2O and select the Kubernetes backend.

  3. Click Save Configuration when you are done.

Engines

Click the Engines option in the H2O section to view a list of available H2O engines. You can delete engines that are no longer needed by clicking on the Remove engine icon next to the engine you want to delete.

Note: The minimum supported version of H2O-3 in Enterprise Steam is 3.30.0.5.

To add a new H2O Kubernetes engine, click the Add engine button in the top-right corner.

  1. Specify the H2O version of your Docker image.

  2. Copy the full image name from Docker Hub or your private repository.

  3. (Optional) Set the image pull policy.

  4. (Optional) Set the image pull secret for your private repository.

Sparkling Water

The Sparkling Water section allows you to enable Enterprise Steam integration with Sparkling Water. From this section, you can also upload new Sparkling Water engines and new Python environments.

Configuration

Sparkling Water Configuration allows you to enable Sparkling Water and/or RSparkling. When enabled, the default-sparkling-internal and default-sparkling-external profiles will be available. (See the Profiles section for more information.)

Note: Sparkling Water Jupyter Notebooks are saved to disk at: /opt/h2oai/steam/data/jupyterhub/home/{username}

Note: When using SudoSpawner, Notebooks are saved to disk at: /home/{username}/{Notebook Path}

Note: RSparkling requires R to be installed on all Hadoop worker nodes. All worker nodes also need the devtools and sparklyr==1.9.4 packages installed.

  1. Click the Configuration option under Sparkling Water.

  2. Enable Sparkling Water and/or RSparkling. See the requirements in the notes above.

  3. Specify the path to your Spark home directory (SPARK_HOME).

  4. (Optional) Override the default HADOOP_CONF_DIR (taken from the Hadoop configuration) with a user-defined HADOOP_CONF_DIR.

  5. Specify the path to your Java 8 home directory (JAVA_HOME).

  6. (Optional) Add a comma-separated list of JAR paths that will be added to the classpath.

  7. (Optional) Turn on Internal Secure Connections. This enables SSL internal security.

  8. (Optional) Allow insecure XGBoost on clusters with Internal Secure Connections enabled.

  9. Select a Sparkling Water backend to enforce. You can set this to the Internal or the External backend.

  10. (Optional) Change the prefix of the cluster name that appears in YARN.

  11. (Optional) Enable the use of SudoSpawner for JupyterHub. This requires a sudoers setup, which is detailed in the Knowledge Base.

  12. (Optional) Set the notebook directory for SudoSpawner, relative to the user’s home directory.

  13. Click Save Configuration when you are done.
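The notebook locations given in the notes above, together with the SudoSpawner settings in steps 11 and 12, can be summarized in a short sketch. The function `notebook_dir` and its parameters are hypothetical; only the two path patterns come from this document.

```python
import posixpath

# Default location quoted in the note above (without SudoSpawner).
DEFAULT_NOTEBOOK_ROOT = "/opt/h2oai/steam/data/jupyterhub/home"


def notebook_dir(username: str, sudospawner: bool = False,
                 notebook_path: str = "") -> str:
    """Return where a user's Sparkling Water notebooks are saved.

    Hypothetical helper: 'notebook_path' stands for the directory from
    step 12, which is relative to the user's home directory.
    """
    if not sudospawner:
        return posixpath.join(DEFAULT_NOTEBOOK_ROOT, username)
    home = posixpath.join("/home", username)
    relative = notebook_path.strip("/")  # treat the setting as relative
    return posixpath.join(home, relative) if relative else home


print(notebook_dir("john"))
print(notebook_dir("john", sudospawner=True, notebook_path="notebooks"))
```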

Sparkling Water configuration

Engines

Click the Engines option to view a list of available Sparkling Water engines. To install a new Sparkling Water engine, download the engine that matches your version of Spark. Files can be uploaded by clicking the Browse button or imported from the server by providing the absolute path to the engine file and clicking the Import button. You can also delete engines that are no longer needed by clicking on the Remove engine icon next to the engine you want to delete.

Note: The minimum supported versions of Sparkling Water in Enterprise Steam are:

  • 2.1.41

  • 2.2.27

  • 2.3.16

  • 2.4.1

Enterprise Steam 1.6.6 added support for Sparkling Water 3.28.1.1 and higher.
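The minimums above can be read as one threshold per 2.x release line, plus a single threshold for the newer 3.x numbering. That reading is an assumption, as is the handling of any suffix after a hyphen, which this sketch simply ignores. The function name is hypothetical.

```python
from typing import Dict, Tuple

# Minimum supported patch version for each 2.x release line (from the list above).
MIN_BY_RELEASE_LINE: Dict[Tuple[int, int], Tuple[int, ...]] = {
    (2, 1): (2, 1, 41),
    (2, 2): (2, 2, 27),
    (2, 3): (2, 3, 16),
    (2, 4): (2, 4, 1),
}
# Minimum for the newer numbering scheme.
MIN_NEW_SCHEME: Tuple[int, ...] = (3, 28, 1, 1)


def is_supported_sparkling_water(version: str) -> bool:
    """Check a Sparkling Water version against the documented minimums."""
    # Assumption: only the dotted part before the first hyphen matters here.
    parts = tuple(int(p) for p in version.split("-")[0].split("."))
    if parts[0] >= 3:
        return parts >= MIN_NEW_SCHEME
    minimum = MIN_BY_RELEASE_LINE.get(parts[:2])
    # Release lines that are not listed are treated as unsupported.
    return minimum is not None and parts >= minimum
```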

SW engines section

Python Environments

Click the Python Environments option to view a list of available Python environments. Enterprise Steam ships with two default environments: Python 2.7 and Python 3.7. These environments allow Enterprise Steam to use PySparkling.

Python Environments page

Add New Python Environment

With Enterprise Steam, you can add a new Python environment from a Python path or from a Conda package.

Add New Conda Pack

Perform the following steps to add a new Python environment from a Conda package.

Note: The OS type where the environment was built must match the OS type of the target machine.

  1. On the Python Environments page, click Add Environment.

  2. Select Conda pack from the Environment Type dropdown menu.

  3. Enter a unique name for this new environment.

  4. Upload the Conda pack archive file.

  5. (Optional) Assign the new profile to an existing environment.

  6. Click Add Environment when you are done.

Add New Python Path

Perform the following steps to add a new Python environment via a Python path.

  1. On the Python Environments page, click Add Environment.

  2. Select Python path from the Environment Type dropdown menu.

  3. Enter a unique name for this new environment.

  4. Specify the PySpark Python path.

  5. (Optional) Assign the new profile to an existing environment.

  6. Click Add Environment when you are done.

Deleting Python Environments

Note: The default environments cannot be deleted.

Click the Actions > Remove option beside the Python environment that you want to delete. A confirmation message will display. Click Confirm to complete the removal.

Delete Python environment

Driverless AI

The Driverless AI section allows you to enable and configure Enterprise Steam to manage Driverless AI instances and add new Driverless AI servers.

Note: If the Kubernetes backend is not enabled, then no backends are available for Driverless AI.

Configuration

  1. Click the Configuration option in the Driverless AI section.

  2. Click Enabled to enable Driverless AI.

  3. Paste your Driverless AI license file in the Driverless AI License File text field.

  4. (Expert config.) For deployments whose identity provider (IdP) supports wildcards in redirect URLs, enable the new authentication flow available for DAI 1.10.3 and later.

  5. Click Save Configuration when you are done.

Engines

Click the Engines option to view a list of available Driverless AI engines. Obtain the Driverless AI Docker image and upload it to your Docker repository.

To add a new Driverless AI engine, click the Add engine button in the top-right corner.

  1. Specify the Driverless AI version of your Docker image.

  2. Copy the full image name from Docker Hub or your private repository.

  3. (Optional) Set the image pull policy.

  4. (Optional) Set the image pull secret for your private repository.

You can also delete engines that are no longer needed by clicking the Remove engine icon next to the engine you want to delete.

Note: The minimum required version of Driverless AI is 1.9.0.

Python client

Enterprise Steam ships with the Driverless AI Python client, which is automatically distributed to and used by users of the Steam Python client. You can upgrade the client by downloading it from PyPI (https://pypi.org/project/driverlessai/#files) and then uploading it on this screen.

Note: The Driverless AI Python client is backward compatible. Make sure the client is at least the same version as your Driverless AI engines.
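The compatibility rule in the note can be checked with a small sketch that lists the engines newer than the uploaded client. It assumes purely numeric dotted versions (suffixes such as "post1" are not handled), and the function name is hypothetical.

```python
from typing import List, Tuple


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as '1.10.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def engines_newer_than_client(client_version: str,
                              engine_versions: List[str]) -> List[str]:
    """Return the engine versions that the client is too old to cover."""
    client = _as_tuple(client_version)
    return [v for v in engine_versions if _as_tuple(v) > client]


# An empty result means the client is at least as new as every engine.
print(engines_newer_than_client("1.10.3", ["1.9.0", "1.10.3", "1.10.5"]))
```

If the returned list is not empty, upgrading the client to at least the newest listed version satisfies the note.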

Driverless AI Python client