Backends

The Backends section allows you to configure the Hadoop and Kubernetes backends.

Backends section

Hadoop Configuration

In order to enable H2O and Sparkling Water clusters in Enterprise Steam, you must enable the Hadoop backend. If your Hadoop cluster is in secure mode (protected by Kerberos), you will have to provide a keytab that Enterprise Steam will use to submit jobs on Hadoop.

Note: Your Hadoop admin will have to set up impersonation for the Steam user. Refer to the Set Up Hadoop Impersonation section. You will not be able to enable the Hadoop backend without it.

  1. Click Configuration under the Hadoop Backend section. This opens the Hadoop Configuration page, which shows your current Hadoop version, distribution, and jar file location.

  2. Click Enabled to enable the Hadoop backend.

  3. Enter the absolute path to the Hadoop configuration directory.

  4. Enter the absolute path where Hadoop tmp files are stored.

  5. Specify whether to convert all usernames to lowercase when submitting jobs to Hadoop. This is disabled by default.

  6. Specify whether to enable Kerberos. This is disabled by default. If enabled, then specify the following additional settings (a keytab verification sketch follows this procedure):

    1. Kerberos principal: Enter the full name of the Kerberos principal associated with the keytab.

    2. Kerberos keytab path: Enter the path to the Kerberos keytab on the filesystem.

    3. Custom KRB5_CONFIG PATH: Optionally enter the path to a custom KRB5_CONFIG file.

  7. Specify whether to enable Hive support. This is disabled by default. If enabled, then specify the following additional settings:

    1. Hive JDBC Driver Path: Enter the path to the Hive JDBC driver on the server.

    2. Hive Principal: Enter the full name of the Hive Kerberos principal.

    3. Hive Host: Enter the hostname of the Hive server.

    4. Hive JDBC URL Pattern: Enter the Hive JDBC URL pattern. This is optional, but at least one of Hive Host and Hive JDBC URL Pattern must be set.

    5. Hive SSL Enabled: Toggle to enable an SSL connection for the specified Hive JDBC URL.

  8. Click Save Configuration when you are done.

Hadoop Configuration path
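If Kerberos is enabled, it can be worth verifying the principal and keytab before saving the configuration. The following is a minimal pre-flight sketch, assuming the MIT Kerberos command-line tools (kinit, klist) are installed on the Steam host; the principal, keytab path, and krb5 config path are placeholders for the values entered in step 6.

    import os
    import subprocess

    def verify_keytab(principal, keytab_path, krb5_config=None):
        """Try to obtain a ticket using the keytab; return True on success."""
        env = os.environ.copy()
        if krb5_config:
            env["KRB5_CONFIG"] = krb5_config  # optional custom KRB5_CONFIG path
        result = subprocess.run(
            ["kinit", "-kt", keytab_path, principal],
            env=env, capture_output=True, text=True,
        )
        if result.returncode != 0:
            print("kinit failed:", result.stderr.strip())
            return False
        subprocess.run(["klist"], env=env)  # print the acquired ticket for inspection
        return True

    # Placeholder values; use the principal and keytab from step 6.
    verify_keytab("steam/steam.example.com@EXAMPLE.COM", "/etc/security/steam.keytab")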

Kubernetes

Configuration

  1. Click Configuration under the Kubernetes Backend section. This opens the Kubernetes Configuration page.

  2. Click Enabled to enable the Kubernetes backend.

  3. Select whether Enterprise Steam is installed inside the Kubernetes cluster. It most likely is, especially if you installed it via the Helm chart. If set to false, you must specify the path to the kubeconfig file.

  4. Select whether to automatically detect and use the default k8s StorageClass for k8s PersistentVolumeClaim provisioning. If set to false, you must manually specify the StorageClass name. A detection sketch follows this procedure.

  5. Select whether to allow users to expand the k8s PersistentVolumes allocated to them. This must be supported by your k8s StorageClass.

  6. Specify the GPU resource name. Use ‘nvidia.com/gpu’ for NVIDIA and ‘amd.com/gpu’ for AMD graphics cards.

  7. Specify the UID that will be used to launch pods if no UID was received from the authentication provider.

  8. Specify the GID that will be used to launch pods if no GID was received from the authentication provider.

  9. Specify whether to force the fallback UID/GID even when a UID/GID was received from the authentication provider.

  10. Click Save Configuration when you are done.

Kubernetes Configuration path
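The automatic detection in step 4 relies on the standard default-StorageClass annotation. Here is a short sketch of the equivalent check, assuming the official kubernetes Python client and a reachable cluster:

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster

    DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"

    for sc in client.StorageV1Api().list_storage_class().items:
        annotations = sc.metadata.annotations or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            print("Default StorageClass:", sc.metadata.name)
            # Relevant to step 5: expansion must be allowed by the StorageClass.
            print("Allows volume expansion:", bool(sc.allow_volume_expansion))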

Volume Mounts

This page is used to configure additional data volumes that can be mounted to instances/clusters. Volumes can be writable or read-only and are mounted to a specified directory. Once created, a volume needs to be attached to a Steam profile, after which it is available the next time the instance/cluster is launched.

Kubernetes Volume Mounts

Creating Volume Mounts

On this page you can create a new Kubernetes Volume Mount.

  1. Enter an identifying name for the volume.

  2. Enter the path where this volume will be mounted inside the instance/cluster. The path must not contain ‘:’.

  3. Choose whether the volume will be mounted as read-only.

  4. Select a volume type. Choices are: Host Path, Kubernetes Secret, Kubernetes ConfigMap, Network File System (NFS), Persistent Volume Claim, or Container Storage Interface (CSI).

  5. Based on the selected volume type, read the information bubble and fill in the parameters. Example volume definitions for each type are shown in the sections below.

  6. Click Create Volume when you are done.

Don’t forget to add the created volume to a Driverless AI - Kubernetes or H2O - Kubernetes profile!

Creating Kubernetes Volume Mounts
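Under the hood, each configured mount corresponds to a standard Kubernetes volume plus a container volumeMount. Here is a sketch of that mapping, assuming the official kubernetes Python client; the names and paths are placeholders.

    from kubernetes import client

    # Step 1: identifying name. Step 4 chooses exactly one volume source field
    # (host_path, secret, config_map, nfs, persistent_volume_claim, or csi);
    # see the per-type examples in the sections below.
    volume = client.V1Volume(name="training-data")

    # Step 2: mount path (must not contain ':'); step 3: read-only flag.
    mount = client.V1VolumeMount(
        name="training-data",
        mount_path="/data/training",
        read_only=True,
    )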

Creating Volume Mounts - Host Path

A Host Path volume mounts a file or directory from the host node’s filesystem into your instance/cluster. This is not something that most instances/clusters will need, but it offers a powerful escape hatch for some applications.

Caution: Instances/clusters with identical configurations may behave differently on different nodes because the files on those nodes differ. Files or directories created on the underlying hosts are writable only by root, so you need to modify the file permissions on the host to be able to write to a Host Path volume.

  1. Path: Path of the directory on the host. If the path is a symlink, it will follow the link to the real path.
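A minimal Host Path source in the same kubernetes Python client notation; the directory is a placeholder:

    from kubernetes import client

    # Mounts a directory that must already exist on the node's filesystem.
    host_path_volume = client.V1Volume(
        name="host-datasets",
        host_path=client.V1HostPathVolumeSource(path="/mnt/shared-datasets"),
    )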

Creating Volume Mounts - Kubernetes Secret

A secret volume is used to pass sensitive information, such as passwords, to instances/clusters. You can store secrets in the Kubernetes API and mount them as files for use by instances/clusters without coupling to Kubernetes directly. Secret volumes are backed by tmpfs (a RAM-backed filesystem) so they are never written to non-volatile storage.

Caution: You must create a secret in the Kubernetes API before you can use it.

  1. Secret Name: Name of the Secret in the pod’s namespace to use.
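A minimal Secret-backed source; the Secret name is a placeholder and must already exist in the namespace:

    from kubernetes import client

    # Each key of the Secret appears as a file under the mount path.
    secret_volume = client.V1Volume(
        name="db-credentials",
        secret=client.V1SecretVolumeSource(secret_name="db-credentials"),
    )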

Creating Volume Mounts - Kubernetes ConfigMap

The ConfigMap resource provides a way to inject configuration data into instances/clusters. The data stored in a Kubernetes ConfigMap object can be made accessible from the instance/cluster.

Caution: You must create a ConfigMap in the Kubernetes API before you can use it.

  1. Config Map Name: Name of the ConfigMap in the pod’s namespace to use.
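A minimal ConfigMap-backed source; the ConfigMap name is a placeholder and must already exist in the namespace:

    from kubernetes import client

    # Each key of the ConfigMap appears as a file under the mount path.
    config_map_volume = client.V1Volume(
        name="app-config",
        config_map=client.V1ConfigMapVolumeSource(name="app-config"),
    )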

Creating Volume Mounts - NFS

An NFS volume allows an existing NFS (Network File System) share to be mounted into your instance/cluster. When the instance/cluster is removed, the contents of an NFS volume are preserved and the volume is merely unmounted. This means that an NFS volume can be pre-populated with data, and that data can be “handed off” between instances/clusters. NFS can be mounted by multiple writers simultaneously.

Caution: You must have your own NFS server running with the share exported before you can use it.

  1. Server: Hostname or IP address of the NFS server.

  2. Path: Path that is exported by the NFS server.
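A minimal NFS source; the server and export path are placeholders:

    from kubernetes import client

    nfs_volume = client.V1Volume(
        name="shared-nfs",
        nfs=client.V1NFSVolumeSource(
            server="nfs.example.com",  # hostname or IP of the NFS server
            path="/exports/steam",     # path exported by the server
        ),
    )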

Creating Volume Mounts - Persistent Volume Claim

A PVC volume is used to mount a PersistentVolume into an instance/cluster. PersistentVolumeClaims are a way for users to “claim” durable storage (such as a GCE PersistentDisk or an iSCSI volume) without knowing the details of the particular environment.

  1. Claim Name: Name of a PersistentVolumeClaim in the same namespace as the pod using this volume.
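A minimal PVC-backed source; the claim name is a placeholder, and the claim must live in the same namespace as the pod:

    from kubernetes import client

    pvc_volume = client.V1Volume(
        name="durable-storage",
        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
            claim_name="durable-storage-pvc"
        ),
    )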

Creating Volume Mounts - CSI

Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads. Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the csi volume type to attach, mount, etc. the volumes exposed by the CSI driver.

  1. Driver: Driver is the name of the CSI driver that handles this volume. Consult with your admin for the correct name as registered in the cluster.

  2. FS Type: Filesystem type to mount (e.g. ‘ext4’, ‘xfs’, ‘ntfs’). If not provided, an empty value is passed to the associated CSI driver, which will determine the default filesystem to apply.

  3. Volume Attributes: Driver-specific properties that are passed to the CSI driver. Consult your driver’s documentation for supported values. Insert one key=value pair per line.
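A minimal CSI source; the driver name, filesystem type, and attributes are placeholders to be replaced per your driver’s documentation:

    from kubernetes import client

    csi_volume = client.V1Volume(
        name="csi-storage",
        csi=client.V1CSIVolumeSource(
            driver="csi.example.com",  # driver name as registered in the cluster
            fs_type="ext4",
            volume_attributes={        # one key=value pair per line in the UI
                "share": "steam-data",
            },
        ),
    )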

HDFS and Hive

On this page you can configure access to HDFS and Hive from Kubernetes. Remember, instances/clusters start as minimal containers, and all configuration files must be mounted into them. All HDFS configuration files must first be stored as volumes, and the paths configured on this page must originate from these volumes.

  1. Click HDFS under the Kubernetes Backend section. This opens the Kubernetes HDFS Configuration page.

  2. Click Enabled to enable HDFS on the Kubernetes backend.

  3. Enter the absolute path to the Hadoop configuration folder (HADOOP_CONF_DIR) of the Hadoop cluster.

  4. Enter the absolute path to the Kerberos keytab that is configured to impersonate Enterprise Steam users. See the Set Up Hadoop Impersonation section for more details.

  5. Enter the name of the Kerberos principal associated with the keytab above.

  6. Enter the absolute path to the Kerberos configuration file (krb5.conf).

  7. Optionally enter a Java classpath that will be used when interacting with HDFS.

  8. Optionally enable Hive support and add the Hive JDBC JAR file to the Hive classpath.

  9. Mount volumes so that the paths mentioned in the configuration originate from the mounted volumes (see the sketch after this procedure).

  10. Click Save Configuration when you are done.

Kubernetes HDFS Configuration
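Here is a sketch of what "paths must originate from mounted volumes" means in practice, assuming the official kubernetes Python client. A ConfigMap holds the Hadoop configuration files and a Secret holds the keytab and krb5.conf; all names and paths are placeholders.

    from kubernetes import client

    volumes = [
        client.V1Volume(name="hadoop-conf",
                        config_map=client.V1ConfigMapVolumeSource(name="hadoop-conf")),
        client.V1Volume(name="kerberos",
                        secret=client.V1SecretVolumeSource(secret_name="steam-kerberos")),
    ]
    mounts = [
        client.V1VolumeMount(name="hadoop-conf", mount_path="/etc/hadoop/conf",
                             read_only=True),
        client.V1VolumeMount(name="kerberos", mount_path="/etc/kerberos",
                             read_only=True),
    ]

    # With these mounts in place, the values entered in steps 3-6 would be:
    #   Hadoop configuration folder: /etc/hadoop/conf
    #   Kerberos keytab:             /etc/kerberos/steam.keytab
    #   krb5.conf:                   /etc/kerberos/krb5.conf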

Minio

On this page you can configure access to Minio from Kubernetes. It is important to decide whether or not to use global Minio credentials. When global credentials are set, every instance/cluster and every user will have access to the same set of Minio buckets. If users have their own Minio credentials to access their own sets of buckets, you must disable global credentials. In that case, users will have to enter their Minio credentials in the Steam configuration as described in the user documentation.

  1. Click Minio under the Kubernetes Backend section. This opens the Kubernetes Minio Configuration page.

  2. Click Enabled to enable the Minio connection on the Kubernetes backend.

  3. Enter the Minio endpoint URL, including the http or https prefix.

  4. Decide whether to enable or disable verification of the TLS certificate presented by the Minio server.

  5. Decide whether to use global Minio credentials or not.

  6. If global Minio credentials are used, enter the Minio access key and secret key.

  7. Click Save Configuration when you are done.

Kubernetes Minio Configuration
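A quick way to sanity-check the values from steps 3-6 is to connect with the minio Python package; the endpoint and credentials below are placeholders. Note that this client takes the endpoint without a scheme and uses the secure flag to choose http/https, whereas Steam expects the prefix in the URL.

    from minio import Minio

    mc = Minio(
        "minio.example.com:9000",
        access_key="STEAM_ACCESS_KEY",
        secret_key="STEAM_SECRET_KEY",
        secure=True,  # https; TLS verification is configured separately in Steam
    )
    print([b.name for b in mc.list_buckets()])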

H2O.ai Storage

On this page you can configure access to H2O.ai Storage from Kubernetes. If storage is configured to use mutual TLS, you must create Kubernetes Secrets that hold the CA certificate and the client certificate.

  1. Click H2O.ai Storage under the Kubernetes Backend section.

  2. Click Enabled to enable the H2O.ai Storage connection on the Kubernetes backend.

  3. Enter the H2O.ai Storage address.

  4. Enter a comma-separated list of OpenID scopes that H2O.ai Storage expects to be present in the OpenID access token.

  5. Specify whether H2O.ai Storage is protected with mutual TLS.

  6. If mutual TLS is enabled, you must specify the Kubernetes Secrets that contain the CA and client certificates (a creation sketch follows this procedure).

  7. Click Save Configuration when you are done.

Kubernetes H2O.ai Storage Configuration
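Here is a sketch of creating the Secrets referenced in step 6, assuming the official kubernetes Python client; the namespace, Secret names, and certificate file paths are placeholders.

    from pathlib import Path
    from kubernetes import client, config

    config.load_kube_config()
    api = client.CoreV1Api()

    # CA certificate used to verify the H2O.ai Storage server.
    api.create_namespaced_secret("steam", client.V1Secret(
        metadata=client.V1ObjectMeta(name="storage-ca"),
        string_data={"ca.crt": Path("ca.crt").read_text()},
    ))

    # Client certificate and key presented for mutual TLS.
    api.create_namespaced_secret("steam", client.V1Secret(
        metadata=client.V1ObjectMeta(name="storage-client-tls"),
        type="kubernetes.io/tls",
        string_data={
            "tls.crt": Path("client.crt").read_text(),
            "tls.key": Path("client.key").read_text(),
        },
    ))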