Using Enterprise Steam with R

This section describes how to use Enterprise Steam for R. Note that the Enterprise Steam R package requires the “urltools” package. Refer to https://github.com/Ironholds/urltools/ for more information.

Downloading and Installing

  1. Go to https://www.h2o.ai/download-enterprise-steam/.
  2. Select the R version that you want to download and install.
  3. Open a Terminal window, and navigate to the location where the Enterprise Steam file was downloaded. For example:
cd ~/Downloads
  4. Install Enterprise Steam for R using R CMD INSTALL <file_name>. For example:
R CMD INSTALL h2osteam_0.9.1.6.tar.gz
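
The same installation can also be run from inside an R session. The following is a minimal sketch, assuming the file name from the example above, that urltools is installed from CRAN, and that a CRAN mirror is already configured. The helpers missing_packages and install_steam_client are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install urltools if needed, then install the
# Enterprise Steam package from a local source tarball.
install_steam_client <- function(tarball) {
  tarball <- path.expand(tarball)
  stopifnot(file.exists(tarball))
  if (length(missing_packages("urltools")) > 0) {
    install.packages("urltools")
  }
  # repos = NULL tells install.packages to treat `tarball` as a local file.
  install.packages(tarball, repos = NULL, type = "source")
}

# Example call (path assumed from the steps above):
# install_steam_client("~/Downloads/h2osteam_0.9.1.6.tar.gz")
```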

login

Use the login function to log in to your Enterprise Steam web server. Note that you must already have a username and a password. The web server URL, your username, and your password are provided to you by your Enterprise Steam Admin.

$ R
> library(h2osteam)
> conn <- h2osteam.login(url = "https://steam.0xdata.loc",
                         verify_ssl = FALSE,
                         username = "jsmith",
                         password = "jsmith")
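
The example above hard-codes the password in the script. A minimal sketch of reading the login settings from environment variables instead is shown below. The variable names STEAM_URL, STEAM_USER, and STEAM_PASSWORD and the helpers steam_credentials and steam_login_from_env are assumptions made for illustration and are not part of h2osteam. Only the h2osteam.login arguments shown above are used.

```r
# Hypothetical helper: collect login settings from environment variables
# and fail early if any of them is unset.
steam_credentials <- function() {
  vars <- c(url = "STEAM_URL",
            username = "STEAM_USER",
            password = "STEAM_PASSWORD")
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  unset_vars <- vars[!nzchar(values)]
  if (length(unset_vars) > 0) {
    stop("Unset environment variables: ", paste(unset_vars, collapse = ", "))
  }
  as.list(values)
}

# Hypothetical wrapper: log in with the collected settings.
# verify_ssl = FALSE mirrors the example above.
steam_login_from_env <- function() {
  creds <- steam_credentials()
  h2osteam::h2osteam.login(url = creds$url,
                           verify_ssl = FALSE,
                           username = creds$username,
                           password = creds$password)
}
```

With the three variables set in the shell that launches R, conn <- steam_login_from_env() would then take the place of the explicit call.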

start_h2o_cluster

Use the start_h2o_cluster function to create a new cluster. This function takes the following parameters:

  • cluster_name: Specify a name for this cluster.
  • num_nodes: Specify the number of nodes for the cluster.
  • node_memory: Specify the amount of memory that should be available on each node.
  • yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN Queue cannot contain spaces.
  • callback_ip: Optionally specify the IP address for callback messages from the mapper to the driver (driverif).
  • h2o_version: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
> cluster_config <- h2osteam.start_h2o_cluster(conn = conn,
                                               cluster_name = "first-cluster-from-R",
                                               num_nodes = 2,
                                               node_memory = "30g",
                                               h2o_version = "3.10.4.1")

# Print the cluster configuration to view its ID and connection parameters.
> cluster_config
$id
[1] 109

$connect_params
$connect_params$ip
[1] "steam.0xdata.loc"

$connect_params$port
[1] 9999

$connect_params$cookies
[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

$connect_params$context_path
[1] "jsmith_first-cluster-from-R"

$connect_params$https
[1] TRUE

$connect_params$insecure
[1] TRUE
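
The parameter descriptions above imply a few constraints that can be checked before a cluster is requested. The following is a minimal sketch, assuming node_memory is written as digits followed by m or g, as in "30g" (the set of accepted units is a guess based on that single example). The helper check_cluster_args is a hypothetical name and is not part of h2osteam.

```r
# Hypothetical helper: TRUE for a single non-missing character value.
is_string <- function(x) is.character(x) && length(x) == 1 && !is.na(x)

# Hypothetical helper: return a character vector describing every problem
# found in the arguments (empty when everything looks valid).
check_cluster_args <- function(cluster_name, num_nodes, node_memory,
                               yarn_queue = NULL) {
  problems <- character(0)
  if (!is_string(cluster_name) || !nzchar(cluster_name)) {
    problems <- c(problems, "cluster_name must be a non-empty string")
  }
  if (!is.numeric(num_nodes) || length(num_nodes) != 1 || is.na(num_nodes) ||
      num_nodes < 1 || num_nodes != round(num_nodes)) {
    problems <- c(problems, "num_nodes must be a positive whole number")
  }
  # Assumed format: digits followed by m or g, as in "30g".
  if (!is_string(node_memory) || !grepl("^[0-9]+[mg]$", node_memory)) {
    problems <- c(problems, "node_memory must look like \"30g\"")
  }
  # The YARN queue name cannot contain spaces.
  if (!is.null(yarn_queue) &&
      (!is_string(yarn_queue) || grepl(" ", yarn_queue, fixed = TRUE))) {
    problems <- c(problems, "yarn_queue must be a string without spaces")
  }
  problems
}

# The values from the example above pass every check.
check_cluster_args("first-cluster-from-R", 2, "30g")
```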

After you create a cluster, you can immediately connect to it and begin using H2O. The following is a complete R example.

> library(h2o)
> h2o.connect(config = cluster_config)

# import the cars dataset
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
> cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

# convert response column to a factor
> cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])

# set the predictor names and the response column name
> predictors <- c("displacement","power","weight","acceleration","year")
> response <- "economy_20mpg"

# split into train and validation sets
> cars.split <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
> train <- cars.split[[1]]
> valid <- cars.split[[2]]

# train your model, specifying the 'x' predictors, the 'y' response column,
# the training_frame, and the validation_frame
> cars_gbm <- h2o.gbm(x = predictors,
                      y = response,
                      training_frame = train,
                      validation_frame = valid,
                      seed = 1234)

# print the auc for your model
> print(h2o.auc(cars_gbm, valid = TRUE))

get_h2o_cluster

Use the get_h2o_cluster function to retrieve information about a specific cluster by its cluster name.

> h2osteam.get_h2o_cluster(conn, 'first-cluster-from-R')
$id
[1] 109

$connect_params
$connect_params$ip
[1] "steam.0xdata.loc"

$connect_params$port
[1] 9999

$connect_params$cookies
[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

$connect_params$context_path
[1] "jsmith_first-cluster-from-R"

$connect_params$https
[1] TRUE

$connect_params$insecure
[1] TRUE
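
The printed result is an ordinary nested R list, so individual fields can be read with the $ operator. The sketch below rebuilds that list by hand from the values shown above (cookies omitted) so that it runs without a server. The helper cluster_url is a hypothetical name, and combining the fields as scheme://ip:port/context_path is an assumption about how the connection parameters fit together.

```r
# Stand-in for the list returned by h2osteam.get_h2o_cluster(), using the
# values printed above.
cluster <- list(
  id = 109,
  connect_params = list(
    ip = "steam.0xdata.loc",
    port = 9999,
    context_path = "jsmith_first-cluster-from-R",
    https = TRUE,
    insecure = TRUE
  )
)

# Hypothetical helper: assemble a base URL from the connection parameters.
cluster_url <- function(cluster) {
  p <- cluster$connect_params
  scheme <- if (isTRUE(p$https)) "https" else "http"
  sprintf("%s://%s:%d/%s", scheme, p$ip, as.integer(p$port), p$context_path)
}

cluster$id
cluster_url(cluster)
```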

stop_h2o_cluster

Use the stop_h2o_cluster function to stop a cluster.

> h2osteam.stop_h2o_cluster(conn, cluster_config)
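
A cluster that is never stopped keeps holding its resources, so it can help to tie the stop call to the work itself. The following is a minimal sketch of a wrapper that starts a cluster, runs a function against it, and stops the cluster even when that function fails. The name with_steam_cluster and its start_fn and stop_fn parameters are hypothetical. They default to the h2osteam functions described above and exist only so the wrapper can be tried without a server.

```r
# Hypothetical wrapper: start a cluster, run work(cluster_config), and stop
# the cluster afterwards, whether or not work() raises an error.
# Arguments in ... are passed on to the start function.
with_steam_cluster <- function(conn, work, ...,
                               start_fn = h2osteam::h2osteam.start_h2o_cluster,
                               stop_fn = h2osteam::h2osteam.stop_h2o_cluster) {
  cluster_config <- start_fn(conn = conn, ...)
  # on.exit runs when this function returns or fails.
  on.exit(stop_fn(conn, cluster_config), add = TRUE)
  work(cluster_config)
}

# Example call, reusing the values from the sections above:
# with_steam_cluster(conn,
#                    function(cfg) h2o::h2o.connect(config = cfg),
#                    cluster_name = "first-cluster-from-R",
#                    num_nodes = 2,
#                    node_memory = "30g",
#                    h2o_version = "3.10.4.1")
```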