Using Enterprise Steam with R

This section describes how to use Enterprise Steam with R. Note that the Enterprise Steam R package requires the “urltools” R package. Refer to the urltools package documentation for more information.

Downloading and Installing

  1. Go to the Enterprise Steam download page to retrieve the latest version of Enterprise Steam.
  2. On the Steam API tab, download the R package.
  3. Open a Terminal window, and navigate to the location where the Enterprise Steam file was downloaded. For example:
cd ~/Downloads
  4. Install Enterprise Steam for R using R CMD INSTALL <file_name>. For example:
R CMD INSTALL h2osteam_1.2.0.tar.gz
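
Once the installation finishes, it can help to confirm from within R that both h2osteam and its urltools dependency can be found. The following is a minimal sketch; the helper name missing_packages is hypothetical and is not part of the Enterprise Steam API.

```r
# Hypothetical helper (not part of h2osteam): return the subset of `pkgs`
# whose namespace cannot be loaded from the current R library paths.
# Note that requireNamespace() loads each namespace it finds as a side effect.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(c("urltools", "h2osteam"))
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
} else {
  message("urltools and h2osteam are both installed.")
}
```

If urltools is reported as missing, it can typically be installed from CRAN with install.packages("urltools") before repeating the R CMD INSTALL step.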


Use the login function to log in to your Enterprise Steam web server. Note that you must already have a username and a password. The web server and your username and password are provided to you by your Enterprise Steam Admin. This function takes the following parameters:

  • url: The URL of the Enterprise Steam instance
  • verify_ssl: Specify TRUE or FALSE to enable or disable verification of the server's SSL certificate
  • username: Your username as provided by your Enterprise Steam Admin
  • password: Your password as provided by your Enterprise Steam Admin
  • login_file: A login file where user information is stored.
  • login_file_passphrase: A login file where user passphrase information is stored.
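$ R
> library(h2osteam)
# Replace the placeholder credentials with the ones from your Enterprise Steam Admin.
> conn <- h2osteam.login(url = "https://steam.0xdata.loc",
                         verify_ssl = FALSE,
                         username = "jsmith",
                         password = "<your_password>")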
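
Hard-coding a password in a script makes it easy to leak. One option is to read the credentials from environment variables and pass them to the login function. The following is a minimal sketch; the variable names STEAM_USERNAME and STEAM_PASSWORD and the helper steam_credentials are assumptions made for this example and are not defined by Enterprise Steam.

```r
# Hypothetical helper: read credentials from environment variables.
# STEAM_USERNAME and STEAM_PASSWORD are names chosen for this example only.
steam_credentials <- function() {
  creds <- list(username = Sys.getenv("STEAM_USERNAME"),
                password = Sys.getenv("STEAM_PASSWORD"))
  # Sys.getenv() returns "" for variables that are not set.
  unset <- names(creds)[!nzchar(unlist(creds))]
  if (length(unset) > 0) {
    stop("Missing credentials: ", paste(unset, collapse = ", "))
  }
  creds
}

# Usage with the login parameters described above (requires a Steam server):
# creds <- steam_credentials()
# conn <- h2osteam.login(url = "https://steam.0xdata.loc",
#                        verify_ssl = FALSE,
#                        username = creds$username,
#                        password = creds$password)
```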


Use the start_h2o_cluster function to create a new cluster. This function takes the following parameters:

  • cluster_name: Specify a name for this cluster.
  • profile_name: Specify the profile to use for this cluster.
  • num_nodes: Specify the number of nodes for the cluster.
  • node_memory: Specify the amount of memory that should be available on each node.
  • v_cores: Specify the number of virtual cores.
  • n_threads: Specify the number of threads (CPUs) to use in the cluster. Specify 0 to use all available threads.
  • max_idle_time: Specify the maximum number of hours that the cluster can be idle before gracefully shutting down. Specify 0 to turn off this setting and allow the cluster to remain idle for an unlimited amount of time.
  • max_uptime: Specify the maximum number of hours that the cluster can be running. Specify 0 to turn off this setting and allow the cluster to remain up for an unlimited amount of time.
  • extramempercent: Specify the amount of extra memory for internal JVM use outside of the Java heap. This is a percentage of memory per node. The default (and recommended) value is 10%.
  • h2o_engine_id: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
  • yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN Queue cannot contain spaces.
> cluster_config <- h2osteam.start_h2o_cluster(conn = conn,
                                               cluster_name = "first-cluster-from-R",
                                               profile_name = "default",
                                               num_nodes = 2,
                                               node_memory = "30g",
                                               h2o_version = "")

# Call the cluster to retrieve its ID and configuration params.
> cluster_config
[1] 109

[1] "steam.0xdata.loc"

[1] 9999

[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

[1] "jsmith_first-cluster-from-R"

[1] TRUE

[1] TRUE
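
Several of the parameters above carry constraints that are cheap to check locally before a cluster request is sent. The sketch below checks a few of them and estimates the total memory request. The helper names are hypothetical, and two things are assumptions rather than documented behavior: that node_memory is written as an integer followed by "g" (as in the "30g" example), and that extramempercent is added on top of node_memory on every node.

```r
# Hypothetical helper: local sanity checks based on the parameter notes above.
check_cluster_args <- function(num_nodes, node_memory, yarn_queue = "") {
  stopifnot(is.numeric(num_nodes), num_nodes >= 1)
  # Assumption: memory is written as an integer followed by "g", e.g. "30g".
  if (!grepl("^[0-9]+g$", node_memory)) {
    stop("node_memory should look like \"30g\"")
  }
  # The YARN queue name cannot contain spaces.
  if (grepl(" ", yarn_queue, fixed = TRUE)) {
    stop("yarn_queue cannot contain spaces")
  }
  invisible(TRUE)
}

# Hypothetical helper: estimated total memory in GB, assuming extramempercent
# is added on top of node_memory on every node.
estimate_total_memory_gb <- function(num_nodes, node_memory,
                                     extramempercent = 10) {
  per_node <- as.numeric(sub("g$", "", node_memory))
  num_nodes * per_node * (1 + extramempercent / 100)
}

check_cluster_args(num_nodes = 2, node_memory = "30g")
estimate_total_memory_gb(num_nodes = 2, node_memory = "30g")
```

Under these assumptions, two 30g nodes with the default 10% come to roughly 33 GB per node, or about 66 GB in total, which is a useful number to compare against the limits of the selected profile.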

Note that after you create a cluster, you can immediately connect to that cluster and begin using H2O. Refer to the following for a complete R example.

> library(h2o)
> h2o.connect(config = cluster_config)

# import the cars dataset
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
> cars <- h2o.importFile("")

# convert response column to a factor
> cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])

# set the predictor names and the response column name
> predictors <- c("displacement","power","weight","acceleration","year")
> response <- "economy_20mpg"

# split into train and validation sets
> cars.split <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
> train <- cars.split[[1]]
> valid <- cars.split[[2]]

# train your model, specifying your 'x' predictors,
# your 'y' the response column, training_frame, and validation_frame
> cars_gbm <- h2o.gbm(x = predictors,
                      y = response,
                      training_frame = train,
                      validation_frame = valid,
                      seed = 1234)

# print the auc for your model
> print(h2o.auc(cars_gbm, valid = TRUE))


Use the get_h2o_cluster function to retrieve information about a specific cluster by its cluster name.

> h2osteam.get_h2o_cluster(conn, 'first-cluster-from-R')
[1] 109

[1] "steam.0xdata.loc"

[1] 9999

[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="

[1] "jsmith_first-cluster-from-R"

[1] TRUE

[1] TRUE


Use the get_h2o_clusters function to retrieve all running H2O clusters that are accessible to the current user.

> h2osteam.get_h2o_clusters(conn)


Use the stop_h2o_cluster function to stop a cluster.

> h2osteam.stop_h2o_cluster(conn, cluster_config)
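
A script that fails partway through can leave its cluster running, so it is common to make sure the stop step runs even when an error occurs. The following sketch shows that pattern with tryCatch and finally. with_cluster is a hypothetical helper that takes the start, work, and stop steps as functions, so it does not depend on any particular h2osteam signature.

```r
# Hypothetical helper: run `work` against a cluster and always call `stop_fn`,
# even if `work` raises an error. The value of `work` is returned on success.
with_cluster <- function(start_fn, work, stop_fn) {
  cluster <- start_fn()
  tryCatch(
    work(cluster),
    finally = stop_fn(cluster)
  )
}

# Usage with the functions shown in this section (requires a Steam connection):
# with_cluster(
#   start_fn = start_my_cluster,  # hypothetical wrapper around h2osteam.start_h2o_cluster
#   work     = function(cfg) h2o.connect(config = cfg),
#   stop_fn  = function(cfg) h2osteam.stop_h2o_cluster(conn, cfg)
# )
```

Passing the steps in as functions keeps the helper easy to try out with stand-in functions before pointing it at a real cluster.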


Use the show_profiles function to show the available profiles.

> h2osteam.show_profiles(conn)