Using Enterprise Steam with R
This section describes how to use Enterprise Steam for R. Note that this requires the “urltools” R package. Refer to https://github.com/Ironholds/urltools/ for more information.
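If urltools is not yet available in your R library, it can be installed from CRAN first. The following is a minimal sketch, assuming the machine can reach the CRAN mirror named in the call; any other mirror would work equally well.

```r
# Install urltools only when it is not already available.
# The mirror URL is an assumption; substitute your preferred CRAN mirror.
if (!requireNamespace("urltools", quietly = TRUE)) {
  install.packages("urltools", repos = "https://cloud.r-project.org")
}
```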
Downloading and Installing
- Go to https://www.h2o.ai/download-enterprise-steam/.
- Select the R version that you want to download and install.
- Open a Terminal window, and navigate to the location where the Enterprise Steam file was downloaded.
- Install Enterprise Steam for R using R CMD INSTALL <file_name>. For example:
  R CMD INSTALL h2osteam_0.9.1.6.tar.gz
Use the login function to log in to your Enterprise Steam web server. Note that you must already have a username and a password. The web server address, username, and password are provided by your Enterprise Steam Admin.
$ R
> library(h2osteam)
> conn <- h2osteam.login(url = "https://steam.0xdata.loc", verify_ssl = FALSE, username = "jsmith", password = "jsmith")
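The login example above writes the password directly into the call. One possible alternative is to read the credentials from environment variables. The sketch below is an illustration only: the helper steam_credentials and the variable names STEAM_USERNAME and STEAM_PASSWORD are hypothetical and are not part of h2osteam.

```r
# Hypothetical helper: read Enterprise Steam credentials from environment
# variables so that they are not hard-coded in scripts.
# STEAM_USERNAME and STEAM_PASSWORD are assumed names, not defined by h2osteam.
steam_credentials <- function() {
  username <- Sys.getenv("STEAM_USERNAME")
  password <- Sys.getenv("STEAM_PASSWORD")
  # Sys.getenv() returns "" for unset variables, so fail early with a clear message.
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set STEAM_USERNAME and STEAM_PASSWORD before logging in.")
  }
  list(username = username, password = password)
}

# Usage against a real server (not run here):
# library(h2osteam)
# creds <- steam_credentials()
# conn <- h2osteam.login(url = "https://steam.0xdata.loc", verify_ssl = FALSE,
#                        username = creds$username, password = creds$password)
```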
Use the start_h2o_cluster function to create a new cluster. This function takes the following parameters:
- cluster_name: Specify a name for this cluster.
- num_nodes: Specify the number of nodes for the cluster.
- node_memory: Specify the amount of memory that should be available on each node.
- yarn_queue: If your cluster contains queues for allocating cluster resources, specify the queue for this cluster. Note that the YARN Queue cannot contain spaces.
- callback_ip: Optionally specify the IP address for callback messages from the mapper to the driver (driverif).
- h2o_version: The H2O engine version that this cluster will use. Note that the Enterprise Steam Admin is responsible for adding engines to Enterprise Steam.
> cluster_config <- h2osteam.start_h2o_cluster(conn = conn, cluster_name = "first-cluster-from-R", num_nodes = 2, node_memory = "30g", h2o_version = "188.8.131.52")
# Call the cluster to retrieve its ID and configuration params.
> cluster_config
$id
[1] 109
$connect_params
$connect_params$ip
[1] "steam.0xdata.loc"
$connect_params$port
[1] 9999
$connect_params$cookies
[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="
$connect_params$context_path
[1] "jsmith_first-cluster-from-R"
$connect_params$https
[1] TRUE
$connect_params$insecure
[1] TRUE
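Some of the parameter constraints described above can be checked locally before a cluster is requested. The sketch below is a hypothetical helper, not part of h2osteam: it only checks that the cluster name is non-empty, that num_nodes is a positive whole number (an assumption about what a node count should look like), and that the YARN queue contains no spaces.

```r
# Hypothetical pre-flight check for start_h2o_cluster arguments.
# Returns a character vector of problems; an empty vector means no issue was found.
validate_cluster_args <- function(cluster_name, num_nodes, yarn_queue = NULL) {
  problems <- character(0)
  if (!is.character(cluster_name) || length(cluster_name) != 1 || !nzchar(cluster_name)) {
    problems <- c(problems, "cluster_name must be a non-empty string")
  }
  if (!is.numeric(num_nodes) || length(num_nodes) != 1 || is.na(num_nodes) ||
      num_nodes < 1 || num_nodes != round(num_nodes)) {
    problems <- c(problems, "num_nodes must be a positive whole number")
  }
  # The YARN queue name cannot contain spaces.
  if (!is.null(yarn_queue) && grepl(" ", yarn_queue, fixed = TRUE)) {
    problems <- c(problems, "yarn_queue cannot contain spaces")
  }
  problems
}

# Example: report problems instead of sending a request that would be rejected.
issues <- validate_cluster_args("first-cluster-from-R", 2, yarn_queue = "my queue")
if (length(issues) > 0) message(paste(issues, collapse = "; "))
```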
Note that after you create a cluster, you can immediately connect to that cluster and begin using H2O. Refer to the following for a complete R example.
> library(h2o)
> h2o.connect(config = cluster_config)
# import the cars dataset
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
> cars <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# convert response column to a factor
> cars["economy_20mpg"] <- as.factor(cars["economy_20mpg"])
# set the predictor names and the response column name
> predictors <- c("displacement", "power", "weight", "acceleration", "year")
> response <- "economy_20mpg"
# split into train and validation sets (80% / 20%)
> cars.split <- h2o.splitFrame(data = cars, ratios = 0.8, seed = 1234)
> train <- cars.split[[1]]
> valid <- cars.split[[2]]
# train your model, specifying your 'x' predictors,
# your 'y' response column, training_frame, and validation_frame
> cars_gbm <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = valid, seed = 1234)
# print the auc for your model
> print(h2o.auc(cars_gbm, valid = TRUE))
Use the get_h2o_cluster function to retrieve information about a specific cluster using the cluster name.
> h2osteam.get_h2o_cluster(conn, 'first-cluster-from-R')
$id
[1] 109
$connect_params
$connect_params$ip
[1] "steam.0xdata.loc"
$connect_params$port
[1] 9999
$connect_params$cookies
[1] "first-cluster-from-R=YW5nZWxhOnVoYzdyeTNtM3g="
$connect_params$context_path
[1] "jsmith_first-cluster-from-R"
$connect_params$https
[1] TRUE
$connect_params$insecure
[1] TRUE
Use the stop_h2o_cluster function to stop a cluster.
> h2osteam.stop_h2o_cluster(conn, cluster_config)
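In scripts, it can be useful to make sure the cluster is stopped even when the work in between fails. The sketch below shows one way to do this with base R's tryCatch and its finally argument. The helper run_then_stop is hypothetical and not part of h2osteam; the commented usage assumes the conn and cluster_config objects created earlier in this section.

```r
# Hypothetical helper: run some work and always call the stop function afterwards,
# even if the work raises an error. Errors from work() still propagate to the caller.
run_then_stop <- function(work, stop_cluster) {
  tryCatch(work(), finally = stop_cluster())
}

# Usage against a real cluster (not run here):
# run_then_stop(
#   work = function() h2o.connect(config = cluster_config),
#   stop_cluster = function() h2osteam.stop_h2o_cluster(conn, cluster_config)
# )
```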