Sparkling Water Logging

Changing Log Directory

The Sparkling Water log directory can be specified using the Spark option spark.ext.h2o.log.dir, or at run-time using the setters as:

Scala

import ai.h2o.sparkling._
val conf = new H2OConf().setLogDir("dir")
val hc = H2OContext.getOrCreate(conf)

Python

from pysparkling import *
conf = H2OConf().setLogDir("dir")
hc = H2OContext.getOrCreate(conf)

R

library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setLogDir("dir")
hc <- H2OContext.getOrCreate(conf)
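
The same option can also be passed on the command line when launching Sparkling Water; the directory below is just a placeholder:

bin/sparkling-shell.sh --conf spark.ext.h2o.log.dir=/path/to/logs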

Logging Directory Selection

Sparkling Water uses the following steps to determine the final logging directory:

  • First, we check whether the spark.yarn.app.container.log.dir system property is defined. If it is, we use its value as the logging directory.

  • If spark.yarn.app.container.log.dir is missing, we check whether spark.ext.h2o.log.dir is defined and use it if it is not empty.

  • Finally, if both options are missing, we store the logs in the default directory ${user.dir}/h2ologs/${sparkAppId}.

As you can see, spark.yarn.app.container.log.dir takes precedence over our logging option. This ensures that when running on YARN, the logs are collected by the standard YARN tooling. The sketch below illustrates the selection order.
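
As a rough illustration only (the helper method and its signature are ours, not the actual Sparkling Water internals), the selection logic could be expressed as:

import org.apache.spark.SparkConf

// Illustrative sketch of the directory selection described above;
// resolveLogDir is a hypothetical helper, not part of the public API.
def resolveLogDir(conf: SparkConf, sparkAppId: String): String = {
  // 1. The YARN container log directory, when present, takes precedence.
  sys.props.get("spark.yarn.app.container.log.dir")
    // 2. Otherwise, fall back to the user-supplied Sparkling Water option.
    .orElse(conf.getOption("spark.ext.h2o.log.dir").filter(_.nonEmpty))
    // 3. Finally, default to ${user.dir}/h2ologs/${sparkAppId}.
    .getOrElse(s"${System.getProperty("user.dir")}/h2ologs/$sparkAppId")
}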

Obtaining Logs for Sparkling Water on YARN

When launching Sparkling Water on YARN, you can find the application ID of the YARN job in the Resource Manager UI (where you can also find the application master, which is also the Spark master). The following command prints the YARN logs to the console:

yarn logs -applicationId <application id>
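
If you do not know the application ID, running YARN applications can also be listed from the command line:

yarn application -list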

Obtaining Logs for Standalone Sparkling Water

By default, the environment variable SPARK_LOG_DIR is set to $SPARK_HOME/work/. To also log the configuration with which Spark was started, launch Sparkling Water with the following option:

bin/sparkling-shell.sh --conf spark.logConf=true

The logs for a particular application are located in $SPARK_HOME/work/<application id>. The directory also contains the stdout and stderr output for each node in the cluster.
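
For example, assuming the default standalone layout (one numbered sub-directory per executor), the logs can be inspected as follows:

ls $SPARK_HOME/work/<application id>
cat $SPARK_HOME/work/<application id>/0/stderr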

Change Sparkling Water Logging Level

To change the log level of H2O running inside Sparkling Water, you can use the option spark.ext.h2o.log.level or set it at run-time as:

Scala

import ai.h2o.sparkling._
val conf = new H2OConf().setLogLevel("DEBUG")
val hc = H2OContext.getOrCreate(conf)

Python

from pysparkling import *
conf = H2OConf().setLogLevel("DEBUG")
hc = H2OContext.getOrCreate(conf)

R

library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setLogLevel("DEBUG")
hc <- H2OContext.getOrCreate(conf)
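
As with the log directory, this option can also be passed on the command line:

bin/sparkling-shell.sh --conf spark.ext.h2o.log.level=DEBUG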

We can also change the logging level used by Spark itself by modifying the log4j.properties file passed to Spark. First, create the file from the bundled template:

cd $SPARK_HOME/conf
cp log4j.properties.template log4j.properties

Then, in a text editor of your choice, change the contents of the log4j.properties file from:

#Set everything to be logged to the console
log4j.rootCategory=INFO, console
...

to:

#Set everything to be logged to the console
log4j.rootCategory=WARN, console
...
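
Alternatively, if you prefer not to modify the file under $SPARK_HOME/conf, Spark versions that use log4j 1.x also accept a custom log4j.properties supplied at submit time via the standard extra Java options (the path below is just a placeholder, and the file must be reachable at that path on every node):

bin/sparkling-shell.sh \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties"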