Sparkling Water Troubleshooting

This guide lists steps to troubleshoot failing Sparkling Water clusters on Hadoop.

Impersonation

The most common issue is that impersonation is not set up or was set up incorrectly.

Follow the Set Up Hadoop Impersonation guide to set up impersonation. Remember that for the changes to take effect, the configuration must be applied to all nodes using Cloudera Manager (or a similar tool) and the HDFS service must be restarted. To verify that the changes were applied successfully, open a console on the Steam host and check the contents of /etc/hadoop/conf/core-site.xml.
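As a quick sanity check, you can grep the file for the proxyuser entries. This is a minimal sketch assuming Steam impersonates through a proxy user named steam; the user name in your environment may differ:

# "steam" is an assumed proxy user name; substitute your own
grep -A 1 'hadoop.proxyuser.steam' /etc/hadoop/conf/core-site.xml

The output should include the hadoop.proxyuser.steam.hosts and hadoop.proxyuser.steam.groups properties with the values you configured.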

Hadoop Conf Dir

In some cases, especially with multiple Spark versions on the same host, Spark may pick up the wrong configuration file. If you notice that Sparkling Water clusters fail after roughly 20 minutes, change the Steam configuration: as a Steam administrator, navigate to Configuration-Sparkling Water and toggle the Override Hadoop Conf Dir option. Restart Steam and try again.
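To see which Hadoop configuration a given Spark installation resolves, you can check the environment and the spark-env.sh shipped with that installation. A minimal sketch, assuming SPARK_HOME is already exported (the file may not exist in every layout):

echo $HADOOP_CONF_DIR
grep HADOOP_CONF_DIR $SPARK_HOME/conf/spark-env.sh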

Spark Submit

A great validation step that uncovers most issues is to submit a test job using the spark-submit command on the Steam host.

First, you will need to obtain the current Steam configuration, because the values will be used in the testing command:

  • HADOOP_CONF_DIR from Configuration-Hadoop

  • SPARK_HOME from Configuration-Sparkling Water

  • PRINCIPAL from Configuration-Hadoop (Kerberized environments only)

  • KEYTAB from Configuration-Hadoop (Kerberized environments only)

Second, locate a JAR file that contains the Spark examples (EXAMPLES_JAR); it is usually located relative to your SPARK_HOME, for example /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/spark-example*.jar.
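If you are unsure where the JAR lives, a find over the parcel directory usually locates it. A minimal sketch, assuming the CDH parcel layout shown above; adjust the search root for your distribution:

find /opt/cloudera/parcels -name 'spark-example*.jar' 2>/dev/null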

Third, pick the USERNAME of a real Hadoop user who will use Enterprise Steam.
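Optionally, confirm that the user exists on the host and that the cluster can resolve their group memberships. The username john12 below is a placeholder:

# john12 is a placeholder username
id john12
hdfs groups john12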

With the gathered information, open a terminal on the machine that hosts Enterprise Steam:

  1. In a Kerberized environment, run kinit with the principal and keytab obtained in the previous step. For example:

kinit -kt steam.keytab steam@H2OAI.LOC

  2. Export the configuration. For example:

export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER="yarn"

  3. Submit the test job:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    --proxy-user USERNAME \
    EXAMPLES_JAR 10

For example:

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    --proxy-user john12 \
    /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/spark-example*.jar 10

The job should be accepted by the Hadoop cluster, transition to the RUNNING state, and then reach the FINISHED state within a short time. If you encounter any errors, share them with your Hadoop team for resolution.
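To follow the job from the same host, you can use the YARN CLI. A minimal sketch; the application ID below is a placeholder, so use the one printed by spark-submit:

yarn application -list -appStates ALL
# Replace with the application ID reported by spark-submit
yarn logs -applicationId application_1234567890123_0001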