H2O Frame as Spark's Data Source
--------------------------------

H2O Frame can be used directly as a Spark data source.

Reading Spark Data Frame from H2O Frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's suppose we have an H2OFrame with the key ``testFrame``. There are multiple ways in which the Spark Data Frame can be loaded from the H2OFrame:

.. content-tabs::

    .. tab-container:: Scala
        :title: Scala

        .. code:: scala

            val df = spark.read.format("h2o").option("key", "testFrame").load()

        .. code:: scala

            val df = spark.read.format("h2o").load("testFrame")

    .. tab-container:: Python
        :title: Python

        .. code:: python

            df = spark.read.format("h2o").option("key", "testFrame").load()

        .. code:: python

            df = spark.read.format("h2o").load("testFrame")

Saving Spark Data Frame as H2O Frame
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's suppose we have a Spark Data Frame ``df``. There are multiple ways in which the Spark Data Frame can be saved as an H2OFrame:

.. content-tabs::

    .. tab-container:: Scala
        :title: Scala

        .. code:: scala

            df.write.format("h2o").option("key", "new_key").save()

        .. code:: scala

            df.write.format("h2o").save("new_key")

    .. tab-container:: Python
        :title: Python

        .. code:: python

            df.write.format("h2o").option("key", "new_key").save()

        .. code:: python

            df.write.format("h2o").save("new_key")

All variants save the Data Frame as an H2OFrame with the key ``new_key``. They will not succeed if an H2OFrame with the same key already exists.

Loading & Saving Options
~~~~~~~~~~~~~~~~~~~~~~~~

If the key is specified both with the ``key`` option and in the ``load``/``save`` method, the ``key`` option takes precedence:

.. content-tabs::

    .. tab-container:: Scala
        :title: Scala

        .. code:: scala

            val df = spark.read.format("h2o").option("key", "key_one").load("key_two")

        .. code:: scala

            df.write.format("h2o").option("key", "key_one").save("key_two")

    .. tab-container:: Python
        :title: Python

        .. code:: python

            df = spark.read.format("h2o").option("key", "key_one").load("key_two")

        .. code:: python

            df.write.format("h2o").option("key", "key_one").save("key_two")

In all of these examples, ``key_one`` is used.

Specifying Saving Mode
~~~~~~~~~~~~~~~~~~~~~~

There are four save modes available when saving data using the Data Source API: ``append``, ``overwrite``, ``error`` and ``ignore``. The full description is available in the Spark SQL documentation on save modes.

- If ``append`` is used, an existing H2OFrame with the same key is deleted, and a new one is created with the same key, containing the union of all rows from the original H2OFrame and from the appended Data Frame.

- If ``overwrite`` is used, an existing H2OFrame with the same key is deleted, and a new one containing the new rows is created with the same key.

- If ``error`` is used and an H2OFrame with the specified key already exists, an exception is thrown.

- If ``ignore`` is used and an H2OFrame with the specified key already exists, no data is changed.
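
The save mode itself is selected through the standard ``mode`` method of Spark's ``DataFrameWriter``. The following Scala snippet is a minimal sketch of appending to an existing frame; the Data Frame ``df`` and the key ``new_key`` are reused from the earlier examples and serve only as an illustration.

.. code:: scala

    import org.apache.spark.sql.SaveMode

    // Append the rows of `df` to the H2OFrame stored under the key "new_key".
    // With SaveMode.Append, an existing frame with this key is replaced by the
    // union of its rows and the rows of `df`. SaveMode.Overwrite,
    // SaveMode.ErrorIfExists and SaveMode.Ignore select the other behaviours.
    df.write.format("h2o")
      .mode(SaveMode.Append)
      .option("key", "new_key")
      .save()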