Spark Frame <–> H2O Frame Conversions
Converting an H2OFrame into an RDD[T]
Scala
The H2OContext class provides the method asRDD, which creates an RDD-like wrapper around the provided H2OFrame:
def asRDD[A <: Product: TypeTag: ClassTag](fr: H2OFrame): RDD[A]
The call needs the type parameter A in order to create a correctly typed RDD. A must be a subtype of Product, which every Scala case class is.
Columns of the H2OFrame are matched to the attributes of class A by name.
Example
case class Person(name: String, age: Int)
val rdd = h2oContext.asRDD[Person](h2oFrame)
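The name-matching rule can be illustrated outside of Spark and H2O. The following plain-Python sketch is not Sparkling Water code: the Person dataclass, the rows_to_records helper, and the sample data are all hypothetical. It only shows how column names, rather than column positions, could decide which value ends up in which attribute.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def rows_to_records(column_names, rows, record_type):
    """Build record_type instances by matching column names to field names.

    Hypothetical helper for illustration; not part of any library.
    """
    index_of = {name: i for i, name in enumerate(column_names)}
    field_names = [f.name for f in fields(record_type)]
    missing = [n for n in field_names if n not in index_of]
    if missing:
        raise KeyError(f"no column for field(s): {missing}")
    return [
        record_type(**{n: row[index_of[n]] for n in field_names})
        for row in rows
    ]


# Column order differs from the field order, and "city" has no matching field.
columns = ["age", "city", "name"]
rows = [(31, "Prague", "Ada"), (45, "Brno", "Linus")]
people = rows_to_records(columns, rows, Person)
```

In this sketch, unmatched columns are skipped and a missing column raises an error. Whether the real conversion behaves the same way in those edge cases is an assumption that should be checked against the library.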
Converting an H2OFrame into a DataFrame
Scala
The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:
def asSparkFrame(fr: H2OFrame): DataFrame
The schema of the created DataFrame is derived from the column names and types of the specified H2OFrame.
Example
val dataFrame = h2oContext.asSparkFrame(h2oFrame)
Python
The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:
def asSparkFrame(self, h2oFrame)
The schema of the created DataFrame is derived from the column names and types of the specified H2OFrame.
Example
dataFrame = h2oContext.asSparkFrame(h2oFrame)
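As a rough picture of what deriving a schema from column names and types means, the sketch below pairs each column name with a target type name. It is plain Python with no Spark or H2O dependency. The derive_schema helper and the ASSUMED_TYPE_MAP table are hypothetical, and the type pairs in the table are illustrative assumptions rather than the mapping the library actually applies.

```python
# Hypothetical mapping from H2O column type names to Spark SQL type names.
# These pairs are assumptions for illustration only.
ASSUMED_TYPE_MAP = {
    "int": "IntegerType",
    "real": "DoubleType",
    "enum": "StringType",
    "string": "StringType",
    "time": "TimestampType",
}


def derive_schema(column_names, column_types):
    """Return (name, target type) pairs, one per column, in column order."""
    if len(column_names) != len(column_types):
        raise ValueError("column_names and column_types differ in length")
    return [
        (name, ASSUMED_TYPE_MAP[h2o_type])
        for name, h2o_type in zip(column_names, column_types)
    ]


schema = derive_schema(["name", "age", "salary"], ["string", "int", "real"])
```

On a real DataFrame returned by asSparkFrame, the resulting schema can be inspected with the standard PySpark call dataFrame.printSchema().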
R
The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:
asSparkFrame = function(h2oFrame)
The schema of the created DataFrame is derived from the column names and types of the specified H2OFrame.
Example
dataFrame <- h2oContext$asSparkFrame(h2oFrame)
Converting an RDD[T] into an H2OFrame
Scala
The H2OContext provides a conversion method from the specified RDD[A] to H2OFrame. As with conversion in the opposite direction, the type A has to satisfy the upper bound expressed by the type Product. The conversion creates a new H2OFrame, transfers data from the specified RDD, and saves it to the DKV store on the H2O backend.
def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A]): H2OFrame
The API also provides an overload that lets you specify the name of the resulting H2OFrame.
def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A], frameName: String): H2OFrame
Example
val h2oFrame = h2oContext.asH2OFrame(rdd)
Python
The H2OContext provides a conversion method from the specified PySpark RDD to H2OFrame. The conversion creates a new H2OFrame, transfers data from the specified RDD, and saves it to the DKV store on the H2O backend.
def asH2OFrame(self, rdd, h2oFrameName=None, fullCols=-1)
Parameters
rdd: PySpark RDD
h2oFrameName: Optional name for the resulting H2OFrame
fullCols: The number of leading columns considered for conversion; -1 means no limit.
Example
h2oFrame = h2oContext.asH2OFrame(rdd)
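Read literally, the fullCols description says that only the first n columns are considered and that -1 removes the limit. The plain-Python sketch below models just that rule. The select_columns helper is hypothetical and is not how the library implements the parameter.

```python
def select_columns(column_names, full_cols=-1):
    """Return the first full_cols column names; -1 keeps every column.

    Hypothetical helper modelling the documented meaning of fullCols.
    """
    if full_cols < -1:
        raise ValueError("full_cols must be -1 or a non-negative integer")
    if full_cols == -1:
        return list(column_names)
    return list(column_names[:full_cols])


columns = ["id", "name", "age", "city"]
all_columns = select_columns(columns)        # default: no limit
first_two = select_columns(columns, 2)       # only the two leading columns
```

A limit larger than the number of columns simply returns every column in this sketch, because Python slicing does not raise on an oversized bound.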
Converting a DataFrame into an H2OFrame
Scala
The H2OContext provides a conversion method from the specified DataFrame to H2OFrame.
The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.
def asH2OFrame(df: DataFrame): H2OFrame
The API also provides an overload that lets you specify the name of the resulting H2OFrame.
def asH2OFrame(rdd : DataFrame, frameName: String): H2OFrame
Example
val h2oFrame = h2oContext.asH2OFrame(df)
Python
The H2OContext provides a conversion method from the specified DataFrame to H2OFrame.
The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.
def asH2OFrame(self, sparkFrame, h2oFrameName=None, fullCols=-1)
Parameters
sparkFrame: PySpark data frame
h2oFrameName: Optional name for the resulting H2OFrame
fullCols: The number of leading columns considered for conversion; -1 means no limit.
Example
h2oFrame = h2oContext.asH2OFrame(df)
R
The H2OContext provides a conversion method from the specified DataFrame to H2OFrame.
The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.
asH2OFrame = function(sparkFrame, h2oFrameName = NULL)
Parameters
sparkFrame: Spark data frame
h2oFrameName: Optional name for the resulting H2OFrame
Example
h2oFrame <- h2oContext$asH2OFrame(df)