Spark - H2O Frame Mapping
-------------------------

Type Mapping Between H2O H2OFrame Types and Spark DataFrame Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For every primitive Scala type or Spark SQL type (see ``org.apache.spark.sql.types``) that can appear in a Spark RDD or DataFrame, Sparkling Water provides a mapping to an H2O vector type (numeric, categorical, string, time, or UUID; see ``water.fvec.Vec``):

+----------------------+-----------------+-------------------------+
| Scala type           | SQL type        | H2O type                |
+======================+=================+=========================+
| *NA*                 | BinaryType      | Numeric                 |
+----------------------+-----------------+-------------------------+
| Byte                 | ByteType        | Numeric                 |
+----------------------+-----------------+-------------------------+
| Short                | ShortType       | Numeric                 |
+----------------------+-----------------+-------------------------+
| Integer              | IntegerType     | Numeric                 |
+----------------------+-----------------+-------------------------+
| Long                 | LongType        | Numeric                 |
+----------------------+-----------------+-------------------------+
| Float                | FloatType       | Numeric                 |
+----------------------+-----------------+-------------------------+
| Double               | DoubleType      | Numeric                 |
+----------------------+-----------------+-------------------------+
| String               | StringType      | String/Categorical [1]_ |
+----------------------+-----------------+-------------------------+
| Boolean              | BooleanType     | Categorical [2]_        |
+----------------------+-----------------+-------------------------+
| java.sql.Timestamp   | TimestampType   | Time                    |
+----------------------+-----------------+-------------------------+

--------------

Type Mapping Between H2O H2OFrame Types and RDD[T] Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following types are supported as ``T``:

+--------------------------------------------------+
| T                                                |
+==================================================+
| *NA*                                             |
+--------------------------------------------------+
| Byte                                             |
+--------------------------------------------------+
| Short                                            |
+--------------------------------------------------+
| Integer                                          |
+--------------------------------------------------+
| Long                                             |
+--------------------------------------------------+
| Float                                            |
+--------------------------------------------------+
| Double                                           |
+--------------------------------------------------+
| String                                           |
+--------------------------------------------------+
| Boolean                                          |
+--------------------------------------------------+
| java.sql.Timestamp                               |
+--------------------------------------------------+
| Any Scala class extending Scala ``Product``      |
+--------------------------------------------------+
| org.apache.spark.mllib.regression.LabeledPoint   |
+--------------------------------------------------+

As the table shows, Sparkling Water can transform any Scala class that extends ``Product``, which includes all case classes.

.. rubric:: Footnotes

.. [1] The H2O type is String if the cardinality is greater than 10 000 000 or the ratio of unique values to all values is 95% or higher; otherwise it is Categorical.
.. [2] The H2O categorical values are "True" and "False" for ``true`` and ``false``, respectively.
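
The String/Categorical rule from the first footnote can be sketched in plain Scala. This is only an illustration of the stated thresholds, not Sparkling Water's actual implementation: the object ``StringColumnTypeSketch``, its constants, and ``classify`` are hypothetical names, the cardinality limit is read as 10 000 000 (an assumption), and treating an empty column as Categorical is an arbitrary choice made for the sketch.

.. code:: scala

    // Illustrative sketch only; not part of the Sparkling Water API.
    object StringColumnTypeSketch {
      sealed trait H2OColumnType
      case object H2OString extends H2OColumnType
      case object H2OCategorical extends H2OColumnType

      // Thresholds as described in the footnote (limit read as 10 000 000: an assumption).
      val MaxCategoricalCardinality: Long = 10000000L
      val MaxUniqueRatio: Double = 0.95

      // Decide how a column of Spark String values would be represented in H2O.
      def classify(values: Seq[String]): H2OColumnType = {
        val total = values.size
        val unique = values.distinct.size
        // An empty column has no ratio; the sketch treats it as 0.0 (Categorical).
        val uniqueRatio = if (total == 0) 0.0 else unique.toDouble / total
        if (unique > MaxCategoricalCardinality || uniqueRatio >= MaxUniqueRatio) H2OString
        else H2OCategorical
      }

      def main(args: Array[String]): Unit = {
        // 3 unique values out of 8: ratio 0.375, below the 95% threshold.
        val colors = Seq("red", "green", "red", "blue", "red", "green", "red", "red")
        println(classify(colors)) // prints H2OCategorical

        // 100 unique values out of 100: ratio 1.0, at or above the 95% threshold.
        val ids = (1 to 100).map(i => s"id-$i")
        println(classify(ids)) // prints H2OString
      }
    }

A column of mostly unique identifiers therefore ends up as a String vector, while a column with a small set of repeated labels becomes Categorical.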