Version: v1.4.0

Extend a dataset with new data

Overview

H2O Hydrogen Torch enables you to extend a dataset with new data, for example, to increase your dataset size.

Note

In H2O Hydrogen Torch, extending a dataset does not combine rows or remove duplicate rows. Instead, extending refers to adding new dataset files to a dataset that already contains dataset files.

Example

Consider the following two datasets (dataset one and dataset two):

dataset_one.zip
├───csv_one.csv
└───image_folder_one
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    └───...

dataset_two.zip
├───csv_two.csv
└───image_folder_two
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    └───...

After extending dataset one with dataset two:

extended_dataset_one.zip
├───csv_two.csv
├───csv_one.csv
├───image_folder_two
│   ├───name_of_image.image_extension
│   ├───name_of_image.image_extension
│   ├───name_of_image.image_extension
│   └───...
└───image_folder_one
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    ├───name_of_image.image_extension
    └───...
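
The file-level behavior shown in the example can be illustrated with a minimal Python sketch. This is not how H2O Hydrogen Torch implements the feature; it only mimics the result on ZIP archives using the standard `zipfile` module. The function name `extend_archive` is hypothetical, and raising an error when both archives contain a file with the same name is an assumption, because this page does not describe how name clashes are handled.

```python
import zipfile


def extend_archive(base_zip, new_zip, out_zip):
    """Write every file from base_zip and new_zip into out_zip.

    File contents are copied unchanged: no rows are combined and no
    duplicate rows are removed. Returns the sorted list of file names.
    """
    seen = set()
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as out:
        for source in (base_zip, new_zip):
            with zipfile.ZipFile(source) as src:
                for info in src.infolist():
                    if info.is_dir():
                        # Folder entries are implied by the file paths.
                        continue
                    if info.filename in seen:
                        # Assumption: clashes are treated as an error here.
                        raise ValueError(f"duplicate file: {info.filename}")
                    seen.add(info.filename)
                    out.writestr(info.filename, src.read(info.filename))
    return sorted(seen)
```

For example, `extend_archive("dataset_one.zip", "dataset_two.zip", "extended_dataset_one.zip")` would produce an archive that contains both CSV files and both image folders side by side, matching the layout above.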

Instructions

To extend a dataset with new data, follow these steps:

  1. In the H2O Hydrogen Torch navigation menu, click Import dataset.

  2. In the Source list, select the source (data connector) that you want to use (for example, AWS S3).

    1. In the S3 bucket name box, enter the name of the S3 bucket.

    2. In the AWS access key box, enter the AWS access key.

      Note

      You don't need to enter the AWS access key if the S3 bucket is public.

    3. In the AWS secret key box, enter the AWS secret key.

      Note

      You don't need to enter the AWS secret key if the S3 bucket is public.

    4. In the File name list, select the file you want to use.

  3. Click Continue.

  4. Click Merge with existing dataset.

  5. In the Dataset list, select the dataset you want to extend with the dataset imported above.

  6. Click Merge.

  7. Configure the dataset settings for the dataset being extended.

    Note

    To learn about the import dataset settings, see Import dataset settings.

  8. Click Continue.

  9. Again, click Continue.

    Note

    Before you click Continue, please review the dataset preview.

