Version: v0.4.0

Import a dataset

Overview

Before you can import your dataset to H2O Label Genie, the dataset needs to meet the following requirements:

  1. The dataset data type needs to be text, image, or audio.
  2. Dataset format:
    • The dataset (data) for an image or audio annotation task must be a zip file containing the image or audio files.
      Note
      • You can have any nested folder structure inside the zip file.
      • All images need to have an image extension. The zip file can contain a mix of supported image extensions. To learn about supported image extensions, see Supported image extensions.
      • All audio files need to have an audio extension. The zip file can contain a mix of supported audio extensions. To learn about supported audio extensions, see Supported audio extensions.
    • The dataset (data) for a text annotation task must be in a CSV file or a zip containing a single CSV file.
      • One column needs to hold the text data.
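      The text dataset requirements above can be checked locally before importing. The following is a minimal sketch, not part of H2O Label Genie: the function names are hypothetical, and it assumes the CSV is UTF-8 encoded, has a header row, and that "a single CSV file" means exactly one .csv member in the zip.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_csv_header(path):
    """Return the column names of a text dataset (a CSV, or a zip with one CSV)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            # Ignore directory entries and keep only CSV members.
            csv_members = [
                name for name in archive.namelist()
                if not name.endswith("/") and name.lower().endswith(".csv")
            ]
            if len(csv_members) != 1:
                raise ValueError(
                    f"expected exactly one CSV file in the zip, found {len(csv_members)}"
                )
            text = archive.read(csv_members[0]).decode("utf-8-sig")
    elif suffix == ".csv":
        text = path.read_text(encoding="utf-8-sig")
    else:
        raise ValueError("a text dataset must be a .csv or .zip file")

    # Assumption: the first row of the CSV holds the column names.
    header = next(csv.reader(io.StringIO(text)), None)
    if not header:
        raise ValueError("the CSV file has no header row")
    return header


def has_text_column(path, column):
    """Check that the column expected to hold the text data exists."""
    return column in read_csv_header(path)
```

      For example, has_text_column("reviews.zip", "text") returns True only when the archive holds one CSV file with a column named text (both names are placeholders).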

Instructions

To import your dataset (data) to H2O Label Genie, follow these steps:

  1. On the H2O Label Genie navigation menu, click Datasets.

  2. Click Import data.

  3. In the Name box, enter a name for the dataset.

  4. (Optional) In the Description box, enter a description for the dataset.

  5. For Data type, choose an option.

    • If the data type of the dataset you are importing is text: Select Text.
    • If the data type of the dataset you are importing is image: Select Image.
    • If the data type of the dataset you are importing is audio: Select Audio.
  6. In the Source list, select the source (data connector) that you want to use (for example, S3).

    1. Click Browse....

    2. Alternatively, drag and drop the dataset file.
  7. Click Import.


Dataset connectors

Overview

H2O Label Genie provides the following dataset connectors to access external data sources.

Supported connectors

Upload (Standard upload feature)

The following parameter is required:

  • File

S3 (Amazon AWS S3)

The following parameters are required:

  • S3 bucket name
  • AWS access key
  • AWS secret key
  • File name

H2O Drive (H2O.ai's data storage)

The following parameter is required:

  • File name
Note

To learn more about H2O Drive, see H2O Drive | Docs.
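
The required parameters listed above can be summarized as a small lookup table. The sketch below is illustrative only and does not call any H2O Label Genie API: the dictionary keys reuse the labels from this page, and the helper name missing_parameters is hypothetical.

```python
# Required parameters per connector, copied from the lists on this page.
REQUIRED_PARAMETERS = {
    "Upload": ["File"],
    "S3": ["S3 bucket name", "AWS access key", "AWS secret key", "File name"],
    "H2O Drive": ["File name"],
}


def missing_parameters(connector, provided):
    """Return the required parameters that are absent or empty in `provided`."""
    if connector not in REQUIRED_PARAMETERS:
        raise ValueError(f"unknown connector: {connector!r}")
    return [
        name for name in REQUIRED_PARAMETERS[connector]
        if not provided.get(name)
    ]


# Placeholder values; an S3 import with no credentials filled in yet.
print(missing_parameters("S3", {"S3 bucket name": "my-bucket", "File name": "images.zip"}))
# → ['AWS access key', 'AWS secret key']
```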

Supported image extensions

The following is a list of supported image extensions for image annotation tasks in H2O Label Genie:

  • Windows bitmaps
    • .bmp (Bitmap Image)
  • JPEG files
    • .jpeg (JPEG Image)
    • .jpg (JPEG Image)
    • .jpe (JPEG Image)
  • JPEG 2000 files
    • .jp2 (JPEG 2000 Core Image)
  • Portable Network Graphics
    • .png (Portable Network Graphic)
  • WebP
    • .webp (WebP Image)
  • Portable image format
    • .pbm (Portable Bitmap Image)
    • .pgm (Portable Gray Map Image)
    • .ppm (Portable Pixmap Image)
    • .pnm (Portable Any Map Image)
  • TIFF files
    • .tiff (Tagged Image File Format)
    • .tif (Tagged Image File)
  • Radiance HDR
    • .hdr (High Dynamic Range Image File)
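
As a pre-import check, the extension list above can be used to spot files in a zip that are not supported images. This is a minimal sketch rather than H2O Label Genie's own validation: the function name is hypothetical, and matching extensions case-insensitively is an assumption.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions copied from the list above.
SUPPORTED_IMAGE_EXTENSIONS = {
    ".bmp", ".jpeg", ".jpg", ".jpe", ".jp2", ".png", ".webp",
    ".pbm", ".pgm", ".ppm", ".pnm", ".tiff", ".tif", ".hdr",
}


def unsupported_members(zip_path, supported=SUPPORTED_IMAGE_EXTENSIONS):
    """List files in the zip whose extension is not in `supported`.

    Nested folders are allowed, so only the file extension is inspected.
    Assumption: extensions are compared case-insensitively.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() not in supported
        ]
```

An empty result means every file in the archive carries one of the listed image extensions.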

Supported audio extensions

The following is a list of supported audio extensions for audio annotation tasks in H2O Label Genie:

  • Uncompressed
    • .wav (WAVE Audio)
    • .aiff (Audio Interchange File Format)
  • Lossless compressed
    • .flac (Free Lossless Audio Codec File)
  • Lossy compressed
    • .mp3 (MP3 Audio)
    • .ogg (Ogg Vorbis Audio)
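
To prepare an audio dataset, a folder of recordings can be packaged into a zip that keeps only files with the extensions listed above. The following is a sketch under stated assumptions, not an official tool: the function name and paths are hypothetical, extensions are matched case-insensitively, and the folder's nested structure is preserved inside the archive.

```python
import zipfile
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".aiff", ".flac", ".mp3", ".ogg"}


def zip_audio_folder(folder, zip_path):
    """Write supported audio files from `folder` into `zip_path`.

    Returns the relative paths of the files that were skipped.
    """
    folder = Path(folder)
    # Collect the files before creating the archive, so the archive itself
    # is never picked up if it is written inside `folder`.
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    skipped = []
    with zipfile.ZipFile(zip_path, "w") as archive:
        for file in files:
            relative = file.relative_to(folder).as_posix()
            if file.suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS:
                archive.write(file, relative)
            else:
                skipped.append(relative)
    return skipped
```

For example, zip_audio_folder("recordings", "recordings.zip") produces an archive that can then be selected in the import dialog; any skipped paths are returned for review.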
