Version: v0.2.0

Download an annotated dataset

After you annotate an imported dataset, you can download it. H2O Label Genie downloads the annotated dataset in a format that H2O Hydrogen Torch supports.

Instructions

To download an annotated dataset, follow these steps:

  1. On the H2O Label Genie navigation menu, click Annotation tasks.
  2. In the annotation tasks table, double-click the row of the annotation task whose dataset you want to download.
  3. Click Export approved samples.
Note

H2O Label Genie downloads a .zip file that contains the annotated dataset. The format matches the dataset's problem type (annotation task type) and is supported by H2O Hydrogen Torch. To learn more, see Downloaded dataset formats.
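
Before importing the download into H2O Hydrogen Torch, it can help to confirm what the archive holds. The following is a minimal sketch that uses only the Python standard library. The function name summarize_export is hypothetical and not part of H2O Label Genie, and the in-memory archive exists only so that the example runs without a real download.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_export(zip_source):
    """Count the files in an exported archive by file extension.

    zip_source can be a path or a file-like object. Directory entries
    are skipped. Illustrative helper, not part of H2O Label Genie.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() for n in names)


# Build a tiny stand-in archive in memory (assumed layout: one .csv at the root).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("csv_name.csv", "text,label\nhello,greeting\n")
buffer.seek(0)

summary = summarize_export(buffer)
print(dict(summary))
```

With a real download, pass the path of the .zip file instead of the in-memory buffer.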

Downloaded dataset formats

Text classification

A downloaded (annotated) text classification dataset has the following format: a .zip file (1) containing a .csv file (2):

folder_name.zip (1)
│   └───csv_name.csv (2)
  1. The available data connectors in H2O Hydrogen Torch require your data to be in either a single .csv file or a single .zip file for a successful import (upload).
  2. A .csv file containing a text column and a label column:
    • text: Contains the text input.
    • label: Contains the label assigned to the text in the text column.
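
As a quick check of a text classification download, the label distribution can be read straight from the archive. This is a minimal sketch, assuming the .csv file sits at the root of the .zip file and is UTF-8 encoded; the function name label_counts and the sample rows are made up for illustration.

```python
import csv
import io
import zipfile
from collections import Counter


def label_counts(zip_source, csv_name):
    """Return how many rows carry each label in the exported .csv file."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(csv_name) as raw:
            # Assumption: the export is UTF-8 encoded.
            text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return Counter(row["label"] for row in csv.DictReader(text_stream))


# Stand-in archive with invented rows, so the sketch runs on its own.
sample_csv = (
    "text,label\n"
    "Great product,positive\n"
    "Arrived broken,negative\n"
    "Works well,positive\n"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("csv_name.csv", sample_csv)
buffer.seek(0)

counts = label_counts(buffer, "csv_name.csv")
print(counts)
```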

Text regression

A downloaded (annotated) text regression dataset has the following format: a .zip file (1) containing a .csv file (2):

folder_name.zip (1)
│   └───csv_name.csv (2)
  1. The available data connectors in H2O Hydrogen Torch require your data to be in either a single .csv file or a single .zip file for a successful import (upload).
  2. A .csv file containing a text column and a label column:
    • text: Contains the text input.
    • label: Contains the label assigned to the text in the text column.
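
For a regression task, one might verify that every label parses as a number before starting an experiment. The sketch below assumes that regression labels are stored as numeric strings in the label column, which the format description above does not state explicitly; the function name parse_regression_rows and the sample rows are hypothetical.

```python
import csv
import io


def parse_regression_rows(csv_text):
    """Split rows into (text, float label) pairs and rows with a non-numeric label."""
    valid, invalid = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            valid.append((row["text"], float(row["label"])))
        except (TypeError, ValueError):
            # TypeError covers a missing label cell, ValueError a non-numeric one.
            invalid.append(row)
    return valid, invalid


# Invented sample content in the text/label layout described above.
sample_csv = "text,label\nVery satisfied,4.5\nNot great,1.0\nUnsure,n/a\n"
valid, invalid = parse_regression_rows(sample_csv)
print(len(valid), "numeric rows,", len(invalid), "rows to review")
```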

Text-entity recognition

A downloaded (annotated) text-entity recognition dataset has the following format: a .zip file (1) containing a .pq file (2):

folder_name.zip (1)
│   └───pq_name.pq (2)
  1. The available data connectors in H2O Hydrogen Torch require your data to be in either a single .csv file or a single .zip file for a successful import (upload).
  2. A .pq file containing a text column and a label column:
    • text: Contains the text input.
    • label: Contains the labels assigned to the text entities in the text column.

Image classification

A downloaded (annotated) image classification dataset has the following format: a .zip file (1) containing a .csv file (2) and an image folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data to be in a single .zip file for a successful import (upload).
  2. A .csv file containing an image_path column and a label column:
    • image_path: Specifies the location of the image.
    • label: Contains the label assigned to the image specified in the image_path column.
  3. An image folder that contains all the images specified in the image_path column. H2O Hydrogen Torch refers to this folder when running an image classification experiment.
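
Because every image_path entry must resolve to a file in the image folder, a consistency check can be sketched as follows. It assumes that the .csv file sits at the root of the archive and that matching on the file name alone is sufficient, since the exact form of image_path (bare name or relative path) is not specified above. The function name missing_images and the sample files are hypothetical.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def missing_images(zip_source, csv_name, image_folder):
    """List image_path values that have no matching file in the image folder."""
    with zipfile.ZipFile(zip_source) as archive:
        # File names found under the image folder (directory entries skipped).
        present = {
            PurePosixPath(name).name
            for name in archive.namelist()
            if name.startswith(image_folder + "/") and not name.endswith("/")
        }
        with archive.open(csv_name) as raw:
            text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            referenced = [row["image_path"] for row in csv.DictReader(text_stream)]
    # Assumption: compare by file name only.
    return [path for path in referenced if PurePosixPath(path).name not in present]


# Stand-in archive: two rows in the .csv file, but only one image in the folder.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("csv_name.csv", "image_path,label\ncat.jpg,cat\ndog.jpg,dog\n")
    archive.writestr("image_folder_name/cat.jpg", b"placeholder bytes")
buffer.seek(0)

missing = missing_images(buffer, "csv_name.csv", "image_folder_name")
print(missing)
```

An empty result means that every row in the .csv file points at an image that is present in the archive.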

Image regression

A downloaded (annotated) image regression dataset has the following format: a .zip file (1) containing a .csv file (2) and an image folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data to be in a single .zip file for a successful import (upload).
  2. A .csv file containing an image_path column and a label column:
    • image_path: Specifies the location of the image.
    • label: Contains the label assigned to the image specified in the image_path column.
  3. An image folder that contains all the images specified in the image_path column. H2O Hydrogen Torch refers to this folder when running an image regression experiment.

Object detection

A downloaded (annotated) object detection dataset has the following format: a .zip file (1) containing a .pq file (2) and an image folder (3).

folder_name.zip (1)
│   └───pq_name.pq (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data to be in a single .zip file for a successful import (upload).
  2. A .pq file containing an image_path column and a class_id column, plus x_min, x_max, y_min, and y_max columns that describe the bounding box locations:
    • image_path: Specifies the location of the image.
    • class_id: Contains the class IDs of the bounding box(es).
    • x_min, x_max, y_min, and y_max: Specify the spatial location of the bounding box(es).
  3. An image folder that contains all the images specified in the image_path column. H2O Hydrogen Torch refers to this folder when running an object detection experiment.
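
To illustrate how the bounding box columns relate to one another, the sketch below pairs up the coordinates and flags boxes with no positive width or height. It works on a plain dictionary rather than on the .pq file itself, and it assumes that each row stores parallel lists with one entry per bounding box; that layout is an assumption, not something the description above confirms. The function names and sample values are hypothetical.

```python
def iter_boxes(row):
    """Yield one (class_id, x_min, y_min, x_max, y_max) tuple per bounding box.

    Assumption: the columns hold parallel lists, one entry per box.
    """
    yield from zip(
        row["class_id"], row["x_min"], row["y_min"], row["x_max"], row["y_max"]
    )


def degenerate_boxes(row):
    """Return the boxes whose width or height is zero or negative."""
    return [
        box for box in iter_boxes(row)
        if not (box[1] < box[3] and box[2] < box[4])
    ]


# Invented row: the second box has x_min greater than x_max.
row = {
    "image_path": "street.jpg",
    "class_id": ["car", "person"],
    "x_min": [10, 50],
    "x_max": [110, 40],
    "y_min": [20, 5],
    "y_max": [70, 60],
}

boxes = list(iter_boxes(row))
bad = degenerate_boxes(row)
print(len(boxes), "boxes,", len(bad), "degenerate")
```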

Audio classification

A downloaded (annotated) audio classification dataset has the following format: a .zip file (1) containing a .csv file (2) and an audio folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───audio_folder_name (3)
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data to be in a single .zip file for a successful import (upload).
  2. A .csv file containing an audio_path column and a label column:
    • audio_path: Specifies the location of the audio file.
    • label: Contains the label assigned to the audio file specified in the audio_path column.
  3. An audio folder that contains all the audio files specified in the audio_path column. H2O Hydrogen Torch refers to this folder when running an audio classification experiment.
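
One way to review an audio classification download is to group the audio files by label. This is a minimal sketch that reads .csv content in the audio_path/label layout described above; the function name audio_by_label and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def audio_by_label(csv_text):
    """Map each label to the audio_path values annotated with it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["label"]].append(row["audio_path"])
    return dict(groups)


# Invented sample content with two labels.
sample_csv = (
    "audio_path,label\n"
    "bark_01.wav,dog\n"
    "meow_01.wav,cat\n"
    "bark_02.wav,dog\n"
)
groups = audio_by_label(sample_csv)
for label, paths in groups.items():
    print(label, len(paths))
```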

Audio regression

A downloaded (annotated) audio regression dataset has the following format: a .zip file (1) containing a .csv file (2) and an audio folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───audio_folder_name (3)
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data to be in a single .zip file for a successful import (upload).
  2. A .csv file containing an audio_path column and a label column:
    • audio_path: Specifies the location of the audio file.
    • label: Contains the label assigned to the audio file specified in the audio_path column.
  3. An audio folder that contains all the audio files specified in the audio_path column. H2O Hydrogen Torch refers to this folder when running an audio regression experiment.
