Version: v0.3.0

Download an annotated dataset (approved samples)

Overview

At any point in an annotation task, you can download the samples that have already been approved (annotated). You do not need to annotate the entire imported dataset first. H2O Label Genie downloads the annotated dataset (approved samples) in a format that H2O Hydrogen Torch supports.

Instructions

To download an annotated dataset (approved samples), follow these instructions:

  1. On the H2O Label Genie navigation menu, click Annotation tasks.
  2. In the annotation tasks table, double-click the row of the annotation task whose approved samples you want to download.
  3. Click the Export tab.
  4. In the Export approved samples list, select Download ZIP.
    Note

    H2O Label Genie downloads a zip file containing the annotated dataset in a format that matches the dataset's problem type (annotation task type) and that H2O Hydrogen Torch supports. To learn more, see Downloaded dataset formats.
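
After the download, it can be useful to confirm what the zip file contains before importing it elsewhere. The following is a minimal sketch, not part of H2O Label Genie: the function name find_annotation_files is hypothetical, and the set of extensions is taken from the formats described on this page (.csv, .pq, .json).

```python
import zipfile
from pathlib import PurePosixPath

# Extensions of the annotation files described on this page.
ANNOTATION_SUFFIXES = {".csv", ".pq", ".json"}


def find_annotation_files(zip_source):
    """Return the archive members that look like annotation files.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower() in ANNOTATION_SUFFIXES
        ]
```

For example, find_annotation_files("my_task.zip") (a hypothetical file name) lists the CSV, .pq, or JSON file inside the downloaded archive.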

Downloaded dataset formats

Text classification

A downloaded text classification dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2):

folder_name.zip (1)
│ └───csv_name.csv (2)
  1. The available data connectors in H2O Hydrogen Torch require your data for a text classification experiment to be either in a single CSV file or zip file for a successful import (upload).
  2. A CSV file containing a text column and a label column
    • text: The text column contains the text input
    • label: The label column contains the labels attributed to the texts specified in the text column
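
As a quick check of a text classification download, the label distribution can be read straight from the zip file. This is a minimal sketch under stated assumptions: the helper name label_counts is hypothetical, the CSV is assumed to be UTF-8 encoded, and the archive is assumed to hold exactly one CSV file with the text and label columns described above.

```python
import csv
import io
import zipfile
from collections import Counter


def label_counts(zip_source, csv_name=None):
    """Count how often each label occurs in the exported CSV file."""
    with zipfile.ZipFile(zip_source) as archive:
        if csv_name is None:
            # Assumption: the archive holds exactly one CSV file.
            csv_name = next(
                n for n in archive.namelist() if n.lower().endswith(".csv")
            )
        with archive.open(csv_name) as raw:
            # Assumption: the CSV file is UTF-8 encoded.
            text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text_stream)
            return Counter(row["label"] for row in reader)
```

A heavily skewed result can be a hint that more samples of the rare labels should be approved before training.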

Text regression

A downloaded text regression dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2):

folder_name.zip (1)
│ └───csv_name.csv (2)
  1. The available data connectors in H2O Hydrogen Torch require your data for a text regression experiment to be either in a single CSV file or zip file for a successful import (upload).
  2. A CSV file containing a text column and a label column
    • text: The text column contains the text input
    • label: The label column contains the labels attributed to the texts specified in the text column
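
For a regression task, the labels are presumably numeric, although this page does not state it explicitly. Under that assumption, the sketch below flags rows whose label cannot be parsed as a number. The function name non_numeric_labels is hypothetical, and the input is the CSV content as a string.

```python
import csv
import io


def non_numeric_labels(csv_text):
    """Return (row_number, label) pairs whose label cannot be parsed as a float."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        try:
            float(row["label"])
        except (TypeError, ValueError):
            problems.append((row_number, row["label"]))
    return problems
```

An empty result means every label in the file parsed as a floating-point number.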

Text-entity recognition

A downloaded text entity recognition dataset (with approved samples) has the following format: a zip file (1) containing a .pq file (2):

folder_name.zip (1)
│ └───pq_name.pq (2)
  1. The available data connectors in H2O Hydrogen Torch require your data for a text entity recognition experiment to be either in a single CSV file or zip file for a successful import (upload).
  2. A .pq file containing a text column and a label column
    • text: The text column contains the text input
    • label: The label column contains the labels attributed to the text entities specified in the text column

Text summarization

A downloaded text summarization dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2):

folder_name.zip (1)
│ └───csv_name.csv (2)
  1. The available data connectors in H2O Hydrogen Torch require your data for a text summarization experiment to be either in a single CSV file or zip file for a successful import (upload).
  2. A CSV file containing a text column and a label column
    • text: The text column contains the text input
    • label: The label column contains the summaries attributed to the texts specified in the text column

Image classification

A downloaded image classification dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2) and an image folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an image classification experiment to be in a single zip file for a successful import (upload).
  2. A CSV file containing an image_path column and a label column
    • image_path: The image_path column specifies the location of the image
    • label: The label column contains the labels attributed to the images specified in the image_path column
  3. An image folder that contains all the images specified in the image_path column; H2O Hydrogen Torch refers to this folder when running an image classification experiment.
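
Because the CSV file and the image folder must agree, one useful check is whether every image_path entry has a matching file in the archive. The sketch below is an illustration under stated assumptions: the function name missing_images is hypothetical, the CSV is assumed to be UTF-8 encoded, and paths are compared by file name only because this page does not specify how image_path relates to the archive root.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath


def missing_images(zip_source):
    """Return image_path values that have no matching file in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
        # Assumption: the archive holds exactly one CSV file.
        csv_name = next(n for n in names if n.lower().endswith(".csv"))
        # Compare by file name only; skip directory entries.
        present = {PurePosixPath(n).name for n in names if not n.endswith("/")}
        with archive.open(csv_name) as raw:
            text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text_stream)
            return [
                row["image_path"]
                for row in reader
                if PurePosixPath(row["image_path"]).name not in present
            ]
```

An empty list indicates that every image referenced in the CSV file was found in the zip file.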

Image regression

A downloaded image regression dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2) and an image folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an image regression experiment to be in a single zip file for a successful import (upload).
  2. A CSV file containing an image_path column and a label column
    • image_path: The image_path column specifies the location of the image
    • label: The label column contains the labels attributed to the images specified in the image_path column
  3. An image folder that contains all the images specified in the image_path column; H2O Hydrogen Torch refers to this folder when running an image regression experiment.

Object detection

A downloaded object detection dataset (with approved samples) has the following format: a zip file (1) containing a .pq file (2) and an image folder (3).

folder_name.zip (1)
│   └───pq_name.pq (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an object detection experiment to be in a single zip file for a successful import (upload).
  2. A .pq file containing an image_path column and a class_id column; the file also contains x_min, x_max, y_min, and y_max columns that correspond to the bounding box locations
    • image_path: The image_path column specifies the location of the image
    • class_id: The class_id column contains the class IDs of the bounding box(es)
    • x_min, x_max, y_min, and y_max: The x_min, x_max, y_min, and y_max columns specify the spatial location of the bounding box(es)
  3. An image folder that contains all the images specified in the image_path column; H2O Hydrogen Torch refers to this folder when running an object detection experiment.
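
The four coordinate columns only describe a usable box when x_min is smaller than x_max and y_min is smaller than y_max. The sketch below checks that condition on rows that have already been loaded into plain Python mappings, for example with a Parquet reader if the .pq file is a Parquet file (an assumption). The function name invalid_boxes is hypothetical, and each coordinate entry may hold either a single number or one value per bounding box, since this page does not specify which.

```python
def invalid_boxes(rows):
    """Return (row_index, box_index) pairs for boxes with no positive width or height.

    Each row is a mapping with x_min, x_max, y_min, and y_max entries. An entry
    may be a single number or a sequence with one value per bounding box.
    """

    def as_list(value):
        # Treat a scalar as a row with a single box.
        try:
            return list(value)
        except TypeError:
            return [value]

    problems = []
    for row_index, row in enumerate(rows):
        columns = [
            as_list(row[key]) for key in ("x_min", "x_max", "y_min", "y_max")
        ]
        for box_index, (x_min, x_max, y_min, y_max) in enumerate(zip(*columns)):
            if x_min >= x_max or y_min >= y_max:
                problems.append((row_index, box_index))
    return problems
```

Each returned pair identifies a row and the position of the suspicious box within that row.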

Image instance segmentation

A downloaded image instance segmentation dataset (with approved samples) has the following format: a zip file (1) containing a JSON file (2) and an image folder (3).

folder_name.zip (1)
│   └───json_name.json (2)
│   │
│   └───image_folder_name (3)
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       └───name_of_image.image_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an image instance segmentation experiment to be in a single zip file for a successful import (upload).
  2. A JSON file containing the labels in COCO format (plus the annotated dataset's original columns)
  3. An image folder that contains all the images specified in the JSON file; H2O Hydrogen Torch refers to this folder when running an image instance segmentation experiment.
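
To get an overview of a COCO-style JSON file, the annotations can be counted per category. The sketch below assumes the conventional COCO top-level keys categories and annotations; how the annotated dataset's original columns are stored in the file is not covered here. The function names load_coco and annotations_per_category are hypothetical.

```python
import json
from collections import Counter


def load_coco(json_path):
    """Load the exported JSON file into a dictionary."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def annotations_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    names = {
        category["id"]: category["name"]
        for category in coco.get("categories", [])
    }
    # Annotations that point at an undeclared category are counted as "unknown".
    return Counter(
        names.get(annotation["category_id"], "unknown")
        for annotation in coco.get("annotations", [])
    )
```

For example, annotations_per_category(load_coco("json_name.json")) returns a counter keyed by category name.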

Audio classification

A downloaded audio classification dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2) and an audio folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───audio_folder_name (3)
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an audio classification experiment to be in a single zip file for a successful import (upload).
  2. A CSV file containing an audio_path column and a label column
    • audio_path: The audio_path column specifies the location of the audio file
    • label: The label column contains the labels attributed to the audio files specified in the audio_path column
  3. An audio folder that contains all the audio files specified in the audio_path column; H2O Hydrogen Torch refers to this folder when running an audio classification experiment.

Audio regression

A downloaded audio regression dataset (with approved samples) has the following format: a zip file (1) containing a CSV file (2) and an audio folder (3).

folder_name.zip (1)
│   └───csv_name.csv (2)
│   │
│   └───audio_folder_name (3)
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       └───name_of_audio.audio_extension
│       ...
  1. The available data connectors in H2O Hydrogen Torch require your data for an audio regression experiment to be in a single zip file for a successful import (upload).
  2. A CSV file containing an audio_path column and a label column
    • audio_path: The audio_path column specifies the location of the audio file
    • label: The label column contains the labels attributed to the audio files specified in the audio_path column
  3. An audio folder that contains all the audio files specified in the audio_path column; H2O Hydrogen Torch refers to this folder when running an audio regression experiment.
