Version: v0.4.0

Import an annotated dataset

Overview

After creating an annotation task, you can import already annotated samples (for example, annotated images) into it from the same dataset you used to create the task.

Instructions

To import an annotated dataset to an annotation task, follow these steps:

  1. On the H2O Label Genie navigation menu, click Annotation tasks.
  2. In the annotation tasks table, double-click the row where the annotation task you want to use is located.
  3. Click the Import tab.
  4. Upload your dataset with existing annotations.
  5. Click Import annotated samples.
Note
  • H2O Label Genie allows you to review and approve the imported annotations later.
    • To import them pre-approved, select the Approve imported labels checkbox so that you do not have to approve them in the Annotate tab.
  • H2O Label Genie does not overwrite already approved records by default.
    • To enable overwrites, select the Overwrite already approved records checkbox.
  • Before importing an annotated dataset to an existing annotation task, the annotated dataset must be formatted according to the problem type of the task. To learn more, see Required annotated dataset format.

Required annotated dataset format

Before importing an annotated dataset to an existing annotation task, the annotated dataset must be formatted according to the problem type of the task. The following sections describe the required format for each problem type.

Text classification

The annotated dataset to import to a text classification annotation task needs to be formatted as follows:

A CSV file containing a text and label column. Columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the labels attributed to the texts specified in the text column.
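
As a rough illustration of this layout, the following sketch builds a two-column CSV with Python's standard csv module and rejects duplicate texts, since the text column is the merge key. The function name build_text_label_csv and the sample rows are hypothetical; only the text and label column names come from the format described above.

```python
import csv
import io


def build_text_label_csv(rows):
    """Return CSV text with a `text` and a `label` column.

    `rows` is an iterable of (text, label) pairs. A repeated text raises
    ValueError, because the text column is used to merge the labels.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "label"])
    for text, label in rows:
        if text in seen:
            raise ValueError(f"duplicate text: {text!r}")
        seen.add(text)
        writer.writerow([text, label])
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample_csv = build_text_label_csv([
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
])
```

The same two-column shape applies to the other text problem types in this section; only the meaning of the label column changes.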

Text regression

The annotated dataset to import to a text regression annotation task needs to be formatted as follows:

A CSV file containing a text and label column. Columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the labels attributed to the texts specified in the text column.

Text-entity recognition

The annotated dataset to import to a text-entity recognition annotation task needs to be formatted as follows:

A Parquet file containing a raw_text, text, and label column. Columns:

  • raw_text: The raw_text column needs to contain the original text input.
    Note

    The raw_text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • text: The text column needs to contain the lists of entities.
  • label: The label column needs to contain the lists of labels for the entities.
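
Before writing the Parquet file, it can help to sanity-check the rows in memory. The sketch below is one possible check, not part of H2O Label Genie: the function name check_entity_rows is hypothetical, and the rule that each entity has exactly one label in the same position is an assumption that the format description above does not state explicitly.

```python
def check_entity_rows(rows):
    """Return a list of problems found in entity-recognition rows.

    Each row is a dict with `raw_text` (str), `text` (list of entity
    strings), and `label` (list of label strings).
    """
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        missing = {"raw_text", "text", "label"} - row.keys()
        if missing:
            problems.append(f"row {index}: missing columns {sorted(missing)}")
            continue
        # raw_text is the merge key, so it should not repeat.
        if row["raw_text"] in seen:
            problems.append(f"row {index}: duplicate raw_text")
        seen.add(row["raw_text"])
        # Assumption: one label per entity, in the same order.
        if len(row["text"]) != len(row["label"]):
            problems.append(
                f"row {index}: {len(row['text'])} entities "
                f"but {len(row['label'])} labels"
            )
    return problems


# Hypothetical sample rows, for illustration only.
good_rows = [
    {
        "raw_text": "Ada Lovelace worked in London.",
        "text": ["Ada Lovelace", "London"],
        "label": ["PERSON", "LOCATION"],
    }
]
bad_rows = [
    {"raw_text": "Grace Hopper joined the Navy.", "text": ["Grace Hopper"], "label": []}
]
```

Rows that pass the check could then be written to Parquet with a library of your choice; that step is omitted here.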

Text summarization

The annotated dataset to import to a text summarization annotation task needs to be formatted as follows:

A CSV file containing a text and label column. Columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the summaries attributed to the texts specified in the text column.

Text-generative AI

The annotated dataset to import to a text-generative AI annotation task needs to be formatted as follows:

A CSV file containing the following columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the answers attributed to the texts specified in the text column.

Image classification

The annotated dataset to import to an image classification annotation task needs to be formatted as follows:

A CSV file containing an image_path and label column. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the images specified in the image_path column.
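
The requirements on the image_path column (relative paths, no duplications) can be checked before uploading. The following is a minimal sketch using only the Python standard library; the function name validate_image_label_csv is hypothetical, and it assumes that paths use forward slashes.

```python
import csv
import io
from pathlib import PurePosixPath


def validate_image_label_csv(csv_text):
    """Return a list of problems found in an image_path/label CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"image_path", "label"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    seen = set()
    # Data rows start on line 2, after the header line.
    for line_number, row in enumerate(reader, start=2):
        path = row["image_path"]
        if PurePosixPath(path).is_absolute():
            problems.append(f"line {line_number}: path is not relative: {path}")
        if path in seen:
            problems.append(f"line {line_number}: duplicate image_path: {path}")
        seen.add(path)
    return problems


# Hypothetical sample content, for illustration only.
sample_csv = "image_path,label\nimages/cat.jpg,cat\nimages/dog.jpg,dog\n"
sample_problems = validate_image_label_csv(sample_csv)
```

An empty result means that none of the checked problems were found; it does not guarantee that the paths match the ones in the original dataset.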

Image regression

The annotated dataset to import to an image regression annotation task needs to be formatted as follows:

A CSV file containing an image_path and label column. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the images specified in the image_path column.

Object detection

The annotated dataset to import to an object detection annotation task needs to be formatted as follows:

A Parquet file containing an image_path and class_id column; the file should also contain an x_min, x_max, y_min, and y_max column corresponding to the bounding box locations. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • class_id: The class_id column needs to contain the class IDs of the bounding box(es).
  • x_min, x_max, y_min, and y_max: The x_min, x_max, y_min, and y_max columns specify the spatial location of the bounding box(es).
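
Because image_path must not contain duplications while an image can have several bounding boxes, one plausible reading is that each image occupies a single row whose other columns hold lists. That reading is an assumption, not something stated above. Under that assumption, the sketch below groups flat per-box records into one row per image. The function name group_boxes_by_image is hypothetical, and the check that x_min < x_max and y_min < y_max is likewise an assumption.

```python
def group_boxes_by_image(boxes):
    """Group flat per-box records into one row per image_path.

    Each input record is a dict with image_path, class_id, x_min, x_max,
    y_min, and y_max. Each output row keeps image_path as a string and
    turns the other columns into lists, one entry per bounding box.
    """
    columns = ("class_id", "x_min", "x_max", "y_min", "y_max")
    grouped = {}
    for box in boxes:
        # Assumption: a valid box has positive width and height.
        if box["x_min"] >= box["x_max"] or box["y_min"] >= box["y_max"]:
            raise ValueError(f"degenerate box for {box['image_path']}")
        row = grouped.setdefault(
            box["image_path"],
            {"image_path": box["image_path"], **{name: [] for name in columns}},
        )
        for name in columns:
            row[name].append(box[name])
    return list(grouped.values())


# Hypothetical sample boxes, for illustration only.
sample_rows = group_boxes_by_image([
    {"image_path": "images/street.jpg", "class_id": 0,
     "x_min": 10, "x_max": 50, "y_min": 20, "y_max": 80},
    {"image_path": "images/street.jpg", "class_id": 1,
     "x_min": 60, "x_max": 120, "y_min": 30, "y_max": 90},
    {"image_path": "images/park.jpg", "class_id": 0,
     "x_min": 5, "x_max": 25, "y_min": 5, "y_max": 45},
])
```

The grouped rows could then be written to a Parquet file with a library of your choice; that step is left out to keep the sketch dependency-free.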

Image instance segmentation

The annotated dataset to import to an image instance segmentation annotation task needs to be formatted as follows:

A JSON file in a COCO format containing the following keys:

  • images: The image ID and file_name (location).
    Note

    The file_name is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • annotations: The annotations in a COCO polygon format.
  • categories: The category ID and name.
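
For orientation, a minimal COCO-style file with these three keys might be assembled as in the sketch below. The helper polygon_annotation and all sample values are hypothetical. The segmentation field follows the COCO polygon convention of a flat [x1, y1, x2, y2, ...] list, and bbox uses the COCO [x, y, width, height] layout. Whether H2O Label Genie requires fields beyond those listed above (for example, image width and height, or area) is not stated here.

```python
import json


def polygon_annotation(annotation_id, image_id, category_id, polygon):
    """Build one COCO-style annotation from a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each object as a list of polygons.
        "segmentation": [polygon],
        # COCO bounding boxes are [x, y, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "iscrowd": 0,
    }


# Hypothetical sample content, for illustration only.
coco = {
    "images": [{"id": 1, "file_name": "images/street.jpg"}],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        polygon_annotation(1, 1, 1, [10, 10, 50, 10, 50, 40, 10, 40]),
    ],
}
coco_json = json.dumps(coco, indent=2)
```

The file_name values should be the same relative paths used in the original dataset, since they are used to merge the labels.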

Audio classification

The annotated dataset to import to an audio classification annotation task needs to be formatted as follows:

A CSV file containing an audio_path and label column. Columns:

  • audio_path: The audio_path column needs to specify the locations of the audio files.
    Note

    The audio_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the audio files specified in the audio_path column.

Audio regression

The annotated dataset to import to an audio regression annotation task needs to be formatted as follows:

A CSV file containing an audio_path and label column. Columns:

  • audio_path: The audio_path column needs to specify the locations of the audio files.
    Note

    The audio_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the audio files specified in the audio_path column.
