Version: v0.2.0

Import an annotated dataset

After creating an annotation task, you can import already annotated samples (e.g., images) into it, as long as they come from the same dataset you used to create the annotation task.

Instructions

To import an annotated dataset to an annotation task, follow these steps:

  1. On the H2O Label Genie navigation menu, click Annotation tasks.
  2. In the annotation tasks table, double-click the row where the annotation task you want to use is located.
  3. Click the Import tab.
  4. Upload your dataset with existing annotations.
  5. Click Import annotated samples.
Note
  • H2O Label Genie lets you review and approve the imported annotations later.
    • To import them pre-approved, select the Approve imported labels checkbox so that you don’t have to approve them in the Annotate tab.
  • By default, H2O Label Genie does not overwrite already approved records.
    • To enable overwrites, select the Overwrite already approved records checkbox.
  • Before you import an annotated dataset to an existing annotation task, format it according to the problem type of the task. To learn more, see Import: Required annotated dataset format.

Import: Required annotated dataset format

Before importing an annotated dataset to an existing annotation task, you need to format the dataset according to the problem type of the task. The following sections describe the required format for each problem type.

Text classification

The annotated dataset you want to import to a text classification annotation task needs to be formatted as follows:

A .csv file containing a text and label column. Columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the labels attributed to the texts specified in the text column.
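
As an illustration of the format above, the following minimal sketch serializes text/label pairs to CSV with the two required columns and rejects duplicate texts, since the text column is the merge key. The function name, sample texts, and label values are hypothetical; only the column names come from this page.

```python
import csv
import io


def build_text_classification_csv(rows):
    """Serialize (text, label) pairs to CSV with a `text` and a `label` column.

    Raises ValueError when a text appears more than once, because the text
    column is used to merge the labels into the annotation task.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "label"])
    writer.writeheader()
    for text, label in rows:
        if text in seen:
            raise ValueError(f"duplicate text: {text!r}")
        seen.add(text)
        writer.writerow({"text": text, "label": label})
    return buffer.getvalue()


# Hypothetical samples; the texts must match the original dataset exactly.
samples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
csv_text = build_text_classification_csv(samples)
```

Writing `csv_text` to a file with a .csv extension yields a file you can upload in the Import tab.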

Text regression

The annotated dataset you want to import to a text regression annotation task needs to be formatted as follows:

A .csv file containing a text and label column. Columns:

  • text: The text column needs to contain the original text input.
    Note

    The text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • label: The label column needs to contain the labels attributed to the texts specified in the text column.
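
Before uploading, it can help to check a file against the requirements above. The sketch below is one possible pre-import check, not part of H2O Label Genie: it verifies that both columns exist, that no text is duplicated, and that each label parses as a number. The numeric-label check is an assumption based on the task being a regression task; the function name and sample data are hypothetical.

```python
import csv
import io


def check_text_regression_csv(csv_text):
    """Return a list of problems found in the CSV text (empty when none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = [c for c in ("text", "label") if c not in columns]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    seen = set()
    # Data rows are counted from 1; the header row is not counted.
    for row_no, row in enumerate(reader, start=1):
        if row["text"] in seen:
            problems.append(f"row {row_no}: duplicate text")
        seen.add(row["text"])
        try:
            float(row["label"])  # assumption: regression labels are numeric
        except (TypeError, ValueError):
            problems.append(f"row {row_no}: non-numeric label {row['label']!r}")
    return problems


good = "text,label\nGreat value,4.5\nArrived late,2.0\n"
bad = "text,label\nGreat value,4.5\nGreat value,high\n"
print(check_text_regression_csv(good))
print(check_text_regression_csv(bad))
```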

Text-entity recognition

The annotated dataset you want to import to a text-entity recognition annotation task needs to be formatted as follows:

A .pq file containing a raw_text, text, and label column. Columns:

  • raw_text: The raw_text column needs to contain the original text input.
    Note

    The raw_text column is used to merge the labels. Please make sure to use the original text column without any duplications. The original text column refers to the text column present in the dataset you used to create the annotation task.

  • text: The text column needs to contain the lists of entities.
  • label: The label column needs to contain the lists of labels for the entities.
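
The three columns above could be assembled as in the following sketch. It assumes that the entity list and the label list of a row are aligned by position (the first label belongs to the first entity, and so on); this page does not state that explicitly. The helper names, the sample sentence, the label names, and the output file name are hypothetical. Writing the file relies on pandas with a Parquet engine such as pyarrow installed.

```python
def build_ner_records(samples):
    """Turn (raw_text, [(entity, label), ...]) pairs into rows with the
    raw_text, text, and label columns."""
    records = []
    seen = set()
    for raw_text, entities in samples:
        if raw_text in seen:
            # raw_text is the merge key, so duplicates are rejected.
            raise ValueError(f"duplicate raw_text: {raw_text!r}")
        seen.add(raw_text)
        records.append({
            "raw_text": raw_text,
            "text": [entity for entity, _ in entities],   # list of entities
            "label": [label for _, label in entities],    # list of labels
        })
    return records


def write_parquet(records, path):
    """Write the rows to a Parquet file (needs pandas and a Parquet engine)."""
    import pandas as pd

    pd.DataFrame(records).to_parquet(path, index=False)


# Hypothetical sample and label names.
samples = [
    ("Ada Lovelace was born in London.",
     [("Ada Lovelace", "PERSON"), ("London", "LOCATION")]),
]
records = build_ner_records(samples)
# write_parquet(records, "annotated_ner.pq")  # hypothetical file name
```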

Image classification

The annotated dataset you want to import to an image classification annotation task needs to be formatted as follows:

A .csv file containing an image_path and label column. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the images specified in the image_path column.
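
A minimal sketch of producing such a file follows. It takes a dictionary keyed by image path, which rules out repeating the same path string, and rejects absolute paths because the note above asks for relative paths. The function name, paths, and labels are hypothetical, and the paths are assumed to use forward slashes.

```python
import csv
import io
from pathlib import PurePosixPath


def build_image_label_csv(labels_by_path):
    """Serialize {image_path: label} to CSV with an image_path and a label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image_path", "label"])
    for image_path, label in labels_by_path.items():
        if PurePosixPath(image_path).is_absolute():
            raise ValueError(f"expected a relative path, got {image_path!r}")
        writer.writerow([image_path, label])
    return buffer.getvalue()


# Hypothetical paths; they must match the paths in the original dataset.
labels = {"images/cat_001.jpg": "cat", "images/dog_004.jpg": "dog"}
csv_text = build_image_label_csv(labels)
```

The same shape applies to an image regression task, with numeric values in the label column.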

Image regression

The annotated dataset you want to import to an image regression annotation task needs to be formatted as follows:

A .csv file containing an image_path and label column. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the images specified in the image_path column.

Object detection

The annotated dataset you want to import to an object detection annotation task needs to be formatted as follows:

A .pq file containing an image_path and class_id column; the file should also contain x_min, x_max, y_min, and y_max columns corresponding to the bounding box locations. Columns:

  • image_path: The image_path column needs to specify the locations of the images.
    Note

    The image_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • class_id: The class_id column needs to contain the class IDs of the bounding box(es).
  • x_min, x_max, y_min, and y_max: The x_min, x_max, y_min, and y_max columns specify the spatial location of the bounding box(es).
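
The sketch below shows one way to lay out these columns. It rests on assumptions this page does not confirm: that each image occupies a single row (since image_path must not be duplicated), that the class_id and coordinate columns hold one list entry per bounding box, and that a valid box satisfies x_min < x_max and y_min < y_max. The coordinate units are not specified here, so the numbers are placeholders. The function name and sample data are hypothetical.

```python
def build_detection_records(boxes_by_image):
    """Build one row per image from
    {image_path: [(class_id, x_min, y_min, x_max, y_max), ...]}."""
    records = []
    for image_path, boxes in boxes_by_image.items():
        for class_id, x_min, y_min, x_max, y_max in boxes:
            # Assumption: a box must have positive width and height.
            if x_min >= x_max or y_min >= y_max:
                raise ValueError(f"invalid box for {image_path}")
        records.append({
            "image_path": image_path,
            "class_id": [b[0] for b in boxes],
            "x_min": [b[1] for b in boxes],
            "x_max": [b[3] for b in boxes],
            "y_min": [b[2] for b in boxes],
            "y_max": [b[4] for b in boxes],
        })
    return records


# Hypothetical image with two boxes; coordinate values are placeholders.
boxes = {"images/street.jpg": [(0, 10, 20, 110, 220), (2, 50, 60, 90, 100)]}
records = build_detection_records(boxes)
```

With pandas and a Parquet engine installed, `pandas.DataFrame(records).to_parquet(path, index=False)` would write these rows to a Parquet file.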

Audio classification

The annotated dataset you want to import to an audio classification annotation task needs to be formatted as follows:

A .csv file containing an audio_path and label column. Columns:

  • audio_path: The audio_path column needs to specify the locations of the audio files.
    Note

    The audio_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the audio files specified in the audio_path column.
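
Because the labels are merged on audio_path, a path that differs even slightly from the original dataset will not match. The following sketch, which is not part of H2O Label Genie, lists annotated paths that are absent from a known list of original paths. How you obtain that list depends on your dataset; the function name, paths, and labels are hypothetical.

```python
import csv
import io


def unmatched_audio_paths(annotated_csv_text, original_paths):
    """Return audio_path values from the annotated CSV that do not occur
    in the original dataset."""
    reader = csv.DictReader(io.StringIO(annotated_csv_text))
    known = set(original_paths)
    return [row["audio_path"] for row in reader if row["audio_path"] not in known]


# Hypothetical data: the second annotated path has a typo.
original_paths = ["clips/dog_01.wav", "clips/rain_02.wav"]
annotated = "audio_path,label\nclips/dog_01.wav,bark\nclips/rain_2.wav,rain\n"
print(unmatched_audio_paths(annotated, original_paths))
```

An empty result means every annotated path has a counterpart in the original dataset.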

Audio regression

The annotated dataset you want to import to an audio regression annotation task needs to be formatted as follows:

A .csv file containing an audio_path and label column. Columns:

  • audio_path: The audio_path column needs to specify the locations of the audio files.
    Note

    The audio_path column is used to merge the labels. Please make sure to use the correct relative paths without any duplications.

  • label: The label column needs to contain the labels attributed to the audio files specified in the audio_path column.
