Version: v0.3.0

Tutorial 2A: Annotation task: Text regression

Overview

This tutorial walks through the process of annotating a dataset and specifying an annotation task rubric for a text regression annotation task. To highlight the process, we are going to annotate a dataset that contains user reviews (in text format) and ratings (from 0 to 5) of Amazon products. This tutorial also briefly explores how to download the fully annotated dataset in a format supported by H2O Hydrogen Torch.

Step 1: Explore dataset

We are going to use the preloaded amazon-reviews-demo dataset for this tutorial. The dataset contains 180 text samples, each a review of an Amazon product. Let's quickly explore the dataset.

  1. On the H2O Label Genie navigation menu, click Datasets.
  2. In the datasets table, click amazon-reviews-demo.
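
If you also want to peek at the data outside the UI, the following is a minimal sketch assuming a local CSV copy of the demo dataset; the file name amazon-reviews-demo.csv is a hypothetical placeholder, while the comment column matches the text column used later in this tutorial.

    # A hedged sketch of exploring a similar reviews dataset locally with pandas.
    # The file name below is an assumption; in H2O Label Genie you browse the
    # preloaded dataset in the UI instead.
    import pandas as pd

    reviews = pd.read_csv("amazon-reviews-demo.csv")  # hypothetical local copy
    print(len(reviews))               # expect 180 samples
    print(reviews["comment"].head())  # peek at the first few review texts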

Step 2: Create an annotation task

Now that we have seen the dataset, let's create an annotation task that enables you to annotate it. An annotation task refers to the process of labeling data. For this tutorial, the annotation task is a text regression task, which assigns one continuous target label to each input text (a small illustration follows the steps below). Let's create the annotation task.

  1. Click New annotation task.
  2. In the Task name box, enter tutorial-2a.
  3. In the Task description box, enter Annotate a dataset containing reviews from Amazon products.
  4. In the Select task list, select Regression.
  5. In the Select text column box, select comment.
  6. Click Create task.
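
To make the text regression idea concrete, the following small illustration (with made-up review texts and ratings, not taken from the demo dataset) shows how each input text is paired with exactly one continuous target:

    # Illustrative only: toy text regression samples.
    # Each input text gets exactly one continuous target label.
    samples = [
        {"comment": "Arrived quickly and works great.", "label": 5.0},
        {"comment": "Stopped working after a week.", "label": 1.0},
    ]
    for sample in samples:
        print(f"{sample['label']:.1f}  <-  {sample['comment']}")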

Step 3: Specify annotation task rubric

Before we can start annotating our dataset, we need to specify an annotation task rubric. An annotation task rubric refers to the labels (for example, object classes) you want to use when annotating your dataset. For our dataset, we are going to label each review with a value from 1 to 5, where 1 refers to 1 star, 2 refers to 2 stars, and so on.

  1. In the Data minimum value box, enter 1.
    • The Data minimum value setting refers to the minimum value of your continuous label range (star ratings from 1 to 5)
  2. In the Data maximum value box, enter 5.
    • The Data maximum value setting refers to the maximum value of your continuous label range (star ratings from 1 to 5)
  3. In the Data step size (interval) box, enter 1.
    • The Data step size (interval) setting refers to the interval the label range slider moves in (the slider is used in the next step to label a review)
  4. Click Apply.
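
As a quick sanity check of what these three settings imply, the following sketch (not part of H2O Label Genie) lists the label values a slider with this rubric can take:

    # Illustrative sketch: the label values implied by the rubric above.
    minimum, maximum, step = 1, 5, 1
    values = [minimum + i * step for i in range(int((maximum - minimum) / step) + 1)]
    print(values)  # [1, 2, 3, 4, 5] -- one value per star rating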

Labeling review dataset with slider from 1 to 5

For this tutorial, let's use the slider (rather than the picker) to annotate the samples. To enable the slider:

  1. In the Annotation selection list, select Slider.

Annotation task slider

Step 4: Annotate dataset

Now that we have specified the annotation task rubric, let's annotate the dataset.

  1. Click Continue to annotate.

In the Annotate tab, you can individually annotate each review (text) in the dataset. Let's annotate the first review.

  1. In the Label slider, slide to 3.
    Annotated dataset with label of 3
  2. Click Save and next.
    Note
    • Save and next saves the annotated review
    • To skip a review and annotate it later, click Skip.
      • Skipped reviews (samples) reappear after all non-skipped reviews are annotated
    • To download all samples annotated so far:
      1. Click the Export tab.
      2. In the Export approved samples list, select Download ZIP.
        Note

        H2O Label Genie downloads a zip file containing the annotated dataset in a format that is supported in H2O Hydrogen Torch. To learn more, see Downloaded dataset formats: Text regression.

Download annotated dataset

After annotating all the reviews, you can download the dataset in a format that H2O Hydrogen Torch supports. Let's download the annotated dataset.

  1. In the Annotate tab, click Export approved samples.
  2. In the Export approved samples list, select Download ZIP.
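
If you want to inspect the export locally, the following is a hedged sketch: the download name tutorial-2a.zip and the presence of a CSV file inside the archive are assumptions, so see Downloaded dataset formats: Text regression for the authoritative structure.

    # A hedged sketch for inspecting the exported archive locally.
    import io
    import zipfile

    import pandas as pd

    with zipfile.ZipFile("tutorial-2a.zip") as archive:  # hypothetical download name
        print(archive.namelist())  # see what the export contains
        csv_members = [name for name in archive.namelist() if name.endswith(".csv")]
        if csv_members:
            labels = pd.read_csv(io.BytesIO(archive.read(csv_members[0])))
            print(labels.head())  # annotated reviews and their ratings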

Summary

In this tutorial, we learned the process of annotating a dataset and specifying an annotation task rubric for a text regression task. We also learned how to download the fully annotated dataset in a format supported by H2O Hydrogen Torch.

Next

To learn the process of annotating and specifying an annotation task rubric for various other annotation tasks in computer vision (CV), natural language processing (NLP), and audio, see Tutorials.

