Tutorial 2A: Text regression annotation task
Overview
This tutorial describes how to create a text regression annotation task, including how to specify its annotation task rubric. To illustrate the process, we will annotate a dataset containing user reviews (in text format) and star ratings (from 1 to 5) of Amazon products.
Step 1: Explore dataset
We are going to use the preloaded Amazon reviews demo dataset for this tutorial. The dataset contains 180 samples (text), each containing a review of an Amazon product. Let's quickly explore the dataset.
- On the H2O Label Genie navigation menu, click Datasets.
- In the Datasets table, click amazon-reviews-demo.
Step 2: Create an annotation task
Now that we have seen the dataset, let's create an annotation task that enables you to annotate it. For this tutorial, the annotation task is a text regression annotation task, which assigns one continuous target label to each input text.
- Click New annotation task.
- In the Task name box, enter `tutorial-2a`.
- In the Task description box, enter `Annotate a dataset containing reviews from Amazon products`.
- In the Select task list, select Regression.
- In the Select text column box, select comment.
- Click Create task.
Step 3: Specify an annotation task rubric
Before we can start annotating our dataset, we need to specify an annotation task rubric. An annotation task rubric refers to the labels (for example, object classes) you want to use when annotating your dataset. For our dataset, we are going to label each review with a value from 1 to 5, where 1 refers to 1 star, 2 refers to 2 stars, and so on.
The dataset has a star column that rates each review on a star basis with the following structure: `3.0 out of 5 stars`. The new column (a label column) that we create for this tutorial contains only a numeric value from 1 to 5.
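The relationship between the existing star column and the new label column can be sketched in Python. This is only an offline illustration and not part of H2O Label Genie: the helper name `parse_star_rating` is hypothetical, and the pattern assumes every star value follows the single `3.0 out of 5 stars` example shown above.

```python
import re

# Assumption: star values always look like "3.0 out of 5 stars".
# Only that one example appears in this tutorial.
_STAR_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+out of\s+5\s+stars\s*$")


def parse_star_rating(text: str) -> int:
    """Return the star rating as a whole number from 1 to 5 (hypothetical helper)."""
    match = _STAR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognized star rating: {text!r}")
    value = float(match.group(1))
    # The rubric in this tutorial only allows whole numbers from 1 to 5.
    if not value.is_integer() or not 1 <= value <= 5:
        raise ValueError(f"Rating outside the 1-5 rubric: {text!r}")
    return int(value)


print(parse_star_rating("3.0 out of 5 stars"))
```

Rejecting values outside the rubric, rather than clamping them, keeps any unexpected rating visible instead of silently changing it.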
- In the Data minimum value box, enter `1`.
  - The Data minimum value refers to the minimum of your continuous values (star ratings from 1 to 5).
- In the Data maximum value box, enter `5`.
  - The Data maximum value refers to the maximum of your continuous values (star ratings from 1 to 5).
- In the Data step size (interval) box, enter `1`.
  - The Data step size (interval) refers to the interval between values on the label range slider (the slider is used in the next step to label a review).
- Click Apply.
Let's use the slider, rather than the picker, to annotate the samples. To enable the slider:
- In the Annotation selection list, select Slider.
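To see how the minimum, maximum, and step size together determine the labels a slider can offer, consider the following minimal sketch. It is an illustration only: the function `slider_values` is hypothetical and does not reflect how H2O Label Genie implements its slider.

```python
from typing import List


def slider_values(minimum: float, maximum: float, step: float) -> List[float]:
    """List every value reachable from minimum in increments of step, up to maximum."""
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    # The small tolerance guards against floating-point rounding for fractional steps.
    count = int((maximum - minimum) / step + 1e-9) + 1
    return [minimum + i * step for i in range(count)]


# The rubric from this tutorial: minimum 1, maximum 5, step size 1.
print(slider_values(1, 5, 1))
```

With the tutorial's rubric, the sketch yields the five whole-star values; a smaller step size, such as 0.5, would add half-star positions between them.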
Step 4: Annotate dataset
Now that we have specified the annotation task rubric, let's annotate the dataset.
- Click Continue to annotate.
In the Annotate tab, you can individually annotate each review (text) in the dataset. Let's annotate the first review.
- In the Label slider, slide to 3 (a random value for purposes of this tutorial).
- Click Save and next.
Note
- Save and next saves the annotated review and moves to the next one.
- To skip a review and annotate it later, click Skip.
- Skipped reviews (samples) reappear after all non-skipped reviews are annotated.
- Annotate all dataset samples.
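The skip behavior described in the note above can be pictured as a simple two-pass queue. The sketch below is a hypothetical model, not H2O Label Genie code: the function `annotation_order` and the sample names are invented for illustration, and the order in which skipped samples come back is an assumption.

```python
from typing import Iterable, List, Set


def annotation_order(samples: Iterable[str], skipped_once: Set[str]) -> List[str]:
    """Return the order in which samples end up annotated when some are skipped once."""
    annotated: List[str] = []
    deferred: List[str] = []
    for sample in samples:
        if sample in skipped_once:
            # Skip: the sample is set aside instead of being labeled now.
            deferred.append(sample)
        else:
            # Save and next: the sample is labeled immediately.
            annotated.append(sample)
    # Skipped samples reappear only after all non-skipped samples are annotated.
    return annotated + deferred


print(annotation_order(["review-1", "review-2", "review-3", "review-4"], {"review-2"}))
```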
Note
At any point in an annotation task, you can download the already annotated (approved) samples. You do not need to fully annotate an imported dataset to download already annotated samples. To learn more, see Download an annotated dataset.
Summary
In this tutorial, we learned how to specify an annotation task rubric for a text regression task and how to annotate the dataset with it.
Next
To learn how to specify an annotation task rubric and annotate data for other annotation tasks in computer vision (CV), natural language processing (NLP), and audio, see Tutorials.