Tutorial 3A: Annotation task: Text-entity recognition
Overview
This tutorial outlines the steps for specifying an annotation task rubric and annotating a dataset for a text-entity recognition annotation task. To illustrate the process, we annotate a dataset that contains user reviews (in text format) and ratings (from 0 to 5) of Amazon products. The tutorial also briefly explains how to download the fully annotated dataset in a format that H2O Hydrogen Torch supports.
Step 1: Explore dataset
We use the preloaded amazon-reviews-demo dataset for this tutorial. The dataset contains 180 samples (text), each containing a review of an Amazon product. Let's quickly explore the dataset.
- On the H2O Label Genie navigation menu, click Datasets.
- In the datasets table, click amazon-reviews-demo.
Step 2: Create an annotation task
Now that we understand the dataset, let's create an annotation task that enables us to annotate it. An annotation task refers to the process of labeling data. In this tutorial, the annotation task is a text-entity recognition task: it locates named entities in unstructured text and classifies them into predefined categories (for example, product name).
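One way to picture what a text-entity recognition annotation records is a list of labeled character spans over the review text. The sketch below is a hypothetical Python representation: the EntitySpan class and the review sentence are illustrative assumptions, not H2O Label Genie's internal or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """A labeled entity: the characters document[start:end] belong to `label`."""

    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # a category from the rubric, for example "Product"

    def text_of(self, document: str) -> str:
        """Return the substring of `document` covered by this span."""
        return document[self.start:self.end]


# Hypothetical review text, made up for illustration.
review = "My new Charge 5 stopped syncing after a week."
span = EntitySpan(start=7, end=15, label="Product")
print(span.text_of(review))  # Charge 5
```

Storing offsets rather than copies of the words keeps two identical phrases at different positions distinguishable.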
- Click New annotation task.
- In the Task name box, enter tutorial-3a.
- In the Task description box, enter Annotate a dataset containing reviews from Amazon products.
- In the Select task list, select Entity recognition.
- In the Select text column box, select comment.
- Click Create task.
Step 3: Specify annotation task rubric
Before we can start annotating the dataset, we need to specify an annotation task rubric. An annotation task rubric defines the labels (for example, object classes) you want to use when annotating your dataset. We could define many entity types for this dataset, but for this tutorial we define and use two: Product and Emotion. Product refers to the Amazon product being reviewed, while Emotion refers to one or more feelings the reviewer expresses in the review.
- In the New object name box, enter Product.
- Click Add.
- Click Add entity.
- In the New object name box, enter Emotion.
- For the Emotion class color, choose an option different from the one preselected for the Product object.
- Click Continue to annotate.
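A rubric acts as a closed set of allowed labels. As a hedged sketch, the check below flags annotations whose label falls outside the Product/Emotion rubric or whose character offsets fall outside the review. RUBRIC, EntitySpan, and validate are hypothetical names chosen for illustration; they are not part of H2O Label Genie.

```python
from dataclasses import dataclass

# Hypothetical rubric for this tutorial: the only allowed entity labels.
RUBRIC = frozenset({"Product", "Emotion"})


@dataclass(frozen=True)
class EntitySpan:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def validate(document: str, spans: list[EntitySpan]) -> list[str]:
    """Return a list of problems; an empty list means every span fits the rubric."""
    problems = []
    for span in spans:
        if span.label not in RUBRIC:
            problems.append(f"unknown label {span.label!r}")
        if not (0 <= span.start < span.end <= len(document)):
            problems.append(f"span {span.start}-{span.end} is out of bounds")
    return problems


# Hypothetical review text, made up for illustration.
review = "My new Charge 5 stopped syncing after a week."
print(validate(review, [EntitySpan(7, 15, "Product")]))  # []
print(validate(review, [EntitySpan(7, 15, "Brand")]))    # ["unknown label 'Brand'"]
```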
Step 4: Annotate dataset
Now that we have specified the annotation task rubric, let's annotate the dataset.
In the Annotate tab, you can individually annotate each review (text) in the dataset. Let's annotate the first review.
Let's start by annotating the product entities in the review.
- Highlight Charge 5.
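Highlighting a phrase amounts to recording where it starts and ends in the review, together with the active entity. The sketch below mimics that with str.find. The highlight helper and the review sentence are made up for illustration (the actual first review in amazon-reviews-demo is not reproduced here), and the helper only finds the first occurrence of each phrase. The same call covers the Emotion highlights in the following steps.

```python
def highlight(document: str, phrase: str, label: str) -> tuple[int, int, str]:
    """Mimic highlighting `phrase`: return (start, end, label) for its first occurrence."""
    start = document.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} does not occur in the document")
    return (start, start + len(phrase), label)


# Hypothetical review text, made up for illustration.
review = "I'm disappointed with the Charge 5. I'm not impressed at all."
annotations = [
    highlight(review, "Charge 5", "Product"),
    highlight(review, "I'm disappointed", "Emotion"),
    highlight(review, "I'm not impressed", "Emotion"),
]
for start, end, label in annotations:
    # Print each label with its offsets and the highlighted text.
    print(label, start, end, review[start:end])
```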
Next, let's annotate the Emotion entities in the review.
- Click Emotion.
- Highlight I'm disappointed.
Note: You can assign a particular entity (Product or Emotion) to a word by clicking it.
- Highlight I'm not impressed.
- Click Save and next.
- Save and next saves the annotated review.
- To skip a review and annotate it later, click Skip.
- Skipped reviews (samples) reappear after all non-skipped reviews are annotated.
- To download all samples annotated so far:
  - Click the Export tab.
  - In the Export approved samples list, select Download ZIP.
Note: H2O Label Genie downloads a ZIP file containing the annotated dataset in a format that H2O Hydrogen Torch supports. To learn more, see Downloaded dataset formats: Text-entity recognition.
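The skip behavior described in the list above can be modeled as a queue in which a skipped sample is moved to the back. This is a hedged sketch, not H2O Label Genie's implementation: the annotation_order function is hypothetical, and the assumption that skipped samples keep their original relative order is ours.

```python
from collections import deque


def annotation_order(sample_ids, skipped):
    """Return the order in which samples get annotated.

    `skipped` holds the ids the annotator skips the first time they appear.
    A skipped sample is sent to the back of the queue, so it reappears only
    after every non-skipped sample has been handled.
    """
    queue = deque(sample_ids)
    order = []
    already_skipped = set()
    while queue:
        sample = queue.popleft()
        if sample in skipped and sample not in already_skipped:
            already_skipped.add(sample)
            queue.append(sample)  # move to the back; it reappears later
            continue
        order.append(sample)
    return order


print(annotation_order([1, 2, 3, 4], skipped={2}))  # [1, 3, 4, 2]
```

Skipping the same sample a second time is not modeled.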
Download annotated dataset
After annotating all the reviews, you can download the dataset in a format that H2O Hydrogen Torch supports. Let's download the annotated dataset.
- In the Annotate tab, click Export approved samples.
- In the Export approved samples list, select Download ZIP.
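Before loading the export into H2O Hydrogen Torch, you may want to check which files the ZIP contains. The sketch below uses only Python's standard zipfile module; the file name tutorial-3a.zip and the list_export helper are assumptions, and the actual contents depend on the format described in Downloaded dataset formats: Text-entity recognition.

```python
import zipfile
from pathlib import Path


def list_export(zip_path: str) -> list[str]:
    """Return the names of the files inside a downloaded export archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical file name; use the path your browser saved the download to.
    export = Path("tutorial-3a.zip")
    if export.exists():
        for name in list_export(str(export)):
            print(name)
    else:
        print(f"{export} not found")
```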
Summary
In this tutorial, we learned how to specify an annotation task rubric and annotate a dataset for a text-entity recognition annotation task. We also learned how to download the fully annotated dataset in a format that H2O Hydrogen Torch supports.
Next
To learn how to specify annotation task rubrics and annotate datasets for other annotation tasks in computer vision (CV), natural language processing (NLP), and audio, see Tutorials.