Tutorial 4A: Text summarization annotation task
Overview
This tutorial describes the process of creating a text summarization annotation task, including specifying an annotation task rubric for it. To highlight the process, we will annotate a dataset that contains human-written abstractive summaries of news stories published on the Cable News Network (CNN) and Daily Mail websites.
Step 1: Explore dataset
We are going to use the preloaded CNN Daily Mail sample demo dataset for this tutorial. The dataset contains 100 text samples, each containing a summary of a CNN or Daily Mail article. Let's quickly explore the dataset.
The dataset already contains a summary column. For purposes of this tutorial, we will ignore that column and create our own column to see how one can create a summarization annotation task.
- On the H2O Label Genie navigation menu, click Datasets.
- In the Datasets table, click cnn-dailymail-sample.
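If you want a feel for the shape of a dataset like this outside of H2O Label Genie, a quick pandas sketch can help. The rows below are made up for illustration; only the `text` and `summary` column names mirror the dataset described above:

```python
import pandas as pd

# Illustrative rows mimicking the cnn-dailymail-sample dataset
# (the real dataset has 100 rows; these values are invented for the sketch).
df = pd.DataFrame({
    "text": [
        "(CNN) -- A long news article about a local election ...",
        "Daily Mail report on a record-breaking heatwave ...",
    ],
    "summary": [
        "A local election was held.",
        "A heatwave broke records.",
    ],
})

print(df.shape)                          # (rows, columns)
print(df.columns.tolist())               # ['text', 'summary']
print(df["text"].str.len().describe())   # rough length statistics for the articles
```

Checking column names and text lengths this way is a useful sanity check before deciding which column to annotate.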
Step 2: Create an annotation task
Now that we have seen the dataset, let's create an annotation task that enables you to annotate it. For this tutorial, the text summarization annotation task refers to writing a summary for each text input.
- Click New annotation task.
- In the Task name box, enter tutorial-4a.
- In the Task description box, enter Annotate a dataset containing summaries from news stories from CNN and the Daily Mail websites.
- In the Select task list, select Summarization.
- In the Select text column box, select text.
- Click Create task.
Step 3: Specify an annotation task rubric
Before we can start annotating our dataset, we need to specify an annotation task rubric. An annotation task rubric refers to the labels (for example, object classes) you want to use when annotating your dataset.
In the case of a summarization annotation task rubric, you need to specify the following two settings in the Rubric tab:
- Select model
    - The Select model value refers to the zero-shot learning model to use in your annotation task. To learn more, see Annotation tasks + zero-shot learning models: Text summarization.
    - For purposes of this tutorial, let's use the default model.
- Max target length
    - The Max target length value refers to the maximum character length of your summaries.
    - For purposes of this tutorial, let's use the default maximum target length.
- Click Continue to annotate.
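H2O Label Genie applies the max target length internally, but as a rough mental model (not the product's actual code), capping a suggested summary at a character budget might look like this:

```python
def cap_summary(summary: str, max_target_length: int) -> str:
    """Trim a model-suggested summary to at most max_target_length characters,
    cutting at the last word boundary so the text stays readable.
    Hypothetical helper for illustration; not Label Genie's implementation."""
    if len(summary) <= max_target_length:
        return summary
    clipped = summary[:max_target_length]
    # Drop a partial trailing word if the cut landed mid-word.
    return clipped.rsplit(" ", 1)[0] if " " in clipped else clipped

print(cap_summary("A heatwave broke records across the region.", 25))
```

The key point is that the setting bounds how long the generated summaries can be; shorter budgets force more aggressive compression of the original text.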
Step 4: Annotate dataset
In the Annotate tab, you can individually annotate each summary in the dataset. Let's annotate (summarize) the first text.
A zero-shot learning model is utilized by default when you annotate a text summarization annotation task. The model accelerates the annotation process by summarizing a given original text. You can immediately start annotating in the Annotate tab or wait until the zero-shot model is ready to provide annotation suggestions. H2O Label Genie notifies you to Refresh the instance when zero-shot predictions (suggestions) are available.
To learn about the utilized model for a text summarization annotation task, see Zero-shot learning models: Text summarization.
- Click Refresh.
- Click Save and next.
    Note: Save and next saves the annotated text.
- To skip a text and annotate it later, click Skip. Skipped text samples reappear after all non-skipped samples are annotated.
- Annotate all dataset samples.

Note: At any point in an annotation task, you can download the already annotated (approved) samples. You do not need to fully annotate an imported dataset to download already annotated samples. To learn more, see Download an annotated dataset.
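Downloaded annotations are typically consumed by downstream training or evaluation scripts. As a generic sketch (the schema below is hypothetical, not Label Genie's documented export format), annotated text/summary pairs are often stored as JSON Lines:

```python
import json

# Hypothetical annotated samples; Label Genie's actual export schema may differ.
annotated = [
    {"text": "(CNN) -- A long news article ...", "summary": "A local election was held."},
    {"text": "Daily Mail report on a heatwave ...", "summary": "A heatwave broke records."},
]

# Write one JSON object per line (JSON Lines), a common interchange format.
with open("annotations.jsonl", "w", encoding="utf-8") as f:
    for sample in annotated:
        f.write(json.dumps(sample) + "\n")

# Read it back for downstream training or evaluation:
with open("annotations.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # 2
```

One object per line keeps the file streamable, so partial exports from an in-progress annotation task remain usable.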
Summary
In this tutorial, we learned how to annotate and specify an annotation task rubric for a text summarization task.
Next
To learn the process of annotating and specifying an annotation task rubric for various other annotation tasks in computer vision (CV), natural language processing (NLP), and audio, see Tutorials.
- Send feedback about H2O Label Genie to cloud-feedback@h2o.ai