
Using annotation sets

Work with available annotation sets or create new ones. An annotation set can contain multiple types of annotations, including text (usually generated from the optical character recognition [OCR] process), token entity annotations of type “label”, and page annotations of type “class”.

Apply Labels

This action applies labels from the “Labels” annotation set to the boxes of a Text annotation set (typically OCR token boxes). The entity label is applied to the token box if the token box sufficiently overlaps the entity box. This job also combines attribute types from the two sets you are using (e.g. giving the applied label set the attributes text and label).

  1. Select the Text Annotation Set
  2. Select the Labels Annotation Set
  3. Provide a name for the resulting labeled Annotation Set
  4. (Optional) Provide a description
  5. Select what this method should do if the labels page is missing
  6. Click Apply Labels

The newly labeled annotation set will appear at the top of your annotation sets list. Open the file in “Edit in Page View” to see the annotated labels.
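
The overlap rule described above can be pictured with a small sketch. This is only an illustration, not the product's implementation: the `Box` type, the `label_for_token` helper, and the 0.5 overlap threshold are all assumptions made for the example, since the actual threshold is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; (x, y) is the top-left corner, in pixels."""
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes, or 0.0 if they do not overlap."""
    overlap_w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    overlap_h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    return overlap_w * overlap_h


def label_for_token(token: Box, entities: list[tuple[Box, str]],
                    min_overlap: float = 0.5) -> Optional[str]:
    """Return the label of the entity box that covers the largest share of the token box.

    min_overlap is an assumed threshold, not a documented product value.
    """
    if token.area == 0:
        return None
    best_label, best_share = None, 0.0
    for entity_box, label in entities:
        # Share of the token box that lies inside this entity box.
        share = intersection_area(token, entity_box) / token.area
        if share >= min_overlap and share > best_share:
            best_label, best_share = label, share
    return best_label
```

For example, a token box lying entirely inside an entity box labeled "PatientName" receives that label, while a token box with only a quarter of its area inside the entity box stays unlabeled under the assumed threshold.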

Predict Using Model

Predict on an annotation set using a successfully built model.

  1. Select the model to use for predictions
  2. Provide a name for the prediction set
  3. (Optional) Provide a description for the prediction set
  4. Select the Evaluation Annotation set
  5. Click Predict to retrieve the predictions

Train Model

Train a model based on your annotation sets. Training a model is the most time-consuming job in H2O Document AI - Publisher.

  1. Select the model type. One of:
    • Token Labeling - requires the annotation set to have the label and text file attributes
    • Page Classification - requires the annotation set to have the class and text file attributes
  2. Select the annotation set you want to train the model on
  3. (Optional) Set the batch size and number of epochs
  4. Provide a name for the resulting model
  5. (Optional) Provide a description of the model
  6. Click Train to build the model
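
The attribute requirements in step 1 can be written as a small lookup, which may help when checking a set before starting a long training job. This is a minimal sketch: the `REQUIRED_ATTRIBUTES` table mirrors the list above, while the `missing_attributes` helper and the idea of representing a set's attributes as a Python set of strings are assumptions for illustration.

```python
# Attribute requirements as listed in step 1 of the Train Model steps.
REQUIRED_ATTRIBUTES = {
    "Token Labeling": {"label", "text"},
    "Page Classification": {"class", "text"},
}


def missing_attributes(model_type: str, set_attributes: set[str]) -> set[str]:
    """Return the attributes the annotation set still needs for this model type."""
    return REQUIRED_ATTRIBUTES[model_type] - set_attributes


# A text-only set (for example, raw OCR output) cannot train a Token Labeling model yet.
print(missing_attributes("Token Labeling", {"text"}))
```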

In the Train Model screen, you can also select whether you want to evaluate the model. This lets you train a model using a validation set. Before building the model:

  1. Toggle Evaluate on
  2. Provide a name for the Prediction Annotation set
  3. Select the Evaluation Annotation Set

The model will appear on the Models page when it is finished. If you provided a validation set when building your model, you can also view the model's accuracy. The accuracy is shown on the prediction annotation set, which is generated after the model is built and can be accessed from the Annotation sets page. Learn more about the accuracy panel.

Concatenate

You can concatenate (combine) annotation sets that have the same attributes.

  1. Select the annotation sets you want to combine (sets must contain the same attributes)
  2. Provide a result name
  3. (Optional) Provide a description for your concatenated annotation set
  4. Click Concatenate

The concatenated annotation set will appear in the Annotation sets page.
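
As a rough illustration of the "same attributes" rule, the sketch below combines sets only when their attributes match. The dictionary layout (`"attributes"` and `"pages"` keys) and the `concatenate_sets` helper are hypothetical and do not reflect the product's internal data model.

```python
def concatenate_sets(annotation_sets: list[dict]) -> dict:
    """Combine the pages of annotation sets that all share the same attributes."""
    if not annotation_sets:
        raise ValueError("select at least one annotation set")
    attributes = set(annotation_sets[0]["attributes"])
    for annotation_set in annotation_sets[1:]:
        if set(annotation_set["attributes"]) != attributes:
            raise ValueError("annotation sets must contain the same attributes")
    # Pages keep the order of the input sets.
    pages = [page for s in annotation_sets for page in s["pages"]]
    return {"attributes": attributes, "pages": pages}
```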

Export

Export the selected annotation set(s).

  1. Select the file(s) you want to export
  2. Click Export on the upper navigation bar

Import annotations

Import a JSON file of a previously annotated annotation set that you have saved to your local computer.

  1. Provide a name for the imported annotation set
  2. (Optional) Provide a description
  3. Drag and drop the JSON file or browse your local files for the file you want to import
  4. Click Import
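
Before importing, it can be useful to confirm locally that the file is well-formed JSON. The sketch below checks only that the file parses. It makes no assumption about the schema the import expects, and the `is_valid_json_file` helper name is hypothetical.

```python
import json
from pathlib import Path


def is_valid_json_file(path: str) -> bool:
    """Return True if the file exists and parses as JSON, False otherwise."""
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True
```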

Interacting with an annotation set

Each annotation set has an Info button at the end of the row. Clicking Info will give you details of your annotation set (e.g. the description or number of pages). You can also find the logs for your annotation sets here. To see the full log, click Expand. You can also download the log by clicking Download.

The drop-down arrow next to the Info button gives you the option to edit your annotation set in Page View, rename your annotation set, split your annotation set, export your annotation set, or delete your annotation set.

Edit in Page View

Annotate documents using the VGG Image Annotator (VIA) tool. Fill in the regions on the image and assign region or file attributes.

Tool Bar

Save: Save all the changes you have made to the annotation set within the project.

Export: Export your annotation set as a JSON file to your local computer.

Grid: Display all files in a grid. From here, you can group images by file or region type and display images in collated groups.

Annotation Editor: Toggle the annotation editor open and closed.

Size Scroller: Zoom in and out of the image you are actively annotating by using either the buttons or the scroller.

Select All Regions: Select all the regions on the image you are actively working on. This is convenient when several images share the same layout but contain different information, since you can copy all the regions over at once.

Copy Selected: Copy the selected region(s).

Paste Selected: Paste the copied region(s).

Paste to Many: Paste the copied region(s) to multiple images at once.

Undo Paste to Many: Undo the paste to many command.

Filter files

This box lets you filter which files you want to see. One of:

  • All files - shows all available files
  • Show files without regions - shows files without any added bounding boxes
  • Show files missing region annotations - shows files that have bounding boxes but are missing annotations for those boxes
  • Show files missing file annotations - shows files that are missing file attributes
  • Files that could not be loaded - shows files that could not be loaded
  • Regular Expression - search for a file or group of files by name
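
To preview what a name pattern would select before typing it into the Regular Expression filter, you can test it against a list of file names. The sketch below uses Python's `re` module purely for illustration. The tool's own regular expression dialect may differ in details, and the file names and the `filter_files` helper are made up for the example.

```python
import re


def filter_files(file_names: list[str], pattern: str) -> list[str]:
    """Keep the file names in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in file_names if regex.search(name)]


files = ["referral_001.png", "referral_002.png", "invoice_001.png"]
print(filter_files(files, r"^referral_"))   # names starting with "referral_"
print(filter_files(files, r"_001\.png$"))   # first page of every document
```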

Here, you can import new files by choosing them from your local files or by providing a URL. You can also remove files by providing a file id, file name, and a number of regions.

File filters you can apply to your images in Page View. Located in the top left box.

Workaround

If you are adding a file that is not an image (for example, a PDF):

  • First, export the PDF as a JPG or PNG.
  • Then, add the file in Edit in Page View.

If the number of documents and pages does not update when you add a new file in Edit in Page View, refresh the page, or click another menu item (e.g. Document Sets) and then return to Annotation Sets to see the updated number of pages.

Attributes

Use this box to add region and file attributes. Select one of the two options to toggle which type you are actively working with.

info

You can edit or remove your annotations by clicking the Annotation Editor button on the tool bar, by clicking the “Toggle annotation editor” note in the Attributes box, or by pressing the space bar.

Region attributes

Region attributes assign attributes to specific regions on the page. Applying region attributes gives the annotation set the label attribute.

  1. In the Attributes box, ensure that Region attributes is selected.
  2. In the Attribute name field, enter "label". (Note: For this step, the name of the attribute must be set to "label" in order for H2O Document AI to work correctly).
  3. Click + to add the new attribute.

For the new attribute, you can now create option ids.

  1. Select the attribute type (one of: text, checkbox, radio, image, or dropdown)

    • text - assign an individual id to each region
    • checkbox - assign a single or multiple ids to each region from the created option id list
    • radio - assign a single id to each region from the created option id list
    • image - assign an image (provided via an image URL or b64) to each region from the created option id list
    • dropdown - assign a single id to each region from the created option id list
  2. (checkbox, radio, image, dropdown only) Begin filling in the option ids (e.g. “PatientName” or “SupplierAddress”) in the “Add new option id” box

    • (Optional) Provide a description of the option id
    • (Optional) Set an option id as the default value (def). This value will be the first to appear when selecting option ids for each region

Region attributes box where you set your label attributes in Page View. Located in the middle left box.

File attributes

File attributes assign attributes to the entire page, not just a region drawn on a page (e.g. tagging a page as a medical referral). Applying file attributes gives the annotation set the class attribute.

  1. Provide a name for the attribute
  2. (Optional) Provide a description
  3. Select the attribute type (one of: text, checkbox, radio, image, or dropdown)
  4. (checkbox, radio, image, dropdown only) Begin filling in the option ids (e.g. “MedReferral” or “BirthCert”) in the “Add new option id” box
  5. Assign the file an attribute by opening the annotation editor (one way is to press the space bar). Go to the File Annotations section. Next to file name (the file you are currently on), choose a class for the file attribute or (text only) type in the class

File attributes box where you set your class attributes in Page View. Also located in the middle left box.

Using bounding boxes to create regions

On your document, find an area you want annotated. Draw regions on the image using bounding boxes:

  1. Press the left mouse button on the corner of the region you want to bound
  2. Drag the mouse cursor to encompass the object you are bounding
  3. Release the mouse button

The region should still be highlighted (colored gray). You can now select an option id (checkbox, radio, image, dropdown only) from the available dropdown menu or type the option id into the text box (text only).
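
The press, drag, and release steps amount to defining a rectangle from two corner points. The sketch below shows one way to turn those two points into a box with a top-left origin and non-negative size, whichever direction you drag. The `box_from_drag` helper and the `x`, `y`, `width`, `height` field names are assumptions for illustration, not the tool's documented storage format.

```python
def box_from_drag(press: tuple[float, float], release: tuple[float, float]) -> dict:
    """Convert the press and release points of a drag into a rectangle."""
    (x0, y0), (x1, y1) = press, release
    return {
        "x": min(x0, x1),          # left edge
        "y": min(y0, y1),          # top edge
        "width": abs(x1 - x0),
        "height": abs(y1 - y0),
    }


# Dragging from top-left to bottom-right or the reverse gives the same box.
print(box_from_drag((40, 30), (140, 60)))
print(box_from_drag((140, 60), (40, 30)))
```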

GIF showing how to draw a bounding box and select the appropriate label (in this case the Referral Date).

reminder

Save your annotations before leaving Page View. This ensures your changes are saved and available to be loaded later for continued annotation.

Rename

Rename the annotation set and provide a new description.

Split

Split the annotation set.

  1. Select the split ratio or enter a custom split (either by regular expression or by entering the split manually)
  2. Toggle whether you want to split on a document's boundaries
  3. Provide a name prefix for the resulting annotation sets
  4. (Optional) Provide a description
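
To see why the document-boundary toggle matters, the sketch below splits a list of pages by ratio in two ways: cutting at an exact page count, and keeping every document's pages together. This is only an illustration under stated assumptions: pages are modeled as hypothetical `(document_id, page_number)` pairs, the `split_pages` helper is made up, and the product may assign pages differently (for example, randomly).

```python
def split_pages(pages: list[tuple[str, int]], ratio: float,
                on_document_boundaries: bool = True):
    """Split (document_id, page_number) pairs into two lists.

    ratio is the target share of pages for the first list.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    target = round(len(pages) * ratio)
    if not on_document_boundaries:
        # Cut at the exact page count; a document may end up in both lists.
        return pages[:target], pages[target:]

    # Group pages by document, preserving first-seen order.
    documents: dict[str, list[tuple[str, int]]] = {}
    for page in pages:
        documents.setdefault(page[0], []).append(page)

    first, second = [], []
    for doc_pages in documents.values():
        # Whole documents go to the first list until the target is reached.
        if len(first) < target:
            first.extend(doc_pages)
        else:
            second.extend(doc_pages)
    return first, second
```

With boundaries respected, the first list can overshoot the requested ratio, because a document is never divided between the two resulting sets.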

Export

Export the annotation set to your local computer.

Delete

Delete the annotation set. You will be prompted to acknowledge that deletion is irreversible before you can delete your set.

