Tutorial 1B: Creating an evaluation model in H2O Document AI - Publisher

This tutorial will walk you through how to create a model using an evaluation annotation set so that you can access accuracy metrics from a trained model.

Prerequisites

  • A copy of this zip file containing medical referral documents that you will be using in this tutorial
  • Completion of the previous tutorial for a basic understanding of H2O Document AI - Publisher

Step 1: Create your project, run OCR, and annotate your files

You will be using the same setup as the introduction tutorial. There are two ways to begin:

  1. Continue using the same project from Tutorial 1A: Introduction to H2O Document AI - Publisher (continue to Step 2)
  2. Perform the following three steps if you want to make a new project:

Creating the project

From the landing page, create a new project.

  1. Click Create a new project
  2. Provide Eval_Tutorial as the project name
  3. Add the zip file 5pdfs_clean_documents.zip of medical referral documents
  4. Click Create project

Running OCR

Navigate to your Document sets page.

  1. Select the 5pdfs_clean_documents row
  2. Click OCR on the upper navigation bar
  3. Select the best OCR method
  4. Provide the name Tutorial_OCR
  5. Click OCR

This creates the text annotation set with the tokens you need for the apply labels step.

Annotating your files

Navigate to your Annotation sets page. You will now annotate your blank annotation set. Refer to the previous tutorial for an in-depth walkthrough of annotation.

  1. Open 5pdfs_clean_documents Labels in Edit in Page View
  2. Create your regional attributes following the outline of the previous tutorial
  3. Apply your bounding boxes on each document and select the appropriate label for each box
  4. Save your progress
  5. Exit Page View and return to the Annotation sets page

Step 2: Splitting your labels annotation set

Since you are training a model with evaluation, you first need to create a training set and an evaluation set. To do this, you will split 5pdfs_clean_documents Labels.

The following steps show how to access the split annotation set panel from the Annotation sets page.

  1. Click the drop-down arrow at the end of 5pdfs_clean_documents Labels
  2. Click Split
  3. Select 80/20 for your split
  4. Provide the name prefix Tutorial-Split
  5. Click Split
The split annotation sets panel.
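
Conceptually, an 80/20 split shuffles the documents in the annotation set and assigns roughly 80% of them to training and 20% to evaluation. The sketch below illustrates the idea in Python; the function name and document IDs are hypothetical, and Publisher's actual splitting logic may differ.

```python
import random

def split_annotation_set(doc_ids, train_frac=0.8, seed=42):
    """Shuffle document IDs and split them into training and evaluation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(doc_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

# With the 5 documents in this tutorial, an 80/20 split yields
# 4 training documents and 1 evaluation document.
train_docs, eval_docs = split_annotation_set([f"doc_{i}" for i in range(1, 6)])
print(train_docs, eval_docs)
```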

Step 3: Applying labels to your split annotation sets

You will now apply labels to each of your split annotation sets (that is, to Tutorial-Split_1 and Tutorial-Split_2).

  1. Click Apply Labels from the upper navigation bar of the Annotation sets page
  2. Select Tutorial_OCR for your text annotation set
  3. Select Tutorial-Split_1 for your labels annotation set
  4. Provide the resulting labeled annotation set the name labeled-split1
  5. (Optional) Provide the description: "This is the applied labels for the training set."
  6. Select Skip the page for what to do when the labels page is missing
  7. Click Apply Labels to create your training applied labels annotation set
  8. Repeat steps 1-7 for Tutorial-Split_2, naming the result labeled-split2, to create your evaluation applied labels annotation set
Training labels: The labeled split annotation set panel for the training annotation set. The description reads: "This is the applied labels for the training set."
Evaluation labels: The labeled split annotation set panel for the evaluation annotation set. The description reads: "This is the applied labels for the evaluation set."
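
Under the hood, applying labels pairs the OCR tokens from Tutorial_OCR with the regional labels you drew. Publisher's exact matching rule is not documented here, so treat the following as a conceptual sketch of one common approach: assign a region's label to every token whose bounding-box center falls inside that region.

```python
def label_token(token_box, regions):
    """Return the label of the first region containing the token's center.

    token_box: (x0, y0, x1, y1) bounding box of one OCR token.
    regions:   list of ((x0, y0, x1, y1), label) pairs from your annotations.
    """
    cx = (token_box[0] + token_box[2]) / 2
    cy = (token_box[1] + token_box[3]) / 2
    for (x0, y0, x1, y1), label in regions:
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return label
    return "O"  # token falls outside every labeled region

# Hypothetical example: a token sitting inside a "patient_name" region.
regions = [((100, 40, 300, 60), "patient_name")]
print(label_token((120, 45, 160, 58), regions))  # -> patient_name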

Step 4: Training your evaluation model

Now that you have your training and evaluation split annotation sets, you can build an evaluation model. From the Annotation sets page, click Train Model on the upper navigation bar.

  1. Switch the model type to Token Labelling
  2. Select labeled-split1 as your training annotation set
  3. Name your resulting model TokenLabelEval
  4. (Optional) Provide This model will have accuracy metrics. as the description
  5. Name your prediction annotation set Prediction
  6. Select labeled-split2 as your evaluation annotation set
  7. Click Train
The Train Model panel.
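
A token labelling model assigns one of your annotation classes to every OCR token in a document. The snippet below only illustrates what such an output looks like; the token text and class names are hypothetical examples for a medical referral document, not values produced by Publisher.

```python
# Hypothetical tokens and the classes a trained token labelling
# model might predict for each of them.
tokens = ["Jane", "Doe", "04/12/2021", "Dr.", "Smith"]
predictions = ["patient_name", "patient_name", "referral_date",
               "referring_doctor", "referring_doctor"]

for token, cls in zip(tokens, predictions):
    print(f"{token!r:>14} -> {cls}")
```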

Your trained model can be found on the Models page. Your prediction annotation set, which holds your accuracy metrics, can be found on the Annotation sets page along with a quality score. The quality score is the F1 score of the model applied to the evaluation dataset.
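
For reference, the F1 score is the harmonic mean of precision and recall:

$$
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

A score of 1 means every predicted token label on the evaluation set was correct and none were missed.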

Step 5: Accessing your accuracy metrics

Stay on the Annotation sets page. Your Prediction evaluation annotation set should be at the top. Click Info at the end of the row to bring up the information panel for Prediction.

The info button at the end of the evaluation annotation set's row. Clicking this button gives you access to the logs and accuracy information.

This brings you to the information panel. Here you can update the name or description of your evaluation set. You can also access the logs and accuracy information. Click Accuracy to access the accuracy information.

The information panel of your evaluation annotation set.

On the accuracy panel, you can scroll through the information in the sidebar, expand it into a pop-out screen by clicking Expand, or download a JSON of your accuracy information by clicking Download. For this tutorial, click Expand to bring up your information in an easier-to-read pop-out screen.
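
If you download the JSON, you can also inspect it programmatically. A minimal sketch follows; "accuracy.json" is a placeholder for whatever you named the downloaded file, and since the internal key structure isn't documented here, the snippet simply pretty-prints whatever it contains.

```python
import json

# "accuracy.json" is a placeholder for your downloaded file's name.
with open("accuracy.json") as f:
    metrics = json.load(f)

# Pretty-print the full structure to see which metrics are included.
print(json.dumps(metrics, indent=2))
```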

Your accuracy information is broken down into three tables: aggregate metrics, per class metrics, and a class confusion matrix.

Aggregate metrics

The aggregate metrics measure the accuracy of the model across the entire evaluation set.

The aggregate metrics.
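
For instance, the simplest aggregate measure, overall accuracy, is the fraction of tokens whose predicted class matches the annotated class:

$$
\text{accuracy} = \frac{\#\{\text{tokens with correct predicted class}\}}{\#\{\text{all tokens}\}}
$$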

Per class metrics

The per class metrics measure the accuracy of each class individually, treating that class against all other datapoints in the dataset (one-vs-rest).

The per class metrics.
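
In one-vs-rest terms, each class $c$ has its own true positives ($TP_c$), false positives ($FP_c$), and false negatives ($FN_c$), from which the standard per-class metrics follow:

$$
\text{precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\text{recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F_{1,c} = 2 \cdot \frac{\text{precision}_c \cdot \text{recall}_c}{\text{precision}_c + \text{recall}_c}
$$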

Class confusion matrix

The class confusion matrix provides a tabular visualization of the model predictions versus the actual values for all classes.

The confusion matrix.
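
The sketch below reproduces the same views (a confusion matrix plus per-class and aggregate metrics) for a toy set of token labels using scikit-learn. The class names are hypothetical and the numbers have no relation to this tutorial's documents; it is only meant to show how to read the tables.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical gold labels vs. model predictions for a handful of tokens.
y_true = ["patient_name", "patient_name", "referral_date", "O", "referral_date"]
y_pred = ["patient_name", "referral_date", "referral_date", "O", "referral_date"]

labels = ["patient_name", "referral_date", "O"]
print(confusion_matrix(y_true, y_pred, labels=labels))   # rows: actual, cols: predicted
print(classification_report(y_true, y_pred, labels=labels))  # per-class + aggregates
```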

Summary

In this tutorial, you learned how to train a model with an evaluation set, allowing you to access accuracy metrics. You split your labels annotation set and applied labels to each split. Then, you built a model using those two labeled splits. Finally, you learned how to navigate the accuracy panel.

