Tutorial 1B: Creating an evaluation model in H2O Document AI - Publisher
This tutorial will walk you through how to create a model using an evaluation annotation set so that you can access accuracy metrics from a trained model.
Prerequisites
- A copy of this zip file containing medical referral documents that you will be using in this tutorial
- Completion of the previous tutorial for a basic understanding of H2O Document AI - Publisher
Step 1: Create your project, run OCR, and annotate your files
You will be using the same set-up as the introduction tutorial. There are two ways to begin:
- Continue using the same project from Tutorial 1A: Introduction to H2O Document AI - Publisher (continue to Step 2)
- Perform the following three steps if you want to make a new project:
Creating the project
From the landing page, create a new project.
- Click Create a new project
- Provide Eval_Tutorial as the project name
- Add the zip file 5pdfs_clean_documents.zip of medical referral documents
- Click Create project
Running OCR
Navigate to your Document sets page.
- Select the 5pdfs_clean_documents row
- Click OCR on the upper navigation bar
- Select the best OCR method
- Provide the name Tutorial_OCR
- Click OCR
This will provide the text label and tokens you need for the apply labels step.
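For reference, token-level OCR output pairs each recognized word with its position on the page. The exact internal format H2O Document AI uses isn't shown in this tutorial; the structure below is a hypothetical illustration only.

```python
# Hypothetical illustration of token-level OCR output: each recognized word
# is paired with a page index and a bounding box (x, y, width, height).
# This is NOT the exact internal format used by H2O Document AI.
ocr_tokens = [
    {"text": "Patient", "page": 0, "bbox": [72, 96, 58, 14]},
    {"text": "Referral", "page": 0, "bbox": [134, 96, 61, 14]},
]
```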
Annotating your files
Navigate to your Annotation sets page. You will now annotate your blank annotation set. Refer to the previous tutorial for an in-depth walkthrough of annotation.
- Open 5pdfs_clean_documents Labels in Edit in Page View
- Create your regional attributes following the outline of the previous tutorial
- Apply your bounding boxes on each document and select the appropriate label for each box
- Save your progress
- Exit Page View and return to the Annotation sets page
Step 2: Splitting your labels annotation set
Since you are training a model with evaluation, you first need to create a training set and an evaluation set. To do this, you have to split 5pdfs_clean_documents Labels.
- Click the drop-down arrow at the end of 5pdfs_clean_documents Labels
- Click Split
- Select 80/20 for your split (see the sketch after this list)
- Provide the name prefix Tutorial-Split
- Click Split
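An 80/20 split sends roughly 80% of your annotated documents to one set (used for training) and the remaining 20% to the other (used for evaluation). Below is a minimal sketch of the idea only, assuming the larger portion becomes Tutorial-Split_1; H2O Document AI performs the actual split for you.

```python
import random

# A minimal sketch of the 80/20 split idea; H2O Document AI performs the
# actual split for you. The document names below are placeholders.
documents = ["doc1.pdf", "doc2.pdf", "doc3.pdf", "doc4.pdf", "doc5.pdf"]

random.seed(0)  # fixed seed so the example is reproducible
random.shuffle(documents)

cutoff = int(len(documents) * 0.8)
training_docs = documents[:cutoff]    # ~80%, e.g. Tutorial-Split_1
evaluation_docs = documents[cutoff:]  # ~20%, e.g. Tutorial-Split_2
```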
Step 3: Applying labels to your split annotation sets
You will now apply labels to each of your split annotation sets (Tutorial-Split_1 and Tutorial-Split_2).
- Click Apply Labels from the upper navigation bar of the Annotation sets page
- Select Tutorial_OCR for your text annotation set
- Select Tutorial-Split_1 for your labels annotation set
- Provide the resulting labeled annotation set the name labeled-split1
- (Optional) Provide the description: "This is the applied labels for the training set."
- Select Skip the page for what to do when the labels page is missing
- Click Apply Labels to create your training applied labels annotation set
- Repeat steps 1-7 for Tutorial-Split_2, naming the result labeled-split2, to create your evaluation applied labels annotation set
(Screenshots: the training labels annotation set and the evaluation labels annotation set.)
Step 4: Training your evaluation model
Now that you have your training and evaluation split annotation sets, you can build an evaluation model. From the Annotation sets page, click Train Model on the upper navigation bar.
- Switch the model type to Token Labelling
- Select labeled-split1 as your training annotation set
- Name your resulting model TokenLabelEval
- (Optional) Provide This model will have accuracy metrics. as the description
- Name your prediction annotation set Prediction
- Select labeled-split2 as your evaluation annotation set
- Click Train
Your trained model can be found on the Models page. Your prediction annotation set, which contains your accuracy metrics, can be found on the Annotation sets page with a quality score. The quality score is the F1 score of the model as applied to the evaluation dataset.
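For context, the F1 score is the harmonic mean of precision and recall, so it balances false positives against false negatives. A quick worked example with made-up counts (not results from this tutorial):

```python
# F1 is the harmonic mean of precision and recall.
# The counts below are made up for illustration only.
true_positives = 18   # tokens labeled with the correct class
false_positives = 2   # tokens labeled with a wrong class
false_negatives = 4   # labeled tokens the model missed

precision = true_positives / (true_positives + false_positives)  # 0.900
recall = true_positives / (true_positives + false_negatives)     # ~0.818
f1 = 2 * precision * recall / (precision + recall)               # ~0.857

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```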
Step 5: Accessing your accuracy metrics
Stay on the Annotation sets page. Your Prediction evaluation annotation set should be at the top. Click Info at the end of the row to bring up the information for Prediction.
This brings you to the information panel. Here you can update the name or description of your evaluation set. You can also access the logs and accuracy information. Click Accuracy to access the accuracy information.
On the accuracy panel, you can scroll through the information in the sidebar, expand the information into a pop-out screen by clicking Expand, or download a JSON file of your accuracy information by clicking Download. For this tutorial, click Expand to bring your information up in an easier-to-read pop-out screen.
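If you download the JSON, you can also inspect it programmatically. The exported file's name and schema aren't covered in this tutorial, so the sketch below makes no assumptions beyond the file being valid JSON:

```python
import json

# Load the accuracy file exported with the Download button.
# "accuracy.json" is a placeholder name; use whatever file you downloaded.
with open("accuracy.json") as f:
    accuracy = json.load(f)

# Print the top-level sections without assuming a specific schema.
print(list(accuracy.keys()))
```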
Your accuracy information is broken down into three tables: aggregate metrics, per class metrics, and a class confusion matrix.
Aggregate metrics
The aggregate metrics measure the accuracy of the entire dataset.
Per class metrics
The per class metrics measure the accuracy of each class versus the rest of the data points in the dataset.
Class confusion matrix
The class confusion matrix provides a tabular visualization of the model predictions versus the actual values for all classes.
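If you want to reproduce these kinds of tables outside the UI, the standard definitions apply. Below is a minimal scikit-learn sketch with made-up labels; H2O Document AI computes these tables for you, so this only illustrates what each one means.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up ground-truth and predicted token classes, for illustration only.
actual    = ["name", "date", "name", "provider", "date", "name"]
predicted = ["name", "date", "provider", "provider", "name", "name"]

# Per class metrics (precision, recall, F1 for each class) plus aggregate
# rows (accuracy, macro avg, weighted avg), mirroring the first two tables.
print(classification_report(actual, predicted, zero_division=0))

# Class confusion matrix: rows are actual classes, columns are predicted
# classes, mirroring the third table.
print(confusion_matrix(actual, predicted, labels=["name", "date", "provider"]))
```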
Summary
In this tutorial, you learned how to train a model with an evaluation set, allowing you to access accuracy metrics. You split your labels annotation set and applied labels to each split. Then, you built a model using those two labeled splits. Finally, you learned how to navigate the accuracy panel.