Extractors
Overview
Extractors, defined by JSON schemas, play an important role in document AI by converting unstructured document content into structured, actionable data. They allow users to retrieve information from various document types, such as CVs, invoices, Form 10-Ks, or scanned images, without requiring complex setups or extensive annotations.
To use an Extractor, first identify the specific information you want to extract from a document. You specify this information in a JSON schema, which is part of the Extractor and acts as a blueprint for the data: it lists the fields and data types you want to capture. Once you define the schema, you can apply the Extractor to a document and retrieve the desired information in a structured JSON format. This structured output is useful for both people and downstream applications that need organized data.
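To make the relationship between schema and output concrete, the following Python sketch pairs a small schema with the kind of JSON object an Extractor could return for it. The field names (`invoiceNumber`, `totalAmount`), the sample values, and the `conforms` helper are hypothetical illustrations written with the standard library only; they are not part of Enterprise h2oGPTe.

```python
import json

# Hypothetical schema: the fields to pull from an invoice.
schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string", "description": "The invoice identifier."},
        "totalAmount": {"type": "number", "description": "The total amount due."},
    },
    "required": ["invoiceNumber", "totalAmount"],
}

# Hypothetical extraction result, shaped by the schema above.
result = json.loads('{"invoiceNumber": "INV-001", "totalAmount": 1250.5}')

# Python types accepted for each JSON schema type used in this sketch.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def conforms(data: dict, schema: dict) -> bool:
    """Return True if required fields exist and present fields have the declared type."""
    for name in schema.get("required", []):
        if name not in data:
            return False
    for name, spec in schema["properties"].items():
        if name in data:
            value = data[name]
            # bool is a subclass of int in Python, so reject it for "number".
            if spec["type"] == "number" and isinstance(value, bool):
                return False
            if not isinstance(value, PY_TYPES[spec["type"]]):
                return False
    return True


print(conforms(result, schema))
```

A check like this only covers flat schemas with string, number, and boolean fields; nested objects and arrays would need a full JSON Schema validator.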
Access the Extractors page
To access the Extractors page, in the Enterprise h2oGPTe navigation menu, click Extractors.
The Extractors page provides a centralized location to create, view, and manage all extractors. The following numbered elements are available on the page:

| Element | Description |
|---|---|
| [1] + New extractor | Click this button to create a new extractor. This opens the extractor creation form where you can define the extractor name, description, LLM, and JSON schema. |
| [2] Filter extractors | Use this search field to search for specific extractors by name or description. Enter keywords to filter the extractor list in real time. |
| [3] All extractors | Click this tab to view all extractors visible to you, including extractors you created and extractors shared with you. |
| [4] My extractors | Click this tab to view only the extractors you created. This helps you quickly access your own extractors. |
| [5] Shared | Click this tab to view extractors that have been shared with you by other users. This allows you to access extractors created by your team members. |
| [6] Extractors table | This table displays all extractors based on the selected tab. The table includes the following columns: • Name: The name of the extractor. Click the name to view extractor details and results. • Id: A unique identifier for the extractor. Click the copy icon next to the ID to copy it to your clipboard. • Description: A brief description of the extractor, if provided. • Created: The date or time when the extractor was created. • Actions: Click the vertical ellipsis icon to access additional actions for the extractor, such as edit, delete, or share options. |
Create an Extractor
To create an Extractor, follow these steps:
- Access the Extractors page. For more information, see Access the Extractors page.
- Click + New extractor (1).

- In the Extractor name field, enter a name for the extractor.
- (Optional) In the Description field, enter a description for the extractor.
- In the LLM list, select the large language model (LLM) that performs the extraction. The list displays each model name with its per-token processing cost.
- Define the JSON schema for the extractor using one of the following two options:
Note: The JSON schema does not require exact label names to align perfectly with document fields, as the collection's large language model (LLM) can interpret and infer label purposes based on context. This allows the model to understand and map various label names, even if there are minor differences in terminology, to their intended data points. Just as a human might deduce what a field intends to capture, the LLM uses its interpretive capability to accurately match schema labels with relevant content, even when exact terms differ.
- Option 1 (JSON schema builder)
- Option 2 (JSON schema code)
Option 1 (JSON schema builder): You can build the JSON schema for the Extractor using the JSON schema builder. Follow these steps:
- In the Input JSON Schema section, ensure the toggle is off (builder mode is active).
- Click Add Field or Add Table to add fields or tables to your schema.
- For each field, define the following:
- Field Name: Enter the name of the field.
- Type: Select the data type (for example, String, Number, Boolean).
- Required: Toggle this option if the field is required.
- Click Add Field to add additional fields as needed.
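The builder steps above amount to collecting a name, a type, and a required flag per field. The Python sketch below shows how such entries could translate into a JSON schema of the kind used in code mode. The mapping from the builder's type labels to JSON schema type names, the `build_schema` helper, and the sample field names are assumptions made for illustration; the schema that the builder actually generates may differ, and tables are not covered here.

```python
import json

# Assumed mapping from builder type labels to JSON schema type names.
TYPE_NAMES = {"String": "string", "Number": "number", "Boolean": "boolean"}


def build_schema(fields):
    """Build a JSON schema dict from (field_name, type_label, required) tuples."""
    properties = {}
    required = []
    for name, type_label, is_required in fields:
        properties[name] = {"type": TYPE_NAMES[type_label]}
        if is_required:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


# Hypothetical fields, as they might be entered in the builder for a CV.
fields = [
    ("candidateName", "String", True),
    ("yearsOfExperience", "Number", True),
    ("willingToRelocate", "Boolean", False),
]

schema = build_schema(fields)
print(json.dumps(schema, indent=2))
```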
Option 2 (JSON schema code): You can enter the JSON schema code for the Extractor directly. Follow these steps:
- In the Input JSON Schema section, click the toggle to enable code mode.
- In the JSON schema box, enter a valid JSON schema. For example:
{
  "type": "object",
  "properties": {
    "revenueGrowthRate": {
      "type": "number",
      "description": "The growth rate of revenue."
    },
    "netProfitMargin": {
      "type": "number",
      "description": "The company's profit margin."
    },
    "currentRatio": {
      "type": "number",
      "description": "The company's liquidity position."
    },
    "returnOnEquity": {
      "type": "number",
      "description": "The efficiency in generating profit from equity."
    },
    "debtToEquityRatio": {
      "type": "number",
      "description": "The proportion of debt to shareholders' equity."
    }
  },
  "required": [
    "revenueGrowthRate",
    "netProfitMargin",
    "currentRatio",
    "returnOnEquity",
    "debtToEquityRatio"
  ]
}
- (Optional) In the Current Labels section, add labels to organize and categorize the extractor.
- Click + Create.
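A schema written by hand in code mode can be sanity-checked locally before you paste it into the JSON schema box. The Python sketch below is one way to do that with the standard library: it confirms that the text parses as JSON, that the top-level type is `object`, and that every name in `required` is defined under `properties`. The `check_schema` helper is hypothetical and is not part of Enterprise h2oGPTe; the product may apply additional validation of its own.

```python
import json


def check_schema(text: str) -> list[str]:
    """Return a list of problems found in a JSON schema string (empty if none)."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(schema, dict):
        return ["top level must be a JSON object"]
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" should be "object"')
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f'required field "{name}" is not defined in "properties"')
    return problems


# Shortened variant of a financial-ratio schema with one deliberate mistake:
# "currentRatio" is listed as required but never defined.
text = """
{
  "type": "object",
  "properties": {
    "revenueGrowthRate": {"type": "number", "description": "The growth rate of revenue."}
  },
  "required": ["revenueGrowthRate", "currentRatio"]
}
"""

for problem in check_schema(text):
    print(problem)
```

An empty list from `check_schema` means none of these three problems were found; it does not guarantee that the schema is accepted or that it extracts the fields you expect.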
Edit and manage Extractors
You can manage extractors using the Actions menu in the Extractors table. Click the vertical ellipsis icon (⋮) in the Actions column for any extractor to access the following options:

- Run extractor: Execute the extractor on a selected collection.
- Share Extractor: Manage access permissions for the extractor.
- Delete: Permanently remove the extractor.
Run extractor
To run an extractor on a collection, follow these steps:
- In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to run.
- Click Run extractor.
- In the Run dialog, select a collection from the Select a collection dropdown.

- Click Run to execute the extractor on the selected collection.
Share Extractor
To share an extractor and manage access permissions, follow these steps:
- In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to share.
- Click Share Extractor.
- In the Share dialog, manage access permissions using the following tabs:
- Public: Make the extractor publicly accessible to all users.
- Toggle Make Extractor public to enable or disable public access.
- Users: Grant specific users access to the extractor.
- IDP Groups: Grant access to identity provider (IDP) groups.
- Click Save to apply your changes, or click Cancel to discard them.
Delete extractor
To delete an extractor, follow these steps:
- In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to delete.
- Click Delete.
- In the confirmation dialog, review the warning message.
Warning: Deleting an extractor is a permanent action that cannot be undone. Ensure you no longer need the extractor before proceeding.
- Click Delete to confirm the deletion, or click Cancel to abort.