Version: v1.7.0

Extractors

Overview

Extractors convert unstructured document content into structured JSON data. Define a JSON schema that specifies the fields and data types you want to capture, then apply the extractor to documents such as CVs, invoices, Form 10-Ks, or scanned images to retrieve the information in a consistent format.
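
To make "consistent format" concrete, here is a minimal Python sketch (standard library only) of a JSON schema for invoices, plus a check that an extracted record has the shape the schema promises. The field names (`invoice_number`, `total`, `paid`) and the sample records are hypothetical. The `conforms` helper is also hypothetical: it is a hand-rolled subset of JSON Schema validation (required fields and primitive types only), not how Enterprise h2oGPTe validates its output.

```python
import json

# Hypothetical invoice schema: field names and types are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "paid": {"type": "boolean"},
    },
    "required": ["invoice_number", "total"],
}

# Map JSON Schema primitive type names to Python types.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def conforms(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    # Every field that is present must have the declared type.
    for name, spec in schema["properties"].items():
        if name not in record:
            continue
        value, expected = record[name], spec["type"]
        ok = isinstance(value, PY_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


# A hypothetical record, as an extractor might return it for one invoice.
extracted = json.loads('{"invoice_number": "INV-0042", "total": 1250.5, "paid": false}')
print(conforms(extracted, INVOICE_SCHEMA))  # []
print(conforms({"total": "1,250.50"}, INVOICE_SCHEMA))
# ['missing required field: invoice_number', 'total: expected number, got str']
```

Checking records like this on the consuming side is optional; it simply makes explicit the contract that a schema gives downstream code.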

Access the Extractors page

To access the Extractors page, click Extractors in the Enterprise h2oGPTe navigation menu.

The Extractors page provides a centralized location to create, view, and manage all extractors. The following numbered elements are available on the page:

Extractors page with labels

| Element | Description |
|---|---|
| [1] + New extractor | Opens the extractor creation form. |
| [2] Filter extractors | Search extractors by name or description. |
| [3] All extractors | Shows all extractors visible to you, including ones you created and ones shared with you. |
| [4] My extractors | Shows only extractors you created. |
| [5] Shared | Shows extractors shared with you. |
| [6] Extractors table | Lists extractors for the selected tab. Columns: Name (click to view details), Id (click the copy icon to copy), Description, Created, Actions. |

Create an extractor

To create an extractor:

  1. Access the Extractors page. For more information, see Access the Extractors page.

  2. Click + New extractor (1).

    New extractor page

  3. In the Extractor name field, enter a name for the extractor.

  4. (Optional) In the Description field, enter a description for the extractor.

  5. In the LLM list, select a large language model (LLM) for the extraction task. The list displays model names with estimated processing costs.

  6. (Optional) Configure additional extractor settings such as vision, prompts, chunk limits, and guardrails. For details, see Extractor settings.

  7. Define the JSON schema for the extractor. The Input JSON Schema section offers two options, switched by a toggle; the following steps describe the JSON schema builder.

    Note: Schema field names do not need to match the labels in your documents exactly. The LLM infers what each field is meant to capture from its name and the surrounding document context, much as a person would, so a field still maps to its intended data point when the document uses slightly different terminology.

    To build the JSON schema with the JSON schema builder:

    1. In the Input JSON Schema section, ensure the toggle is off (builder mode is active).
    2. Click Add Field or Add Table to add fields or tables to your schema.
    3. For each field, define the following:
      • Field Name: Enter the name of the field.
      • Type: Select the data type (for example, String, Number, Boolean).
      • Required: Toggle this option if the field is required.
    4. Add more fields or tables as needed by repeating steps 2 and 3.

    For example:

    JSON schema builder

  8. (Optional) In the Current Labels section, add labels to organize and categorize the extractor.

  9. Click + Create.
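
The builder settings from step 7 (Field Name, Type, Required, and tables) correspond to the usual structure of a JSON Schema object. The following Python sketch shows that correspondence under stated assumptions: it assumes fields become `properties`, the Required toggle becomes the `required` list, and a table becomes an array of objects. The schema that Enterprise h2oGPTe actually generates may differ. The `Field` and `Table` classes, the `build_schema` function, and the CV field names are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical mapping from builder type labels to JSON Schema type names.
TYPE_NAMES = {"String": "string", "Number": "number", "Boolean": "boolean"}


@dataclass
class Field:
    name: str
    type: str  # builder type label, e.g. "String"
    required: bool = False


@dataclass
class Table:
    name: str
    columns: list[Field]
    required: bool = False


def build_schema(items: list) -> dict:
    """Build a JSON Schema object from builder-style fields and tables."""
    properties, required = {}, []
    for item in items:
        if isinstance(item, Table):
            # Assumption: a table is an array whose rows are objects.
            properties[item.name] = {"type": "array", "items": build_schema(item.columns)}
        else:
            properties[item.name] = {"type": TYPE_NAMES[item.type]}
        if item.required:
            required.append(item.name)
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


# Hypothetical CV extractor: two scalar fields and one table.
cv_fields = [
    Field("full_name", "String", required=True),
    Field("years_of_experience", "Number"),
    Table("employment_history", [
        Field("employer", "String", required=True),
        Field("title", "String"),
    ]),
]

print(json.dumps(build_schema(cv_fields), indent=2))
```

Modeling a table as an array of row objects keeps each row's columns together, which is usually what you want for repeating items such as employment history or invoice line items.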

Edit and manage extractors

You can manage extractors from the Actions menu in the Extractors table. Click the vertical ellipsis icon (⋮) in the Actions column for any extractor to access the following options:

Extractor actions menu

  • Run extractor: Execute the extractor on a selected collection.
  • Share Extractor: Manage access permissions for the extractor.
  • Delete: Permanently remove the extractor.

Run extractor

To run an extractor on a collection:

  1. In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to run.
  2. Click Run extractor.
  3. In the Run dialog, select a collection from the Select a collection dropdown.
    Run extractor dialog
  4. Click Run to execute the extractor on the selected collection.

Share extractor

To share an extractor and manage access permissions:

  1. In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to share.
  2. Click Share Extractor.
  3. In the Share dialog, manage access permissions using the following tabs:
    Share Extractor dialog
    • Public: Make the extractor publicly accessible to all users.
      • Toggle Make Extractor public to enable or disable public access.
    • Users: Grant specific users access to the extractor.
    • IDP Groups: Grant access to identity provider (IDP) groups.
  4. Click Save to apply your changes, or click Cancel to discard them.

Delete extractor

To delete an extractor:

  1. In the Extractors table, click the vertical ellipsis icon (⋮) in the Actions column for the extractor you want to delete.
  2. Click Delete.
  3. In the confirmation dialog, review the warning message.
    Delete extractor confirmation

    Warning: Deleting an extractor is permanent and cannot be undone. Make sure you no longer need the extractor before proceeding.

  4. Click Delete to confirm the deletion, or click Cancel to abort.
