
Viewer API

H2O Document-ai Viewer API (1.0.0)


HTTP REST methods to use the H2O Document-ai for processing documents or listing available pipelines.

ArchiveService

DocumentService

EntityClassService

EntityService

MultipartUploadService

PageService

PageService_ListPages

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
query Parameters
pageSize
integer <int32>

The default page size is 50

pageToken
string

Responses

Response samples

Content type
application/json
{
  "pages": [],
  "nextPageToken": "string",
  "totalSize": 0
}

PipelineRevisionService

PipelineService

ProjectService

TableService

Update Table

Updates the table.

Server validations:

- All Entities referenced in Table.rows or Table.unlocated exist in the document Entities, i.e. each reference is a valid Entity.name of the document.
- All EntityClasses referenced in Table.columns exist in the document EntityClasses.
- Table.columns contains no duplicate EntityClass.
- Table.rows and Table.unlocated contain only Entities whose EntityClass exists in Table.columns.
- Table.rows and Table.unlocated contain no duplicates, i.e. no two rows can have the same Entity, and the same Entity cannot be in Table.unlocated and Table.rows at the same time.
- Each row in Table.rows contains Entities with distinct EntityClasses, i.e. no two Entities in the same row can have the same EntityClass.
- Entities referenced in one table are not referenced in another table. If this would happen after UpdateTable, the server removes the Entity from the other table; this allows the client to relocate an Entity between tables atomically.
- The original TableColumns cannot be truncated: the user can remove manually added columns, but columns inferred by the scorer can only be hidden.
- The original TableRows cannot be truncated: the user can remove manually added rows, but rows inferred by the scorer can only be hidden.

Use cases:

- To reorder the table columns: update Table.columns.
- To add a new column: add an existing EntityClass to Table.columns.
- To remove a column: a column cannot be removed, but it can be hidden by setting Column.hidden to true.
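
Several of the server validations above can be checked on the client before calling Update Table. The following is a minimal sketch, not the server's implementation. The input shapes are assumptions: columns are flattened to a list of EntityClass names, rows to lists of Entity names, and `entity_class_of` is a hypothetical lookup built from the document's Entities. The real v1Column and v1Row objects are not described here.

```python
from collections import Counter


def check_table(columns, rows, unlocated, entity_class_of):
    """Return a list of problems; an empty list means the pre-check passed.

    columns:         list of EntityClass names (assumed flattening of v1Column)
    rows:            list of rows, each a list of Entity names (assumed flattening of v1Row)
    unlocated:       list of Entity names
    entity_class_of: dict mapping each Entity name in the document to its EntityClass
    """
    problems = []

    # Table.columns contains no duplicate EntityClass.
    for cls, count in Counter(columns).items():
        if count > 1:
            problems.append(f"duplicate column: {cls}")

    seen = Counter(unlocated)
    for i, row in enumerate(rows):
        seen.update(row)
        # Each row contains Entities with distinct EntityClasses.
        classes = Counter(entity_class_of[n] for n in row if n in entity_class_of)
        for cls, count in classes.items():
            if count > 1:
                problems.append(f"row {i} has {count} entities of class {cls}")

    for name, count in seen.items():
        if name not in entity_class_of:
            problems.append(f"unknown entity: {name}")
        elif entity_class_of[name] not in columns:
            problems.append(f"entity {name} has a class that is not a column")
        # No Entity may appear twice across rows and unlocated.
        if count > 1:
            problems.append(f"entity referenced more than once: {name}")

    return problems
```

The check covers only the rules that can be evaluated from a single table; the cross-table and truncation rules need server-side state.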

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
table
required
string
Request Body schema: application/json
required

The table to update.

required
Array of objects (v1Column)

The list of Columns. The order is respected in the UI.

required
Array of objects (v1Row)

The list of Rows. The order is respected in the UI.

Responses

Request samples

Content type
application/json
{
  "columns": [
  ],
  "rows": [
  ]
}

Response samples

Content type
application/json
{
  "table": {
  }
}

UserService

WorkspaceService

WorkspaceService_CleanWorkspace

Authorizations:
oidc
path Parameters
workspace
required
string
Request Body schema: application/json
required
object (WorkspaceServiceCleanWorkspaceBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "pipelines": []
}

Read

List Pipelines

Returns all pipelines available for processing documents.

Authorizations:
oidc

Responses

Response samples

Content type
application/json
{
  "pipelines": [],
  "totalSize": 0
}

List Pipeline Revisions

Returns all historic revisions of the pipeline, ordered from oldest to newest. The last revision in the list is the one currently in use.

Authorizations:
oidc
path Parameters
pipeline
required
string

Responses

Response samples

Content type
application/json
{
  "revisions": [
  ],
  "totalSize": 0
}

List Projects

Returns a list of all projects.

Authorizations:
oidc
query Parameters
pageSize
integer <int32>

The default page size is 50. The maximum is 1000; values above 1000 will be coerced to 1000.

pageToken
string

A page token, received from a previous ListProject call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListProject must match the call that provided the page token (e.g. the same filter must be used).

orderBy
string

Ordering follows the AIP ordering syntax (https://google.aip.dev/132#ordering). The fields that support sorting are:

  • project.display_name
  • project.update_time
  • project.pipeline.pipeline_name
  • project.documents_total_count

Unless noted, all fields support sorting in both ascending and descending order.

filter
string

Filtering follows the AIP filtering syntax (https://google.aip.dev/160). TODO: the following might be descoped; fuzzy search can be done on the client in the MVP. Currently only a search for literals is supported, which performs a fuzzy search over project.display_name, project.pipeline.pipeline_name and project.pipeline.pipeline_uri combined.

Responses

Response samples

Content type
application/json
{
  "projects": [
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
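
The pageToken / nextPageToken flow described for List Projects could be driven by a small generator like the one below. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical callable standing in for the HTTP request (the endpoint path is not given here), and treating a missing or empty nextPageToken as the end of the listing is an assumption based on the usual AIP pagination convention.

```python
from typing import Callable, Iterator, Optional

MAX_PAGE_SIZE = 1000  # larger values are coerced to 1000 by the server


def iter_items(
    fetch_page: Callable[[int, Optional[str]], dict],
    items_key: str,
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield every item of a paginated list response, page by page.

    fetch_page(page_size, page_token) is a hypothetical function that performs
    the request and returns the decoded JSON body.
    """
    page_size = min(page_size, MAX_PAGE_SIZE)
    token = None
    while True:
        page = fetch_page(page_size, token)
        yield from page.get(items_key, [])
        token = page.get("nextPageToken")
        if not token:  # assumed to mean there are no further pages
            break
```

The filter and ordering arguments belong inside `fetch_page`, so that they stay identical for every page, as the pageToken description requires.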

Get Project

Returns a single project.

Authorizations:
oidc
path Parameters
project
required
string

Responses

Response samples

Content type
application/json
{
  "project": {
  }
}

List Archives

Returns all Archives. Note that this method also returns deleted archives (state=STATE_DELETED). Use the filter parameter to get archives in a specific state.

Authorizations:
oidc
path Parameters
project
required
string
query Parameters
pageSize
integer <int32>

The default page size is 50

pageToken
string

A page token, received from a previous ListArchives call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListArchives must match the call that provided the page token (e.g. the same filter must be used).

orderBy
string

Ordering follows the AIP ordering syntax (https://google.aip.dev/132#ordering). The fields that support sorting are:

  • display_name
  • create_time
  • update_time

Unless noted, all fields support sorting in both ascending and descending order.

filter
string

Filtering follows the AIP filtering syntax (https://google.aip.dev/160). Currently only filtering by state is supported:

  • state="STATE_EXTRACTING" or state="STATE_WAITING_FOR_PASSWORD"

Responses

Response samples

Content type
application/json
{
  "archives": [
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
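
Because the state filter contains `=` and `"` characters, it has to be URL-encoded when sent as a query parameter. The sketch below builds the query string for List Archives with the standard library; `archives_query` is a hypothetical helper name, and only the single-state form of the filter is shown.

```python
from urllib.parse import urlencode


def archives_query(state: str, page_size: int = 50) -> str:
    """Build the query string for List Archives, filtered by one archive state."""
    return urlencode({"filter": f'state="{state}"', "pageSize": page_size})


# Archives that still need a password from the user.
query = archives_query("STATE_WAITING_FOR_PASSWORD")
```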

Get Archive

Returns a single Archive.

Authorizations:
oidc
path Parameters
project
required
string
archive
required
string

Responses

Response samples

Content type
application/json
{
  "archive": {
  }
}

List Documents

List all documents in a project.

Authorizations:
oidc
path Parameters
project
required
string
query Parameters
pageSize
integer <int32>

The default page size is 50. The maximum is 1000; values above 1000 will be coerced to 1000.

pageToken
string

A page token, received from a previous ListDocuments call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to ListDocuments must match the call that provided the page token (e.g. the same filter must be used).

orderBy
string

Ordering follows the AIP ordering syntax (https://google.aip.dev/132#ordering). The fields that support sorting are:

  • display_name
  • create_time
  • update_time
  • pipeline_run.results_count
  • pipeline_run.pages_count

Unless noted, all fields support sorting in both ascending and descending order.

filter
string

Filtering follows the AIP filtering syntax (https://google.aip.dev/160). Currently only filtering by pipeline name is supported:

  • pipeline_run.pipeline="pipelines/PIPELINE_ID"

Responses

Response samples

Content type
application/json
{
  "documents": [
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
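
In the AIP ordering syntax, orderBy is a comma-separated list of fields, each optionally followed by `desc`. The sketch below composes such a value for List Documents and rejects fields that are not in the sortable list above; `document_order_by` is a hypothetical helper, not part of the API.

```python
DOCUMENT_SORT_FIELDS = (
    "display_name",
    "create_time",
    "update_time",
    "pipeline_run.results_count",
    "pipeline_run.pages_count",
)


def document_order_by(*fields):
    """Compose an orderBy value from (field, descending) pairs."""
    parts = []
    for name, descending in fields:
        if name not in DOCUMENT_SORT_FIELDS:
            raise ValueError(f"field does not support sorting: {name}")
        parts.append(f"{name} desc" if descending else name)
    return ", ".join(parts)


# Most recently updated first, ties broken by name.
order = document_order_by(("update_time", True), ("display_name", False))
```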

Get Document

Returns a single document.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string

Responses

Response samples

Content type
application/json
{
  "document": {
  }
}

List Entities

Returns all Entities for a document.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
query Parameters
pageSize
integer <int32>

NOTE: pagination is not supported on this resource.

pageToken
string

Responses

Response samples

Content type
application/json
{
  "entities": [
  ],
  "nextPageToken": "string",
  "totalSize": 0
}

List Entity Classes

Returns all Entity Classes for a document.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
query Parameters
pageSize
integer <int32>

The default page size is 50

pageToken
string

Responses

Response samples

Content type
application/json
{
  "entityClasses": [
  ],
  "nextPageToken": "string",
  "totalSize": 0
}

List Tables

Returns all tables inferred by the model in a document. Only works with pipelines that have a post-processor for outputting tables.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string

Responses

Response samples

Content type
application/json
{
  "tables": [
  ]
}

Export Results

Exports the results of the document processing. The document must be processed before exporting. Also sets document.pipeline_run.export_time. The response contains a Content-Disposition header, so opening it in a web browser starts a download.
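
A non-browser client may want to reuse the file name carried in the Content-Disposition header. The sketch below extracts it with the standard library; the header value in the usage line is a made-up example, since the actual file names produced by the server are not documented here.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition header, if any."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


# Hypothetical header value, for illustration only.
name = filename_from_content_disposition('attachment; filename="results.csv"')
```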

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required
format
required
string (- OUTPUT_TYPE_JSON: TODO: Not needed for MVP (please verify with PM))
Enum: "OUTPUT_TYPE_CSV" "OUTPUT_TYPE_JSON"

Responses

Request samples

Content type
application/json
{
  "format": "OUTPUT_TYPE_CSV"
}

Response samples

Content type
application/json
"string"

List Users

Returns the list of Users that have registered the email address specified in the request. This is a List method because emails are not necessarily unique across users. It is up to the client to decide how to handle cases when two users share the same email address.

Authorizations:
oidc
query Parameters
pageSize
integer <int32>

Maximum number of items the server should return in the response. When set to 0, the server decides how many items to return.

pageToken
string
filter
required
string

Currently supports only filtering by email

Responses

Response samples

Content type
application/json
{
  "users": [
  ],
  "nextPageToken": "string"
}

Get User

Returns a single user

Authorizations:
oidc
path Parameters
user
required
string

Responses

Response samples

Content type
application/json
{
  "user": {
  }
}

Get Workspace

Returns a single Workspace.

Authorizations:
oidc
path Parameters
workspace
required
string

Responses

Response samples

Content type
application/json
{
  "workspace": {
  }
}

Update

Rollback Pipeline Revision

Sets the pipeline revision as the current one, effectively rolling back the pipeline. Creates a new, last entry in the pipeline revisions list, thus revisions with rollback are duplicated in the list.

Authorizations:
oidc
path Parameters
pipeline
required
string
revision
required
string
Request Body schema: application/json
required
object (PipelineRevisionServiceRollbackPipelineRevisionBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "revision": {
  }
}
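
The effect of a rollback on the revisions list can be modelled in a few lines. This is only an illustration of the behaviour described above, with revisions represented as plain strings; whether the appended entry receives a new identifier on the server is not specified here.

```python
def rollback(revisions, index):
    """Model a rollback: the chosen revision is appended as the new last entry."""
    return revisions + [revisions[index]]


history = rollback(["rev-a", "rev-b", "rev-c"], 0)
current = history[-1]  # the last revision in the list is the one in use
```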

Update Project

Updates Project.

Authorizations:
oidc
path Parameters
project
required
string
Request Body schema: application/json
required
displayName
required
string
state
string (v1ProjectState)
Enum: "STATE_ACTIVE" "STATE_INCATIVE"
  • STATE_ACTIVE: The project is actively being used.
  • STATE_INCATIVE: The project was archived. Terminal state. The project settings cannot be changed and documents can no longer be imported.
pipeline
required
string
confidenceScoreThreshold
integer <int32>

The confidence score threshold. Results with an OCR or classification confidence score below this threshold will be marked as an error.

deleteDocumentsAfterExport
boolean

When set to true, the document content will be removed automatically after export. The metadata about the document will be left intact.

Responses

Request samples

Content type
application/json
{
  "displayName": "string",
  "state": "STATE_ACTIVE",
  "pipeline": "string",
  "confidenceScoreThreshold": 0,
  "deleteDocumentsAfterExport": true
}

Response samples

Content type
application/json
{
  "project": {
  }
}

Unlock Archive

Unlocks the password-protected archive with the password provided in the request. Only archives in STATE_WAITING_FOR_PASSWORD can be unlocked.

Significant error codes:

- FAILED_PRECONDITION: the archive is not in the STATE_WAITING_FOR_PASSWORD state.
- INVALID_ARGUMENT: the supplied password is incorrect.

The method returns the archive in its changed state, which is:

- STATE_EXTRACTING if the archive was loaded into memory,
- STATE_LOADING if the archive is not in memory.
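
A client can map the outcomes listed above to follow-up actions. The sketch below assumes the client has already extracted the error code name or the returned archive state from the response; how these are encoded in the HTTP response is not described here, and the suggested actions are illustrative.

```python
from typing import Optional


def after_unlock(error_code: Optional[str] = None, archive_state: Optional[str] = None) -> str:
    """Suggest a client action for the outcome of an Unlock Archive call."""
    if error_code == "INVALID_ARGUMENT":
        return "ask the user for the password again"
    if error_code == "FAILED_PRECONDITION":
        return "refresh the archive; it is not waiting for a password"
    if error_code is not None:
        return f"report unexpected error {error_code}"
    if archive_state in ("STATE_EXTRACTING", "STATE_LOADING"):
        return "poll the archive until extraction finishes"
    return f"unexpected archive state {archive_state}"
```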

Authorizations:
oidc
path Parameters
project
required
string
archive
required
string
Request Body schema: application/json
required
password
required
string

Responses

Request samples

Content type
application/json
{
  "password": "string"
}

Response samples

Content type
application/json
{
  "archive": {
  }
}

Update Entity

Updates an entity with a user value.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
entitie
required
string
Request Body schema: application/json
required

The entity to be updated.

value
required
string

The value of the Entity. This is initially set by the OCR, but can be changed by the user.

entityClass
required
string

The Entity Class. The value must not be empty. This is initially set by the scorer.

object (v1ScoringData)

Scoring Data about the Entity.

Responses

Request samples

Content type
application/json
{
  "value": "string",
  "entityClass": "string",
  "scoringData": {
  }
}

Response samples

Content type
application/json
{
  "entity": {
  }
}

Process Document

Sends the document for processing. Only valid for documents in the IMPORTED state. Changes the document state to STATE_PROCESSING.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required
object (DocumentServiceProcessDocumentBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "document": {
  }
}

Create

Create Project

Creates a new Project.

Authorizations:
oidc
Request Body schema: application/json
required
displayName
required
string
state
string (v1ProjectState)
Enum: "STATE_ACTIVE" "STATE_INCATIVE"
  • STATE_ACTIVE: The project is actively being used.
  • STATE_INCATIVE: The project was archived. Terminal state. The project settings cannot be changed and documents can no longer be imported.
pipeline
required
string
confidenceScoreThreshold
integer <int32>

The confidence score threshold. Results with an OCR or classification confidence score below this threshold will be marked as an error.

deleteDocumentsAfterExport
boolean

When set to true, the document content will be removed automatically after export. The metadata about the document will be left intact.

Responses

Request samples

Content type
application/json
{
  "displayName": "string",
  "state": "STATE_ACTIVE",
  "pipeline": "string",
  "confidenceScoreThreshold": 0,
  "deleteDocumentsAfterExport": true
}

Response samples

Content type
application/json
{
  "project": {
  }
}
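
The Create Project request body can be assembled and checked before sending. The helper below is a minimal sketch: `project_body` is a hypothetical name, the field names and enum values are copied from the schema above (including the enum spelling as published), and optional fields are simply omitted when not provided.

```python
PROJECT_STATES = ("STATE_ACTIVE", "STATE_INCATIVE")  # spelled as in the schema


def project_body(display_name, pipeline, state=None,
                 confidence_score_threshold=None,
                 delete_documents_after_export=None):
    """Build the JSON body for Create Project, checking the required fields."""
    if not display_name:
        raise ValueError("displayName is required")
    if not pipeline:
        raise ValueError("pipeline is required")
    if state is not None and state not in PROJECT_STATES:
        raise ValueError(f"unknown state: {state}")

    body = {"displayName": display_name, "pipeline": pipeline}
    if state is not None:
        body["state"] = state
    if confidence_score_threshold is not None:
        body["confidenceScoreThreshold"] = int(confidence_score_threshold)
    if delete_documents_after_export is not None:
        body["deleteDocumentsAfterExport"] = bool(delete_documents_after_export)
    return body
```

The returned dictionary can be serialized with `json.dumps` and sent as the application/json request body.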

Create Entity

Creates a new entity. The entityClass must be set to entity classes valid for the document.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required

The entity to be created.

value
required
string

The value of the Entity. This is initially set by the OCR, but can be changed by the user.

entityClass
required
string

The Entity Class. The value must not be empty. This is initially set by the scorer.

object (v1ScoringData)

Scoring Data about the Entity.

Responses

Request samples

Content type
application/json
{
  "value": "string",
  "entityClass": "string",
  "scoringData": {
  }
}

Response samples

Content type
application/json
{
  "entity": {
  }
}

Create Multipart Upload

Initiates a multipart upload request. The response contains a URI to which the client should upload the file.

Authorizations:
oidc
path Parameters
project
required
string
Request Body schema: application/json
required
filename
string
pipeline
required
string (Scoring pipeline that should process the uploaded file)

Responses

Request samples

Content type
application/json
{
  "filename": "string",
  "pipeline": "string"
}

Response samples

Content type
application/json
{
  "multipartUpload": {
  }
}

Add User to Workspace

Assigns a User to the Workspace.

Authorizations:
oidc
path Parameters
workspace
required
string
Request Body schema: application/json
required
user
required
string

Responses

Request samples

Content type
application/json
{
  "user": "string"
}

Response samples

Content type
application/json
{
  "workspace": {
  }
}

Delete

Delete Project

Deletes a specified Project. TBD.

Authorizations:
oidc
path Parameters
project
required
string

Responses

Response samples

Content type
application/json
{ }

Delete Archive

Deletes the archive. Archive metadata will be kept, artifacts related to it will be deleted entirely.

Currently, only archives in the following states can be deleted:

- STATE_WAITING_FOR_PASSWORD
- STATE_EXTRACTING_FAILED

Authorizations:
oidc
path Parameters
project
required
string
archive
required
string

Responses

Response samples

Content type
application/json
{ }

Delete Document

Deletes the document. Document metadata will be kept, artifacts that are related to the document will be deleted entirely.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string

Responses

Response samples

Content type
application/json
{ }

Delete Entity

Deletes an entity. Only manually created entities can be deleted; if the entity has a scoring_data field, the operation is not permitted.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
entitie
required
string

Responses

Response samples

Content type
application/json
{ }
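
A client can hide the delete action for entities that the server would refuse to delete. The check below is a sketch under two assumptions: the entity is available as a decoded JSON object, and the scoring data appears under the `scoringData` key as in the Create Entity request sample, with an absent or null value meaning the entity was created manually.

```python
def is_deletable(entity: dict) -> bool:
    """True when the entity carries no scoring data, i.e. it was created manually."""
    return entity.get("scoringData") is None
```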

Remove User from Workspace

Unassigns a User from the Workspace.

Authorizations:
oidc
path Parameters
workspace
required
string
Request Body schema: application/json
required
user
required
string

Responses

Request samples

Content type
application/json
{
  "user": "string"
}

Response samples

Content type
application/json
{
  "workspace": {
  }
}

Change

Mark Document as Reviewed

Changes the document state to REVIEWED.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required
object (DocumentServiceReviewDocumentBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "document": {
  }
}

Mark Document as Reviewing

Changes the document state to REVIEWING.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required
object (DocumentServiceStartReviewingBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "document": {
  }
}

Unreview Document

Reverts the review of the document. The REVIEWED state is changed to REVIEWING. Works only for documents in REVIEWED state.

Authorizations:
oidc
path Parameters
project
required
string
document
required
string
Request Body schema: application/json
required
object (DocumentServiceUnreviewDocumentBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "document": {
  }
}
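
The document state changes described for Process Document, Mark Document as Reviewing, Mark Document as Reviewed and Unreview Document can be summarized as a small transition table. This is a sketch, not the server's state machine: the action names are hypothetical labels, state names are written as they appear in the descriptions, and a required state of `None` means the description does not name a precondition.

```python
# action -> (required current state or None, resulting state)
TRANSITIONS = {
    "process": ("IMPORTED", "STATE_PROCESSING"),
    "start_reviewing": (None, "REVIEWING"),
    "review": (None, "REVIEWED"),
    "unreview": ("REVIEWED", "REVIEWING"),
}


def next_state(action: str, current: str) -> str:
    """Return the state a document is expected to have after the given action."""
    required, target = TRANSITIONS[action]
    if required is not None and current != required:
        raise ValueError(f"{action} requires state {required}, got {current}")
    return target
```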

Complete Multipart Upload

Completes the multipart upload, must be called by the client after all the parts of the file are uploaded.

Authorizations:
oidc
path Parameters
project
required
string
multipartUpload
required
string
Request Body schema: application/json
required
object (MultipartUploadServiceMultipartUploadCompleteBody)

Responses

Request samples

Content type
application/json
{ }

Response samples

Content type
application/json
{
  "multipartUpload": {
  }
}

Upload

Uploads a file via Multipart Upload

Uploads one part of the multipart upload. The part number must be specified in the Part-Number header. The optimal size of the part is ~10MB.

Authorizations:
oidc
path Parameters
project
required
string
multipartUpload
required
string
header Parameters
Part-Number
required
integer

Responses

Response samples

Content type
application/json
{ }
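
Splitting a file into parts of roughly 10 MB, each paired with its Part-Number, could look like the sketch below. Numbering parts from 1 is an assumption, since the starting part number is not stated here; the actual upload request and URI handling are left out.

```python
from typing import Iterator, Tuple

PART_SIZE = 10 * 1024 * 1024  # ~10 MB, the part size suggested above


def iter_parts(data: bytes, part_size: int = PART_SIZE,
               first_part_number: int = 1) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs covering the whole payload in order."""
    for offset in range(0, len(data), part_size):
        yield first_part_number + offset // part_size, data[offset:offset + part_size]


def part_headers(part_number: int) -> dict:
    """Headers for uploading one part; Part-Number is the documented header."""
    return {"Part-Number": str(part_number)}
```

After the last part is uploaded, the client calls Complete Multipart Upload.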
