Viewer API
H2O Document-ai Viewer API (1.0.0)
HTTP REST methods for using H2O Document AI to process documents and list available pipelines.
PageService_ListPages
Authorizations:
path Parameters
project required | string |
document required | string |
query Parameters
pageSize | integer <int32> The default page size is 50 |
pageToken | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "nextPageToken": "string",
  "totalSize": 0
}
Update Table
Updates a table.
Server validations:
- All Entities referenced in Table.rows or Table.unlocated exist in the document Entities, i.e. they are valid Entity.name values of the document.
- All EntityClasses referenced in Table.columns exist in the document EntityClasses.
- Table.columns contains no duplicate EntityClass.
- Table.rows and Table.unlocated contain only Entities whose EntityClass exists in Table.columns.
- Table.rows and Table.unlocated contain no duplicates, i.e. no entity can appear in more than one row, and the same entity cannot be in Table.unlocated and Table.rows at the same time.
- Each row in Table.rows contains entities with distinct EntityClasses, i.e. no two entities in the same row can have the same EntityClass.
- An Entity referenced in one table must not be referenced in any other table. If an UpdateTable call would cause this, the server removes the entity from the other table, which lets the client move an Entity between tables atomically.
- The original TableColumns cannot be truncated: the user can remove manually added columns, but columns inferred by the scorer can only be hidden.
- The original TableRows cannot be truncated: the user can remove manually added rows, but rows inferred by the scorer can only be hidden.
Use cases:
- To reorder the table columns, update Table.columns.
- To add a new column, add an existing EntityClass to Table.columns.
- To remove a column: columns inferred by the scorer cannot be removed, but they can be hidden by setting Column.hidden to true. Manually added columns can be removed.
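Several of the server validations above can be mirrored on the client before calling UpdateTable, which catches obvious mistakes without a round trip. The following is a minimal sketch, not the server's implementation. It assumes you have already built a hypothetical mapping from each Entity.name to its entityClass (for example from List Entities) and the document's EntityClass names (for example from List Entity Classes). It checks columns and rows only; Table.unlocated, the cross-table rule, and the truncation rules are left to the server.

```python
from typing import Dict, Iterable, List


def validate_table(table: dict, entity_class_of: Dict[str, str],
                   document_classes: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed.

    entity_class_of: hypothetical map from each Entity.name to its entityClass.
    document_classes: the document's EntityClass names.
    Covers columns and rows only (not Table.unlocated, cross-table or truncation rules).
    """
    problems: List[str] = []
    known = set(document_classes)
    columns = [column["entityClass"] for column in table.get("columns", [])]

    # Columns must reference existing EntityClasses, without duplicates.
    for cls in columns:
        if cls not in known:
            problems.append(f"unknown entity class in columns: {cls}")
    if len(columns) != len(set(columns)):
        problems.append("duplicate entity class in columns")

    column_set = set(columns)
    seen_entities = set()
    for i, row in enumerate(table.get("rows", [])):
        row_classes = set()
        for name in row.get("entities", []):
            # Every entity must exist in the document.
            if name not in entity_class_of:
                problems.append(f"row {i}: unknown entity {name}")
                continue
            # No entity may appear more than once across the rows.
            if name in seen_entities:
                problems.append(f"row {i}: entity {name} already used")
            seen_entities.add(name)
            cls = entity_class_of[name]
            # The entity's class must be one of the table columns.
            if cls not in column_set:
                problems.append(f"row {i}: entity class {cls} is not a column")
            # Entities in one row must have distinct classes.
            if cls in row_classes:
                problems.append(f"row {i}: entity class {cls} appears twice")
            row_classes.add(cls)
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI highlight every offending cell at once; the server remains the authority on whether the update is accepted.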
Authorizations:
path Parameters
project required | string |
document required | string |
table required | string |
Request Body schema: application/json (required)
The table to update.
columns required | Array of objects (v1Column) The list of Columns. The order is respected in the UI. |
rows required | Array of objects (v1Row) The list of Rows. The order is respected in the UI. |
Responses
Request samples
- Payload
{
  "columns": [
    {
      "entityClass": "string",
      "hidden": true
    }
  ],
  "rows": [
    {
      "entities": [
        "string"
      ],
      "hidden": true
    }
  ]
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "table": {
    "name": "string",
    "columns": [
      {
        "entityClass": "string",
        "hidden": true
      }
    ],
    "rows": [
      {
        "entities": [
          "string"
        ],
        "hidden": true
      }
    ]
  }
}
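Because the Update Table request body carries only the columns and rows lists, reordering or hiding a column means sending both lists back with the change applied. The sketch below builds such bodies from a table object shaped like the response sample. The helper names hide_column and move_column_first are hypothetical, and the table itself is identified by the table path parameter, so its name is not copied into the body.

```python
import copy


def hide_column(table: dict, entity_class: str) -> dict:
    """Return an Update Table body with the matching column hidden.

    Columns and rows keep their order; scorer-inferred columns cannot be
    removed, so hiding is the way to take them out of view.
    """
    body = {
        "columns": copy.deepcopy(table["columns"]),
        "rows": copy.deepcopy(table["rows"]),
    }
    for column in body["columns"]:
        if column["entityClass"] == entity_class:
            column["hidden"] = True
    return body


def move_column_first(table: dict, entity_class: str) -> dict:
    """Return an Update Table body with one column moved to the front.

    sort() is stable, so the remaining columns keep their relative order.
    """
    columns = copy.deepcopy(table["columns"])
    columns.sort(key=lambda column: column["entityClass"] != entity_class)
    return {"columns": columns, "rows": copy.deepcopy(table["rows"])}
```

Deep-copying keeps the caller's cached table untouched, so the UI can still show the previous state if the server rejects the update.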
WorkspaceService_CleanWorkspace
Authorizations:
path Parameters
workspace required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
}
List Pipeline Revisions
Returns all historic revisions of the pipeline, ordered from oldest to newest. The last revision in the list is the one currently in use.
Authorizations:
path Parameters
pipeline required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "revisions": [
    {
      "version": 0,
      "name": "string",
      "namespace": "string",
      "description": "string",
      "status": "string",
      "firstDeployedTime": "2019-08-24T14:15:22Z",
      "lastDeployedTime": "2019-08-24T14:15:22Z",
      "deletedTime": "2019-08-24T14:15:22Z",
      "chart": "string",
      "appVersion": "string"
    }
  ],
  "totalSize": 0
}
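Since the revisions list is ordered from oldest to newest, the current revision is simply the last element. A minimal sketch over a decoded List Pipeline Revisions response follows; the helper names are hypothetical. Which revision field to pass as the revision path parameter of Rollback Pipeline Revision is not stated in this section, so treat that mapping as an open question.

```python
from typing import Optional


def current_revision(response: dict) -> Optional[dict]:
    """Return the revision currently in use (the last entry), or None if there is none."""
    revisions = response.get("revisions", [])
    return revisions[-1] if revisions else None


def previous_revision(response: dict) -> Optional[dict]:
    """Return the entry just before the current one, or None if there is none.

    Because a rollback appends a new entry, this may be a duplicate of an
    older revision rather than a distinct deployment.
    """
    revisions = response.get("revisions", [])
    return revisions[-2] if len(revisions) >= 2 else None
```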
List Projects
Returns a list of all projects.
Authorizations:
query Parameters
pageSize | integer <int32> The default page size is 50. The maximum is 1000; values above 1000 will be coerced to 1000. |
pageToken | string A page token, received as nextPageToken from a previous call. |
orderBy | string Follows the AIP ordering syntax: https://google.aip.dev/132#ordering The fields that support sorting are |
filter | string Follows the AIP filtering syntax: https://google.aip.dev/160 Currently, only literal search is supported; it performs a fuzzy search over project.display_name, project.pipeline.pipeline_name, and project.pipeline.pipeline_uri combined. |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "projects": [
    {
      "name": "string",
      "displayName": "string",
      "createTime": "2019-08-24T14:15:22Z",
      "updateTime": "2019-08-24T14:15:22Z",
      "state": "STATE_ACTIVE",
      "pipeline": "string",
      "documentsTotalCount": 0,
      "documentsReadyForReviewCount": 0,
      "documentsNotProcessedCount": 0,
      "confidenceScoreThreshold": 0,
      "deleteDocumentsAfterExport": true
    }
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
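List methods that accept pageSize and pageToken return a nextPageToken; passing that value back as pageToken fetches the next page. The sketch below collects every item across pages, assuming the usual AIP convention that an empty or missing nextPageToken marks the last page. fetch_page is a hypothetical caller-supplied function that performs one List request (request paths are not given in this section) and returns the decoded JSON. This does not apply to List Entities, which does not support pagination.

```python
from typing import Callable, Iterator, Optional


def iterate_all(fetch_page: Callable[[Optional[str], int], dict],
                items_key: str, page_size: int = 50) -> Iterator[dict]:
    """Yield every item of a paginated List response.

    fetch_page(page_token, page_size) is hypothetical: it should send one
    request with those query parameters and return the decoded response.
    items_key is the list field, e.g. "projects", "archives" or "documents".
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token, page_size)
        yield from page.get(items_key, [])
        token = page.get("nextPageToken")
        if not token:  # empty or missing token: assumed to mean no more pages
            break
```

Keeping the HTTP call outside the generator makes it easy to reuse for List Projects, List Archives, and List Documents, and to test without a server.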
Get Project
Returns a single project.
Authorizations:
path Parameters
project required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "project": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "state": "STATE_ACTIVE",
    "pipeline": "string",
    "documentsTotalCount": 0,
    "documentsReadyForReviewCount": 0,
    "documentsNotProcessedCount": 0,
    "confidenceScoreThreshold": 0,
    "deleteDocumentsAfterExport": true
  }
}
List Archives
Returns all Archives. Note that this method also returns deleted archives (state=STATE_DELETED). Use the filter parameter to get only archives in a specific state.
Authorizations:
path Parameters
project required | string |
query Parameters
pageSize | integer <int32> The default page size is 50. |
pageToken | string A page token, received as nextPageToken from a previous call. |
orderBy | string Follows the AIP ordering syntax: https://google.aip.dev/132#ordering The fields that support sorting are |
filter | string Follows the AIP filtering syntax: https://google.aip.dev/160 Currently, only filtering by state is supported. |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "archives": [
    {
      "name": "string",
      "displayName": "string",
      "createTime": "2019-08-24T14:15:22Z",
      "updateTime": "2019-08-24T14:15:22Z",
      "pipeline": "string",
      "state": "STATE_IMPORTING",
      "importErrorMessage": "string",
      "extractingErrorMessage": "string"
    }
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
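To exclude deleted archives, or to look at archives in one particular state, pass a state filter. The sketch below only builds the query string. The expression form `state = STATE_...` follows AIP-160 syntax, but the exact expression this server accepts is an assumption, and the helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_archives_query(state: Optional[str] = None, page_size: int = 50,
                        page_token: Optional[str] = None) -> str:
    """Build the query string for List Archives (hypothetical helper).

    The filter form "state = STATE_..." is assumed from AIP-160 syntax.
    """
    params = {"pageSize": page_size}
    if page_token:
        params["pageToken"] = page_token
    if state:
        params["filter"] = f"state = {state}"
    # urlencode percent-encodes "=" and turns spaces into "+".
    return urlencode(params)
```

For example, list_archives_query(state="STATE_IMPORTING") yields pageSize=50&filter=state+%3D+STATE_IMPORTING.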
Get Archive
Returns a single Archive.
Authorizations:
path Parameters
project required | string |
archive required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "archive": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "importErrorMessage": "string",
    "extractingErrorMessage": "string"
  }
}
List Documents
List all documents in a project.
Authorizations:
path Parameters
project required | string |
query Parameters
pageSize | integer <int32> The default page size is 50. The maximum is 1000; values above 1000 will be coerced to 1000. |
pageToken | string A page token, received as nextPageToken from a previous call. |
orderBy | string Follows the AIP ordering syntax: https://google.aip.dev/132#ordering The fields that support sorting are |
filter | string Follows the AIP filtering syntax: https://google.aip.dev/160 Currently, only filtering by pipeline name is supported. |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "documents": [
    {
      "name": "string",
      "displayName": "string",
      "createTime": "2019-08-24T14:15:22Z",
      "updateTime": "2019-08-24T14:15:22Z",
      "pipeline": "string",
      "state": "STATE_IMPORTING",
      "pipelineRun": {
        "resultsCount": 0,
        "resultsFlaggedCount": 0,
        "resultsMissingCount": 0,
        "pagesCount": 0,
        "errorThreshold": 0,
        "exportTime": "2019-08-24T14:15:22Z"
      },
      "importErrorMessage": "string",
      "processingErrorMessage": "string"
    }
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
Get Document
Returns a single document.
Authorizations:
path Parameters
project required | string |
document required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "document": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "pipelineRun": {
      "resultsCount": 0,
      "resultsFlaggedCount": 0,
      "resultsMissingCount": 0,
      "pagesCount": 0,
      "errorThreshold": 0,
      "exportTime": "2019-08-24T14:15:22Z"
    },
    "importErrorMessage": "string",
    "processingErrorMessage": "string"
  }
}
List Entities
Returns all Entities for a document.
Authorizations:
path Parameters
project required | string |
document required | string |
query Parameters
pageSize | integer <int32> Note: pagination is not supported on this resource. |
pageToken | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "entities": [
    {
      "name": "string",
      "value": "string",
      "entityClass": "string",
      "scoringData": {
        "value": "string",
        "confidenceOcr": 0,
        "confidenceClassification": 0,
        "page": 0,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "width": 0,
          "height": 0
        }
      }
    }
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
List Entity Classes
Returns all Entity Classes for a document.
Authorizations:
path Parameters
project required | string |
document required | string |
query Parameters
pageSize | integer <int32> The default page size is 50 |
pageToken | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "entityClasses": [
    {
      "name": "string",
      "displayName": "string"
    }
  ],
  "nextPageToken": "string",
  "totalSize": 0
}
List Tables
Returns all tables inferred by the model in a document. Only works with pipelines that have a post-processor for outputting tables.
Authorizations:
path Parameters
project required | string |
document required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "tables": [
    {
      "name": "string",
      "columns": [
        {
          "entityClass": "string",
          "hidden": true
        }
      ],
      "rows": [
        {
          "entities": [
            "string"
          ],
          "hidden": true
        }
      ]
    }
  ]
}
Export Results
Exports the results of the document processing. The document must be processed before exporting. This call also sets document.pipeline_run.export_time. The response contains a Content-Disposition header, so when it is opened in a web browser, a download starts.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
format required | string Enum: "OUTPUT_TYPE_CSV" "OUTPUT_TYPE_JSON" |
Responses
Request samples
- Payload
{
  "format": "OUTPUT_TYPE_CSV"
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
"string"
List Users
Returns the list of Users who have registered the email address specified in the request. This is a List method because email addresses are not necessarily unique across users. It is up to the client to decide how to handle cases when two users share the same email address.
Authorizations:
query Parameters
pageSize | integer <int32> Maximum number of items server should return in the response. When set to 0 server will decide how many items to return. |
pageToken | string |
filter required | string Currently, only filtering by email is supported. |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "users": [
    {
      "name": "string",
      "emails": [
        "string"
      ]
    }
  ],
  "nextPageToken": "string"
}
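Because several users can share an email address, the client has to pick a policy. The sketch below shows one possible policy, refusing to guess: it returns the single matching user and raises otherwise. The helper name and the user names in the check are hypothetical, and matching is exact and case-sensitive, which is also an assumption.

```python
def single_user_by_email(response: dict, email: str) -> dict:
    """Return the only user in a List Users response registered with `email`.

    Raises LookupError when no user or more than one user matches; this is a
    client-side policy choice, not something the API enforces.
    """
    matches = [user for user in response.get("users", [])
               if email in user.get("emails", [])]
    if not matches:
        raise LookupError(f"no user registered with {email}")
    if len(matches) > 1:
        names = ", ".join(user["name"] for user in matches)
        raise LookupError(f"{email} is shared by several users: {names}")
    return matches[0]
```

An interactive client might instead show all matches and let the operator choose, for example before calling Add User to Workspace.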
Rollback Pipeline Revision
Sets the given pipeline revision as the current one, effectively rolling back the pipeline. The rollback appends a new entry to the end of the pipeline revisions list, so a revision that has been rolled back to appears more than once in the list.
Authorizations:
path Parameters
pipeline required | string |
revision required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "revision": {
    "version": 0,
    "name": "string",
    "namespace": "string",
    "description": "string",
    "status": "string",
    "firstDeployedTime": "2019-08-24T14:15:22Z",
    "lastDeployedTime": "2019-08-24T14:15:22Z",
    "deletedTime": "2019-08-24T14:15:22Z",
    "chart": "string",
    "appVersion": "string"
  }
}
Update Project
Updates a project.
Authorizations:
path Parameters
project required | string |
Request Body schema: application/json (required)
displayName required | string |
state | string (v1ProjectState) Enum: "STATE_ACTIVE" "STATE_INCATIVE" |
pipeline required | string |
confidenceScoreThreshold | integer <int32> The confidence score threshold. Results with an OCR or classification confidence score below this threshold are marked as errors. |
deleteDocumentsAfterExport | boolean When set to true, the document content will be removed automatically after export. The metadata about the document will be left intact. |
Responses
Request samples
- Payload
{
  "displayName": "string",
  "state": "STATE_ACTIVE",
  "pipeline": "string",
  "confidenceScoreThreshold": 0,
  "deleteDocumentsAfterExport": true
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "project": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "state": "STATE_ACTIVE",
    "pipeline": "string",
    "documentsTotalCount": 0,
    "documentsReadyForReviewCount": 0,
    "documentsNotProcessedCount": 0,
    "confidenceScoreThreshold": 0,
    "deleteDocumentsAfterExport": true
  }
}
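The confidenceScoreThreshold rule (a result is an error when its OCR or classification confidence falls below the threshold) can be previewed on the client, for example to highlight entities before review. The sketch below assumes that confidenceOcr and confidenceClassification use the same scale as the threshold, which this section does not state, and that a missing confidence field means 0 (proto3 JSON often omits zero values). Entities without scoringData, such as manually created ones, are skipped. It mirrors the rule; it is not how the server marks errors.

```python
from typing import List


def flagged_entities(entities: List[dict], threshold: int) -> List[str]:
    """Return names of entities whose OCR or classification confidence is below threshold.

    Assumes confidences share the threshold's scale; missing values count as 0.
    """
    flagged = []
    for entity in entities:
        scoring = entity.get("scoringData")
        if not scoring:
            continue  # manually created entities carry no scoring data
        ocr = scoring.get("confidenceOcr", 0)
        classification = scoring.get("confidenceClassification", 0)
        if ocr < threshold or classification < threshold:
            flagged.append(entity["name"])
    return flagged
```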
Unlock Archive
Unlocks a password-protected archive with the password provided in the request. Only archives in STATE_WAITING_FOR_PASSWORD can be unlocked.
Significant error codes:
- FAILED_PRECONDITION: the archive is not in the STATE_WAITING_FOR_PASSWORD state.
- INVALID_ARGUMENT: the supplied password is incorrect.
The method returns the archive in its new state, which is:
- STATE_EXTRACTING if the archive was loaded into memory,
- STATE_LOADING if the archive is not in memory.
Authorizations:
path Parameters
project required | string |
archive required | string |
Request Body schema: application/json (required)
password required | string |
Responses
Request samples
- Payload
{
  "password": "string"
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "archive": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "importErrorMessage": "string",
    "extractingErrorMessage": "string"
  }
}
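The Unlock Archive rules translate into two small client-side checks: only ask for a password when the archive is waiting for one (avoiding FAILED_PRECONDITION), and treat the call as successful when the returned archive is in one of the two documented follow-up states. A sketch with hypothetical helper names:

```python
# Documented precondition and post-unlock states for Unlock Archive.
UNLOCKABLE_STATE = "STATE_WAITING_FOR_PASSWORD"
STATES_AFTER_UNLOCK = frozenset({"STATE_EXTRACTING", "STATE_LOADING"})


def can_unlock(archive: dict) -> bool:
    """True if Unlock Archive should not fail with FAILED_PRECONDITION."""
    return archive.get("state") == UNLOCKABLE_STATE


def unlock_succeeded(response: dict) -> bool:
    """True if the returned archive moved to a documented post-unlock state."""
    return response.get("archive", {}).get("state") in STATES_AFTER_UNLOCK
```

A wrong password is reported through the INVALID_ARGUMENT error rather than through the archive state, so a client would typically prompt again on that error.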
Update Entity
Updates an entity with a user-provided value.
Authorizations:
path Parameters
project required | string |
document required | string |
entitie required | string |
Request Body schema: application/json (required)
The entity to be updated.
value required | string The value of the Entity. This is initially set by the OCR, but can be changed by the user. |
entityClass required | string The Entity Class. The value must not be empty. This is initially set by the scorer. |
scoringData | object (v1ScoringData) Scoring Data about the Entity. |
Responses
Request samples
- Payload
{
  "value": "string",
  "entityClass": "string",
  "scoringData": {
    "value": "string",
    "confidenceOcr": 0,
    "confidenceClassification": 0,
    "page": 0,
    "boundingBox": {
      "x": 0,
      "y": 0,
      "width": 0,
      "height": 0
    }
  }
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "entity": {
    "name": "string",
    "value": "string",
    "entityClass": "string",
    "scoringData": {
      "value": "string",
      "confidenceOcr": 0,
      "confidenceClassification": 0,
      "page": 0,
      "boundingBox": {
        "x": 0,
        "y": 0,
        "width": 0,
        "height": 0
      }
    }
  }
}
Process Document
Sends the document for processing. This is only valid for documents in the IMPORTED state. Changes the document state to STATE_PROCESSING.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "document": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "pipelineRun": {
      "resultsCount": 0,
      "resultsFlaggedCount": 0,
      "resultsMissingCount": 0,
      "pagesCount": 0,
      "errorThreshold": 0,
      "exportTime": "2019-08-24T14:15:22Z"
    },
    "importErrorMessage": "string",
    "processingErrorMessage": "string"
  }
}
Create Project
Creates a new Project.
Authorizations:
Request Body schema: application/json (required)
displayName required | string |
state | string (v1ProjectState) Enum: "STATE_ACTIVE" "STATE_INCATIVE" |
pipeline required | string |
confidenceScoreThreshold | integer <int32> The confidence score threshold. Results with an OCR or classification confidence score below this threshold are marked as errors. |
deleteDocumentsAfterExport | boolean When set to true, the document content will be removed automatically after export. The metadata about the document will be left intact. |
Responses
Request samples
- Payload
{
  "displayName": "string",
  "state": "STATE_ACTIVE",
  "pipeline": "string",
  "confidenceScoreThreshold": 0,
  "deleteDocumentsAfterExport": true
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "project": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "state": "STATE_ACTIVE",
    "pipeline": "string",
    "documentsTotalCount": 0,
    "documentsReadyForReviewCount": 0,
    "documentsNotProcessedCount": 0,
    "confidenceScoreThreshold": 0,
    "deleteDocumentsAfterExport": true
  }
}
Create Entity
Creates a new entity. The entityClass must be set to an entity class that is valid for the document.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
The entity to be created.
value required | string The value of the Entity. This is initially set by the OCR, but can be changed by the user. |
entityClass required | string The Entity Class. The value must not be empty. This is initially set by the scorer. |
scoringData | object (v1ScoringData) Scoring Data about the Entity. |
Responses
Request samples
- Payload
{
  "value": "string",
  "entityClass": "string",
  "scoringData": {
    "value": "string",
    "confidenceOcr": 0,
    "confidenceClassification": 0,
    "page": 0,
    "boundingBox": {
      "x": 0,
      "y": 0,
      "width": 0,
      "height": 0
    }
  }
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "entity": {
    "name": "string",
    "value": "string",
    "entityClass": "string",
    "scoringData": {
      "value": "string",
      "confidenceOcr": 0,
      "confidenceClassification": 0,
      "page": 0,
      "boundingBox": {
        "x": 0,
        "y": 0,
        "width": 0,
        "height": 0
      }
    }
  }
}
Create Multipart Upload
Initiates a multipart upload request. The response contains a URI to which the client should upload the file.
Authorizations:
path Parameters
project required | string |
Request Body schema: application/json (required)
filename | string |
pipeline required | string (Scoring pipeline that should process the uploaded file) |
Responses
Request samples
- Payload
{
  "filename": "string",
  "pipeline": "string"
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "multipartUpload": {
    "name": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "document": "string",
    "archive": "string"
  }
}
Add User to Workspace
Assigns a User to the Workspace.
Authorizations:
path Parameters
workspace required | string |
Request Body schema: application/json (required)
user required | string |
Responses
Request samples
- Payload
{
  "user": "string"
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "workspace": {
    "name": "string",
    "roleBindings": [
      {
        "role": "ROLE_OWNER",
        "user": "string"
      }
    ]
  }
}
Delete Archive
Deletes the archive. The archive metadata is kept, but all artifacts related to it are deleted.
Currently, only archives in the following states can be deleted:
- STATE_WAITING_FOR_PASSWORD
- STATE_EXTRACTING_FAILED
Authorizations:
path Parameters
project required | string |
archive required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{ }
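A client can hide or disable the delete action for archives outside the documented states instead of waiting for the server to reject the call. The sketch below encodes the state list given above; since the text says "currently", the set may change in later versions. Helper names are hypothetical.

```python
from typing import List

# States from which Delete Archive is documented to work (may change over time).
DELETABLE_ARCHIVE_STATES = frozenset({
    "STATE_WAITING_FOR_PASSWORD",
    "STATE_EXTRACTING_FAILED",
})


def can_delete_archive(archive: dict) -> bool:
    """True if the archive is in a state from which deletion is documented to work."""
    return archive.get("state") in DELETABLE_ARCHIVE_STATES


def deletable_archives(archives: List[dict]) -> List[str]:
    """Names of archives (e.g. from a List Archives response) that can be deleted."""
    return [archive["name"] for archive in archives if can_delete_archive(archive)]
```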
Delete Entity
Deletes an entity. Only manually created entities can be deleted: if the entity has a scoring_data field, the operation is not permitted.
Authorizations:
path Parameters
project required | string |
document required | string |
entitie required | string |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{ }
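The deletion rule depends only on whether scoring data is present, so it is easy to check before offering a delete action. The sketch assumes scoring_data appears as scoringData in the JSON, as in the response samples; the helper names are hypothetical.

```python
from typing import List


def can_delete_entity(entity: dict) -> bool:
    """True for manually created entities, i.e. those without scoringData."""
    return entity.get("scoringData") is None


def manual_entities(entities: List[dict]) -> List[str]:
    """Names of entities (e.g. from List Entities) that Delete Entity should accept."""
    return [entity["name"] for entity in entities if can_delete_entity(entity)]
```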
Remove User from Workspace
Unassigns a User from the Workspace.
Authorizations:
path Parameters
workspace required | string |
Request Body schema: application/json (required)
user required | string |
Responses
Request samples
- Payload
{
  "user": "string"
}
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "workspace": {
    "name": "string",
    "roleBindings": [
      {
        "role": "ROLE_OWNER",
        "user": "string"
      }
    ]
  }
}
Mark Document as Reviewed
Changes the document state to REVIEWED.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "document": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "pipelineRun": {
      "resultsCount": 0,
      "resultsFlaggedCount": 0,
      "resultsMissingCount": 0,
      "pagesCount": 0,
      "errorThreshold": 0,
      "exportTime": "2019-08-24T14:15:22Z"
    },
    "importErrorMessage": "string",
    "processingErrorMessage": "string"
  }
}
Mark Document as Reviewing
Changes the document state to REVIEWING.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "document": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "pipelineRun": {
      "resultsCount": 0,
      "resultsFlaggedCount": 0,
      "resultsMissingCount": 0,
      "pagesCount": 0,
      "errorThreshold": 0,
      "exportTime": "2019-08-24T14:15:22Z"
    },
    "importErrorMessage": "string",
    "processingErrorMessage": "string"
  }
}
Unreview Document
Reverts the review of the document by changing its state from REVIEWED back to REVIEWING. Works only for documents in the REVIEWED state.
Authorizations:
path Parameters
project required | string |
document required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "document": {
    "name": "string",
    "displayName": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "updateTime": "2019-08-24T14:15:22Z",
    "pipeline": "string",
    "state": "STATE_IMPORTING",
    "pipelineRun": {
      "resultsCount": 0,
      "resultsFlaggedCount": 0,
      "resultsMissingCount": 0,
      "pagesCount": 0,
      "errorThreshold": 0,
      "exportTime": "2019-08-24T14:15:22Z"
    },
    "importErrorMessage": "string",
    "processingErrorMessage": "string"
  }
}
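The document operations Process Document, Mark Document as Reviewing, Mark Document as Reviewed, and Unreview Document together form a small state machine. The sketch below encodes only what the descriptions state: Process requires IMPORTED, Unreview requires REVIEWED, and each action's resulting state. Preconditions that are not documented are not checked. The action keys are hypothetical, and the STATE_ prefix on IMPORTED, REVIEWING, and REVIEWED is an assumption by analogy with STATE_PROCESSING.

```python
# Documented "action -> required current state" rules.
# The STATE_ prefix on IMPORTED and REVIEWED is assumed, by analogy with STATE_PROCESSING.
REQUIRED_STATE = {
    "process": "STATE_IMPORTED",
    "unreview": "STATE_REVIEWED",
}

# Documented resulting state of each action (same prefix assumption).
RESULTING_STATE = {
    "process": "STATE_PROCESSING",
    "mark_reviewing": "STATE_REVIEWING",
    "mark_reviewed": "STATE_REVIEWED",
    "unreview": "STATE_REVIEWING",
}


def check_action(document: dict, action: str) -> str:
    """Return the state the document should reach, or raise ValueError if a documented precondition fails."""
    required = REQUIRED_STATE.get(action)
    current = document.get("state")
    if required is not None and current != required:
        raise ValueError(f"{action} requires {required}, document is in {current}")
    return RESULTING_STATE[action]
```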
Complete Multipart Upload
Completes the multipart upload. The client must call this after all parts of the file have been uploaded.
Authorizations:
path Parameters
project required | string |
multipartUpload required | string |
Request Body schema: application/json (required)
Responses
Request samples
- Payload
{ }
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{
  "multipartUpload": {
    "name": "string",
    "createTime": "2019-08-24T14:15:22Z",
    "document": "string",
    "archive": "string"
  }
}
Uploads a file via Multipart Upload
Uploads one part of the multipart upload. The part number must be specified in the Part-Number header. The optimal part size is about 10 MB.
Authorizations:
path Parameters
project required | string |
multipartUpload required | string |
header Parameters
Part-Number required | integer |
Responses
Response samples
- 200
- 400
- 401
- 403
- 404
- default
{ }
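Taken together, the multipart operations form a three-step flow: Create Multipart Upload returns a multipartUpload name, each part is sent with its Part-Number header, and Complete Multipart Upload is called once every part is in. The sketch below splits the file into parts of about 10 MB and drives that flow through two hypothetical caller-supplied callables, so no request paths are assumed. Numbering parts from 1 is also an assumption; this section does not say where numbering starts.

```python
from typing import Callable, Iterator, Tuple

PART_SIZE = 10 * 1024 * 1024  # about 10 MB, the documented optimal part size


def iter_parts(data: bytes, part_size: int = PART_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs; starting the numbering at 1 is an assumption."""
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        yield number, data[offset:offset + part_size]


def upload_all(data: bytes,
               send_part: Callable[[int, bytes], None],
               complete: Callable[[], None],
               part_size: int = PART_SIZE) -> int:
    """Upload every part in order, then complete the upload; return the part count.

    send_part(part_number, chunk) is hypothetical: it should send one part with
    the Part-Number header set. complete() should call Complete Multipart Upload.
    """
    count = 0
    for number, chunk in iter_parts(data, part_size):
        send_part(number, chunk)
        count += 1
    complete()  # only after all parts have been uploaded
    return count
```

Uploading parts sequentially keeps the sketch simple; whether the server accepts parts in parallel or out of order is not stated here.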
- Send feedback about H2O Document AI to cloud-feedback@h2o.ai