Release notes

v0.12.1 (August 4, 2025)

Improvements

Upgraded Argo Workflows to v3.7.0 to resolve known security vulnerabilities.
Made pipeline sidecar's securityContext configurable via Helm values.

Bug fixes

Fixed a bug where training failed for some base models (deberta-v3-base, multilingual-deberta-v3-base) due to protobuf implementation conflicts.
Fixed an issue where importing models failed during post-processing because test PDFs were not packaged correctly.
Updated internal Kafka client to the latest version.
Fixed a crash when publishing a pipeline with a conditional multi-labeling model and no condition values selected.

v0.12.0 (July 21, 2025)

New features

Added support for custom GPU scheduler integration with Run.ai. When enabled, Document AI can use a custom scheduler for dynamic GPU workloads.
Reduced disk space requirements and simplified deployment by merging the OCR and scorer runtime images into a single, smaller image.

Improvements

Patched security vulnerabilities.
Upgraded Argo Workflows to v3.6.10.

Bug fixes

Fixed an issue where the Apply labels operation failed if a document set contained a UUID in its name.

v0.11.3 (May 13, 2025)

Bug fix

Fixed an issue where private CA certificates were not properly merged with the system CA bundle. This caused SSL validation errors when connecting to external services, such as Azure Blob Storage or Azure PostgreSQL. The fix ensures that required newline characters are preserved during certificate merging.
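
The notes do not show how the merge is implemented. As a rough sketch only, the following shows why a missing newline between PEM blocks breaks a bundle and how a merge can preserve it. The function name merge_ca_bundles and the truncated certificate bodies are hypothetical and used purely for illustration.

```python
def merge_ca_bundles(system_bundle: str, private_certs: list[str]) -> str:
    """Concatenate PEM blocks so that every block ends with a newline."""
    parts = []
    for pem in [system_bundle, *private_certs]:
        pem = pem.strip()
        if not pem:
            continue  # ignore empty inputs
        parts.append(pem + "\n")  # the newline keeps END/BEGIN markers on separate lines
    return "".join(parts)


# Hypothetical, truncated PEM blocks (not real certificates).
SYSTEM = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----"
PRIVATE = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----"

merged = merge_ca_bundles(SYSTEM, [PRIVATE])

# Naive concatenation fuses the END and BEGIN markers onto one line,
# which PEM parsers generally reject.
naive = SYSTEM + PRIVATE
```

In the naive string the end marker of the first certificate runs straight into the begin marker of the second, which is the kind of malformed bundle that can surface as SSL validation errors.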

v0.11.2 (May 8, 2025)

Improvements

Patched security vulnerabilities.

Bug fixes

Fixed an issue where the Document AI scorer deployment did not merge the system CA bundle with user-provided certificates during installation.
Fixed an RBAC issue that allowed users without the appropriate cluster role to publish pipelines.
Added warnings for users who lack the required cluster roles when attempting to unpublish pipelines.
Fixed an RBAC issue where users with score-only permissions received no warning when attempting to roll back a pipeline.

v0.11.1 (April 14, 2025)

Improvements

Patched security vulnerabilities.
Upgraded Go to version 1.24.2.

v0.11.0 (April 4, 2025)

New features

Added Azure Blob Storage support as an alternative to MinIO in Azure environments.
- Configurable through specific integration points:
  - azureBlob configuration in document-ai Helm chart values
  - datastore and azureBlob sections in ml-api Helm chart values
  - artifactRepository configuration for Argo Workflows

caution

Configuration requirements:

Only one storage option's enabled field can be set to true (either S3 or Azure). Setting both to true or both to false will cause the application to return a fatal error.
S3-compatible storage remains the default if not explicitly configured.
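
The rule above amounts to an "exactly one enabled" check. The sketch below illustrates that logic under stated assumptions: the function name select_storage and its parameters are hypothetical, and this is not the application's actual validation code. The defaults reflect the note that S3-compatible storage is used when nothing is explicitly configured.

```python
def select_storage(s3_enabled: bool = True, azure_enabled: bool = False) -> str:
    """Return the active storage backend, or raise if the flags are inconsistent."""
    if s3_enabled == azure_enabled:
        # Both true or both false: neither combination is allowed.
        raise ValueError("exactly one of S3 or Azure Blob Storage must be enabled")
    return "s3" if s3_enabled else "azureBlob"


default_backend = select_storage()
azure_backend = select_storage(s3_enabled=False, azure_enabled=True)
```

When switching to Azure Blob Storage, the S3 option therefore has to be disabled explicitly in the same change.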

Improvements

Upgraded Argo Workflows to v3.6.5 to resolve several vulnerabilities.
Upgraded Go to version 1.24.1.
Upgraded Go dependencies.

Bug fixes

Fixed an issue where the list of pipelines showed user ID instead of preferred username.
Fixed an issue where runtime version was displayed as empty in the pipeline list.
Fixed an issue where custom post-processor code was not prefilled in the pipeline publishing modal.

v0.10.0 (February 28, 2025)

New features

Document AI can now be configured to schedule dynamic workloads into Kubernetes namespaces that are predefined in the corresponding authorization workspace.
The logo can be customized in the UI.

Improvements

Project deletion logic has been improved to delete any remnant data if the project is deleted while any pending job is running.
Upgraded Argo Workflows to v3.6.4 for improved stability and performance.
The auto-deletion warning now explicitly states that published pipelines will also be deleted when a project is removed.

Bug fixes

Fixed an issue that prevented project deletion when pending jobs existed.
Fixed an issue where .zip files couldn't be uploaded into Viewer.
Fixed an issue where button text was cut off in the STATUS field.
Fixed a misprint in the warning message when adding a collaborator.
Fixed an issue where results were incorrectly displayed when processing documents without tables.
Fixed inconsistent use of the PostgreSQL library. All database connections now use github.com/h2oai/go-pkg/database/postgres, which ensures support for passwordless connection strings and IAM authentication for AWS RDS.

v0.9.2 and v0.9.1 (December 3, 2024)

Improvements

Annotated all pods (system and user workloads) for telemetry and cost allocation, ensuring better tracking and reporting capabilities.
Updated the discovery service response to include services/document-ai-backend, improving the visibility and integration of these services within H2O AI Cloud.
Improved the UX for adding collaborators to a project by replacing the single textbox with a tabular list, allowing easier management of email addresses and edits.
Improved the UX of the Publish Pipeline panel by reflecting changes in YAML after enabling the custom pipeline, allowing dynamic model selection, and supporting multiple models in the pipeline with a dynamic table instead of static dropdowns.
Upgraded Argo Workflows to v3.6.0.
Improved the description field by adding a slider to prevent long descriptions from pushing other menu tiles to the bottom.
Kafka now distributes messages using a Round Robin strategy, improving load balancing and scalability between pipeline replicas.
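
As a conceptual illustration only, the sketch below shows how a round-robin strategy spreads messages evenly across targets. It does not use the Kafka client API, and the names round_robin_assign, messages, and partitions are hypothetical.

```python
from itertools import cycle


def round_robin_assign(messages, partitions):
    """Assign each message to the next partition in turn, wrapping around."""
    assignment = {partition: [] for partition in partitions}
    for message, partition in zip(messages, cycle(partitions)):
        assignment[partition].append(message)
    return assignment


# Five hypothetical documents spread over three partitions.
result = round_robin_assign([f"doc-{i}" for i in range(5)], [0, 1, 2])
```

Because each target receives messages in turn, no single replica is left handling a disproportionate share of the load.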

Fixes

Fixed an issue where pipelines remained visible in the Viewer after the associated project was deleted in the Publisher.
Fixed an issue where files with unsupported extensions could be dragged and dropped without a clear error message. A warning is now displayed when such files are added.
Fixed an issue where uploading a JPEG and then downloading it changed the file extension to PDF. The original file extension and MIME type are now retained.
Fixed an issue where ProjectID was not persisted with the pipeline. ProjectID is now attached to the K8s Job Extractor cloud.h2o.ai/workspace-id label via the multipart upload service.
Fixed an issue where the Helm package command used the --app-version switch instead of --version, causing a mismatch between the Helm chart version and the release version.
Fixed issue where, when Auto-deletion was enabled for a project, the artifacts were deleted but the empty project remained in the list, even though Auto-Purge was not enabled and the retention period was set to two days.
Fixed issue where the collaborator's email address input was not processed when the user blurred the field via mouse, causing users to believe the input was filled but not submitted or validated.
Fixed issue where pipelines couldn't be listed due to the backend reading NULL into a string, causing a Scan error.
Fixed issue where uploading a nested zipped file failed due to postprocessing being stuck in a pending state, preventing successful file upload and processing.
Fixed issue where the namespace was not fully visible in the Revisions pop-up by making the columns responsive to size.
Fixed an issue where the UI failed to fetch the openid-configuration from Keycloak, causing the browser to freeze due to immediate retry attempts in a continuous loop.
Fixed an issue where users were incorrectly notified about unsaved changes after saving their changes to annotation sets while attempting to leave the page.
Fixed an issue where failed models appeared in the list to be selected for publishing in the pipeline.
Fixed an issue where a failed model was present in the list of models to be selected for publishing in the Token labeling section.
Fixed an issue where it was possible to publish a pipeline with spaces instead of valid page class values, which should require logical values in the page class fields.
Fixed issue where the Publish Pipeline button was not enabled even when no model was uploaded.
Fixed bug where users had to click the Save button twice when adding several collaborators at once.
Fixed an issue that prevented users from running Document AI on OpenShift clusters.
Fixed an issue where the Helm chart did not allow configuring the securityContext of the scorer's sidecar container. This is configurable as of v0.9.2.

v0.9.0 (November 1, 2024)

New Features

Added ability to downgrade your pipeline version to the previous version. This lets you recover pipelines from failure after auto-upgrade.
Added logs that you can access if your pipeline fails to start, making it easier to debug.
Added ability to put pipelines to sleep to keep them from consuming resources when not actively being used.
Added a validation rule to the pipeline that checks that the requested CPU or memory is not higher than the upper limit.
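
A validation rule of this kind can be sketched as follows. This is an illustration under assumptions, not the product's validator: the helper names parse_quantity and validate_request are hypothetical, and only a subset of Kubernetes-style quantity suffixes is handled.

```python
from fractions import Fraction

# Subset of Kubernetes-style quantity suffixes (assumed for this sketch).
_SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20),
    "Gi": Fraction(2**30), "Ti": Fraction(2**40),
    "k": Fraction(10**3), "M": Fraction(10**6),
    "G": Fraction(10**9), "T": Fraction(10**12),
    "m": Fraction(1, 1000),  # milli, as in "500m" CPU
}


def parse_quantity(text: str) -> Fraction:
    """Convert a quantity such as '500m' or '2Gi' into an exact number."""
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(text)


def validate_request(requested: str, upper_limit: str) -> None:
    """Raise ValueError if the requested amount exceeds the upper limit."""
    if parse_quantity(requested) > parse_quantity(upper_limit):
        raise ValueError(
            f"requested {requested} exceeds the upper limit {upper_limit}"
        )


validate_request("500m", "2")     # passes: 0.5 CPU is within a limit of 2
validate_request("2Gi", "4Gi")    # passes: 2 GiB is within a limit of 4 GiB
```

Using exact fractions rather than floats avoids rounding surprises when comparing values such as 500m and 0.5.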

Improvements

Added ability to see pipelines in all states (including deleted pipelines).
Added ability to upgrade pipelines concurrently by bringing in a configurable pool of workers.
Added a readiness probe to the scorer/pipeline to check connectivity with Kafka since Kafka is required for the scorer/pipeline to work correctly.
Improved setting transparency for pipelines.
Updated the consistency of the naming in Viewer from "status" to "state".
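
The configurable worker pool for concurrent pipeline upgrades mentioned in this list can be pictured with a small sketch. The names upgrade_pipeline and upgrade_all, the workers parameter, and the pipeline names are hypothetical, and the upgrade step is a placeholder rather than the real upgrade logic.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_pipeline(name: str) -> str:
    """Placeholder for the real upgrade step of a single pipeline."""
    return f"{name}: upgraded"


def upgrade_all(pipelines, workers=4):
    """Upgrade pipelines concurrently using at most `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input.
        return list(pool.map(upgrade_pipeline, pipelines))


results = upgrade_all(["invoices", "receipts", "contracts"], workers=2)
```

Bounding the pool size keeps the number of simultaneous upgrades predictable, which is the main reason to make it configurable.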

Fixes

Fixed pipeline publication process starting but not completing in UI after several hours despite finishing in backend.
Fixed new runtime version not reflecting in UI after the upgrade.
Fixed long file names overflowing the text boxes in the UI by causing the text to wrap when it is too long.
Fixed labels not saving from one annotation set when concatenating it to another annotation set with the same type of label IDs.
Fixed concatenation of two separate document sets with OCR resulting in the wrong number of documents.
Fixed inability to concatenate an OCR-ed annotation set with a non-OCR-ed annotation set / annotation set without labels / annotation set with the same type of labels / with the same label.
Fixed auto-deletion's "less than 2 weeks ahead" time counter not reading properly, causing the banner to always be red.
Fixed the cleared token value reverting back after re-login.
Fixed "Edit Project" reverting your changes on the Edit Project panel if you stay there long enough without doing anything.
Fixed pipelines that were put to sleep due to them failing not being rewoken to fail further by feeding them new documents.
Fixed error in failing to fetch and score documents of large sizes or zipped documents of large sizes in Publisher.
Fixed previous document name being shown in Viewer when reviewing further documents though the labels were from the new document.
Fixed the inability to edit a published pipeline so that it contained the OCR-method only.
Fixed annotation set importation prompting you to reconcile duplicates when there are, in fact, no duplicates.
Fixed pipelines published without models (i.e. OCR-only pipelines) not being visible in the Project.
Fixed an issue where publishing a pipeline with no models under the same name as a previously unpublished pipeline that had models displayed the unpublished pipeline's models until the page was refreshed.
Fixed pipeline auto-upgrader getting stuck in the pending-upgrade state by performing a rollback to the last state if necessary.
Fixed OCR methods failing except for Tesseract.
Fixed inability to split an OCR-ed annotation set.
Fixed copied attributes in imported documents not being present in the annotation detail table.
Fixed misspelling on the add documents panel in the UI.
Fixed misprint in the error message about adding the names of attributes from deleting the names of the attributes.
Fixed inability to import a file larger than 5M.
Ensured that the rollback to previous pipeline version button overlays properly.
Addressed multiple security concerns.

v0.8.1 (July, 2024)

Improvements

Implemented backward compatibility for pipelines: Upgrading Document AI to a new version (0.7 > 0.8) will cause all existing pipelines to automatically upgrade to the new version.
Updated Argus to v0.22.3.
Updated bulk scorer to v0.2.5.
Added a separate namespace for user workloads in Kubernetes.
Added SSL/SASL support for communication with Kafka.
Added gocloud.dev as the drop-in replacement for *sql.DB to handle DB connections from the API server.
Introduced pod disruption budgets for every deployment.
Added compatibility with pipelines published in previous versions of Document AI (v0.7.x).
Introduced the extra parameter to bulk scorer.

Fixes

Fixed pipelines with long names only being partially deployed.
Extended support exclusively to networking.k8s.io/v1 and dropped support for networking.k8s.io/v1beta1 for ingresses.
Removed ZIP files as a listed option when submitting documents to a pipeline to score in Publisher UI since ZIP is not supported for scoring.
Fixed pipelines with PaddleOCR-Latin failing to score documents due to file read permission issue.
Fixed pipelines failing to score large files with a high number of pages.

v0.8.0 (July, 2024)

New features

Added the ability to configure nodeSelector and tolerations for doc-proxy-scorer.
Implemented table support.
Implemented project collaboration between users using email invite.
Introduced the hybrid OCR processor to read PDF characters directly from documents and to extract text contained in images.

Improvements

Introduced new scoring infrastructure.
Migrated pipeline deployment to Viewer backend. Pipeline publishing still uses Helm charts, but now creates a new record in a postgres table which is necessary for the endpoint listing the published pipelines (required for Viewer backend scalability).
Implemented a get Job endpoint that allows polling the request/job state.
Added capability for Viewer to parse various JSON outputs.

Fixes

Fixed copied attributes from an annotation set not being reflected in the annotation detail table.
Fixed the inability to import a document set with copied attributes set.
Fixed the inability to delete an annotation set if it was exported before.
Fixed renaming an annotation set producing an empty string instead of the changed name.
Fixed not being able to publish a pipeline with the same name as an unpublished pipeline.
Fixed Helm release of a published pipeline having the wrong appVersion (appVersion is indicated as --version).
Removed misleading log messages.
Resolved critical vulnerabilities in the image h2oai-mlapi-worker-ocr.
Fixed auto deletion issue where a project was being deleted on [date -1].
Fixed the name of a document set disappearing after being edited.
Fixed the inability to delete documents when the corresponding annotation set was edited.
Fixed the pipeline controller ticker (which syncs pipelines) stopping working when it errored in the middle. It will now restart.
Fixed pipelines published in v0.7.x not showing correct page counts.
Fixed documents not being scored when pipeline is re-scaled.
Fixed the inability to unpublish pipelines.
Fixed pipeline autoscaling not working.
Fixed error when obtaining logs for scoring documents in Publisher.
Fixed failure to unzip ingested ZIP files.

v0.7.2 (Mar 14, 2024)

Fixes

Fixed a memory leak to stop out-of-memory pod failure after a certain amount of scored documents.
Fixed issue with page population.

v0.7.1 (Feb 12, 2024)

New features

Implemented RBAC for publishing pipelines.

Improvements

Added compatibility with pipelines published in previous versions of Document AI (v0.6 and v0.5).
Made Viewer work with pipelines that have custom V4 post-processor.

Fixes

Fixed archive extracting.
Fixed deletion of DocumentSet when corresponding AnnotationSet was edited.

v0.7.0 (Nov 5, 2023)

New features

Introduced the universal scoring pipeline.
Introduced ability to automatically purge training artifacts.
Introduced ability to schedule the deletion of your whole project and all of its resources.
Introduced new base models for training a model in Publisher.
Introduced learning rate for model training in Publisher.
Implemented role-based access control (RBAC) to Document AI on HAIC.

Improvements

Expanded file support for Viewer document import to include JPG and ZIP files.
Created a custom pipeline repository which includes fine-tuned BERT models and OCR-only pipelines.
Added the ability to skip pages from scoring.

Fixes

Removed support for Kubernetes versions earlier than 1.23.
Deleting now works on Project Level even if there are Job failures; this also deletes underlying Pipelines.

v0.6.2 (Aug 23, 2023)

New features

Added the ability to update or add new templates.

Fixes

Fixed an issue where the logic used to pull the feature store web proxy image did not work if the image registry is empty.
Made an improvement to include input_dir content in the request going to the custom post-processor deployment.

Known issues

After publishing a pipeline using a model trained on one file on H2O Document AI Viewer, the document will show zero results.

v0.6.1 (Jul 28, 2023)

Improvements

Added support for Kubernetes 1.26.
Updated the telemetry implementation to retrieve scored documents more efficiently.

Known issues

Adding a file in edit in page view only supports image files.
Adding a file in edit in page view does not update page and document number on the annotation sets page.
After adding a file in edit in page view, leaving edit in page view, then returning to edit in page view, you cannot view said added file.

v0.6 (May 21, 2023)

New features

Introduced H2O Document AI - Viewer for business users to score documents on built pipelines. [MVP]
Introduced initial telemetry integration.
Added the ability to score PDFs with page ranges.

Improvements

Renamed original H2O Document AI to H2O Document AI - Publisher.
Added a button to Published Pipelines to retrieve the scoring URL.
Sped up CPU for EfficientNet OCR models by running with optimized OpenVINO.
Sped up PDF scoring with JPG instead of PNG.

Fixes

Fixed an issue where long names could not be used when publishing pipelines.
Fixed an issue where zip files could not be uploaded from Windows machines.
Fixed an issue where job status was still reported for cancelled jobs.

v0.5 (Apr 13, 2023)

New features

Added optical character recognition (OCR) language support for:
- Latin (e.g. Spanish)
- Arabic (e.g. Persian)
Added Document Text Recognition (DocTR) EfficientNet models to better recognize handwritten documents.
Added ability to set batch size and number of epochs for model training.
Added a command-line bulk scorer to score a large number of documents (ships separately from the main product).

Improvements

Upgraded the ML API to v0.4.0.
Refactored and improved the training user interface for better usability.
Added the ability to gate access to H2O Document AI based on a user's role.

