Release notes

v0.8.1 (July, 2024)

Improvements

  • Implemented backward compatibility for pipelines: upgrading Document AI to a new version (0.7 > 0.8) automatically upgrades all existing pipelines to the new version.
  • Updated Argus to v0.22.3.
  • Updated bulk scorer to v0.2.5.
  • Added a separate namespace for user workloads in Kubernetes.
  • Added SSL/SASL support for communication with Kafka.
  • Added gocloud.dev as the drop-in replacement for *sql.DB to handle DB connections from the API server.
  • Introduced pod disruption budgets for every deployment.
  • Added compatibility with pipelines published in previous versions of Document AI (v0.7.x).
  • Introduced the extra parameter to bulk scorer.

Fixes

  • Fixed pipelines with long names only being partially deployed.
  • Restricted ingress support to networking.k8s.io/v1 and dropped support for networking.k8s.io/v1beta1.
  • Removed ZIP files as a listed option when submitting documents to a pipeline for scoring in the Publisher UI, because ZIP is not supported for scoring.
  • Fixed pipelines with PaddleOCR-Latin failing to score documents due to a file read permission issue.
  • Fixed pipelines failing to score large files with a high number of pages.

v0.8.0 (July, 2024)

New features

Improvements

  • Introduced new scoring infrastructure.
  • Migrated pipeline deployment to the Viewer backend. Pipeline publishing still uses Helm charts, but now also creates a record in a PostgreSQL table. The endpoint that lists published pipelines needs this record (required for Viewer backend scalability).
  • Implemented a get Job endpoint that allows polling the request/job state.
  • Added capability for Viewer to parse various JSON outputs.
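
The polling pattern enabled by the get Job endpoint can be sketched as follows. This is a minimal illustration, not the product's client: the state names in TERMINAL_STATES and the fetch_state callable are hypothetical, since these notes do not list the endpoint's URL, response shape, or state values. In practice, fetch_state would wrap an HTTP call to the get Job endpoint and return the reported state.

```python
import time
from typing import Callable

# Hypothetical terminal state names; the real job states are not given in these notes.
TERMINAL_STATES = frozenset({"succeeded", "failed", "cancelled"})


def poll_job(
    fetch_state: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_state() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s} s")
        # Wait before asking again so the backend is not flooded with requests.
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the loop easy to exercise without real waiting; callers can leave both at their defaults.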

Fixes

  • Fixed copied attributes from an annotation set not being reflected in the annotation detail table.
  • Fixed the inability to import a document set with copied attributes set.
  • Fixed the inability to delete an annotation set if it was exported before.
  • Fixed renaming an annotation set producing an empty string instead of the changed name.
  • Fixed not being able to publish a pipeline with the same name as an unpublished pipeline.
  • Fixed Helm release of a published pipeline having the wrong appVersion (appVersion is indicated as --version).
  • Removed misleading log messages.
  • Resolved critical vulnerabilities in the image h2oai-mlapi-worker-ocr.
  • Fixed an auto-deletion issue where a project was deleted one day before its scheduled date ([date -1]).
  • Fixed the name of a document set disappearing after being edited.
  • Fixed the inability to delete documents when the corresponding annotation set was edited.
  • Fixed the pipeline controller ticker (which syncs pipelines) stopping permanently when it hit an error mid-run. It now restarts.
  • Fixed pipelines published in v0.7.x not showing correct page counts.
  • Fixed documents not being scored when a pipeline is rescaled.
  • Fixed the inability to unpublish pipelines.
  • Fixed pipeline autoscaling not working.
  • Fixed error when obtaining logs for scoring documents in Publisher.
  • Fixed failure to unzip ingested ZIP files.

v0.7.2 (Mar 14, 2024)

Fixes

  • Fixed a memory leak that caused out-of-memory pod failures after a certain number of documents had been scored.
  • Fixed issue with page population.

v0.7.1 (Feb 12, 2024)

New features

  • Implemented RBAC for publishing pipelines.

Improvements

  • Added compatibility with pipelines published in previous versions of Document AI (v0.6 and v0.5).
  • Made Viewer work with pipelines that have a custom V4 post-processor.

Fixes

  • Fixed archive extraction.
  • Fixed deletion of DocumentSet when corresponding AnnotationSet was edited.

v0.7.0 (Nov 5, 2023)

New features

  • Introduced the universal scoring pipeline.
  • Introduced ability to automatically purge training artifacts.
  • Introduced ability to schedule the deletion of your whole project and all of its resources.
  • Introduced new base models for training a model in Publisher.
  • Introduced learning rate for model training in Publisher.
  • Implemented role-based access control (RBAC) for Document AI on HAIC.

Improvements

  • Expanded file support for Viewer document import to include JPG and ZIP files.
  • Created a custom pipeline repository which includes fine-tuned BERT models and OCR-only pipelines.
  • Added the ability to skip pages from scoring.

Fixes

  • Removed support for Kubernetes versions earlier than 1.23.
  • Project-level deletion now works even if there are job failures, and it also deletes the underlying pipelines.

v0.6.2 (Aug 23, 2023)

New features

  • Added the ability to update or add new templates.

Fixes

  • Fixed an issue where the logic used to pull the feature store web proxy image did not work if the image registry was empty.
  • Included input_dir content in the request sent to the custom post-processor deployment.

Known issues

v0.6.1 (Jul 28, 2023)

Improvements

  • Added support for Kubernetes 1.26.
  • Updated the telemetry implementation to retrieve scored documents more efficiently.

Known issues

  • Adding a file in Edit in Page View only supports image files.
  • Adding a file in Edit in Page View does not update the page and document counts on the annotation sets page.
  • A file added in Edit in Page View cannot be viewed after you leave Edit in Page View and then return to it.

v0.6 (May 21, 2023)

New features

  • Introduced H2O Document AI - Viewer for business users to score documents on built pipelines. [MVP]
  • Introduced initial telemetry integration.
  • Added the ability to score PDFs with page ranges.
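
As an illustration of what scoring with page ranges involves, here is a minimal sketch that expands a range specification into page numbers. The "1-3,5" syntax and the parse_page_ranges helper are assumptions made for this example; these notes do not describe the format Document AI actually accepts.

```python
from typing import List


def parse_page_ranges(spec: str, page_count: int) -> List[int]:
    """Expand a spec such as "1-3,5" into sorted, unique 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # a single page is a range of one
        if start < 1 or end < start or end > page_count:
            raise ValueError(
                f"invalid page range {part!r} for a {page_count}-page document"
            )
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Validating against the document's page count up front means a bad range is rejected before any pages are processed.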

Improvements

  • Renamed original H2O Document AI to H2O Document AI - Publisher.
  • Added a button to Published Pipelines to retrieve the scoring URL.
  • Sped up EfficientNet OCR models on CPU by running them with optimized OpenVINO.
  • Sped up PDF scoring by using JPG instead of PNG.

Fixes

  • Fixed an issue where long names could not be used when publishing pipelines.
  • Fixed an issue where ZIP files could not be uploaded from Windows machines.
  • Fixed an issue where job status was still reported for cancelled jobs.

v0.5 (Apr 13, 2023)

New features

Improvements

  • Upgraded the ML API to v0.4.0.
  • Refactored and improved the training user interface for better usability.
  • Added the ability to gate access to H2O Document AI based on a user's role.
