Release notes

v0.9.1 (December 3, 2024)

Improvements

  • Annotated all pods (system and user workloads) for telemetry and cost allocation, ensuring better tracking and reporting capabilities.
  • Updated the discovery service response to include services/document-ai-backend, improving the visibility and integration of these services within H2O AI Cloud.
  • Improved the UX for adding collaborators to a project by replacing the single textbox with a tabular list, allowing easier management of email addresses and edits.
  • Improved the UX of the Publish Pipeline panel by reflecting changes in YAML after enabling the custom pipeline, allowing dynamic model selection, and supporting multiple models in the pipeline with a dynamic table instead of static dropdowns.
  • Upgraded Argo Workflows to v3.6.0.
  • Improved the description field by adding a slider to prevent long descriptions from pushing other menu tiles to the bottom.
  • Kafka now distributes messages using a Round Robin strategy, improving load balancing and scalability between pipeline replicas.
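
To illustrate why a round-robin strategy evens out load between pipeline replicas, here is a minimal sketch. It is not Kafka's partitioner or Document AI's code; `round_robin_assign` and the replica names are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(messages, replicas):
    """Pair each message with the next replica in turn (hypothetical helper)."""
    if not replicas:
        raise ValueError("at least one replica is required")
    targets = cycle(replicas)  # endlessly repeats replica-a, replica-b, ...
    return [(message, next(targets)) for message in messages]


# Six documents spread over three assumed replicas.
assignments = round_robin_assign(
    [f"doc-{i}" for i in range(6)],
    ["replica-a", "replica-b", "replica-c"],
)
load = Counter(replica for _, replica in assignments)
print(load)
```

Each replica receives the same number of messages regardless of message content, which is the property that helps when replicas are scaled up or down.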

Fixes

  • Fixed issue where pipelines remained visible in the Viewer after the associated project was deleted in the Publisher.
  • Fixed issue where unsupported file extensions could be dragged and dropped without a clear error message, now displaying a warning when such files are added.
  • Fixed issue where uploading a JPEG and then downloading it resulted in the file extension being changed to PDF, now retaining the original file extension and MIME type.
  • Fixed issue where ProjectID was not persisted with the pipeline, now ensuring ProjectID is attached to the K8s Job Extractor cloud.h2o.ai/workspace-id label via multipart upload service.
  • Fixed an issue where the Helm package command used the --app-version switch instead of --version, causing a mismatch between the Helm chart version and the release version.
  • Fixed issue where, when Auto-deletion was enabled for a project, the artifacts were deleted but the empty project remained in the list, even though Auto-Purge was not enabled and the retention period was set to two days.
  • Fixed issue where the collaborator's email address input was not processed when the user blurred the field via mouse, causing users to believe the input was filled but not submitted or validated.
  • Fixed issue where pipelines couldn't be listed due to the backend reading NULL into a string, causing a Scan error.
  • Fixed issue where uploading a nested zipped file failed due to postprocessing being stuck in a pending state, preventing successful file upload and processing.
  • Fixed issue where the namespace was not fully visible in the Revisions pop-up by making the columns responsive to size.
  • Fixed an issue where the UI failed to fetch the openid-configuration from Keycloak, causing the browser to freeze due to immediate retry attempts in a continuous loop.
  • Fixed an issue where users were incorrectly notified about unsaved changes after saving their changes to annotation sets while attempting to leave the page.
  • Fixed an issue where failed models appeared in the list to be selected for publishing in the pipeline.
  • Fixed an issue where a failed model was present in the list of models to be selected for publishing in the Token labeling section.
  • Fixed an issue where it was possible to publish a pipeline with spaces instead of valid page class values, which should require logical values in the page class fields.
  • Fixed issue where the Publish Pipeline button was not enabled even when no model was uploaded.
  • Fixed bug where users had to click the Save button twice when adding several collaborators at once.
  • Fixed an issue that prevented users from running Document AI on OpenShift clusters.
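
The openid-configuration fix above concerns immediate retries in a continuous loop. A common remedy is to retry with a growing delay and a cap on attempts. The sketch below shows that general pattern only; it is not the actual UI change, and `fetch_with_backoff`, its parameters, and the fake fetcher are assumptions made for illustration.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OSError as err:  # ConnectionError and TimeoutError are OSErrors
            last_error = err
            if attempt == max_attempts - 1:
                break  # no point in waiting after the final attempt
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Usage with a stand-in fetcher that fails twice before succeeding.
attempts = {"count": 0}


def fake_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("identity provider unreachable")
    return {"issuer": "https://example.invalid/realms/demo"}  # placeholder value


recorded_delays = []
config = fetch_with_backoff(fake_fetch, sleep=recorded_delays.append)
print(config, recorded_delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; in production code the default `time.sleep` would apply.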

v0.9.0 (November 1, 2024)

New Features

  • Added ability to downgrade your pipeline version to the previous version. This lets you recover pipelines from failure after auto-upgrade.
  • Added visible logs you can access if your pipeline fails to start so you can debug it more easily.
  • Added ability to put pipelines to sleep to keep them from consuming resources when not actively being used.
  • Added a validation rule to the pipeline that checks that the requested CPU or memory is not higher than the upper limit.
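
The resource validation rule above can be pictured with a small sketch. The field names, units (millicores and MiB), and error messages are assumptions for illustration; the actual rule and its configuration format are not described in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource spec: CPU in millicores, memory in MiB."""
    cpu_millicores: int
    memory_mib: int


def validate_request(requested, upper_limit):
    """Return a list of error strings; an empty list means the request is valid."""
    errors = []
    if requested.cpu_millicores > upper_limit.cpu_millicores:
        errors.append(
            f"requested CPU {requested.cpu_millicores}m exceeds limit "
            f"{upper_limit.cpu_millicores}m"
        )
    if requested.memory_mib > upper_limit.memory_mib:
        errors.append(
            f"requested memory {requested.memory_mib}Mi exceeds limit "
            f"{upper_limit.memory_mib}Mi"
        )
    return errors


limit = Resources(cpu_millicores=2000, memory_mib=4096)
print(validate_request(Resources(cpu_millicores=4000, memory_mib=1024), limit))
```

Collecting all violations rather than stopping at the first lets a user correct CPU and memory in one pass.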

Improvements

  • Added ability to see pipelines in all states (including deleted pipelines).
  • Added ability to upgrade pipelines concurrently by bringing in a configurable pool of workers.
  • Added a readiness probe to the scorer/pipeline to check connectivity with Kafka since Kafka is required for the scorer/pipeline to work correctly.
  • Improved setting transparency for pipelines.
  • Updated the consistency of the naming in Viewer from "status" to "state".
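
The concurrent upgrade improvement above relies on a configurable pool of workers. The following is a minimal sketch of that pattern using Python's standard thread pool; `upgrade_all`, `upgrade_one`, and the pipeline names are hypothetical and do not reflect the product's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_all(pipeline_names, upgrade_one, pool_size=4):
    """Run upgrade_one(name) for every pipeline, at most pool_size at a time."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    results = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {name: pool.submit(upgrade_one, name) for name in pipeline_names}
        for name, future in futures.items():
            # exception() blocks until the task finishes and returns None on success.
            results[name] = "failed" if future.exception() else "upgraded"
    return results


def demo_upgrade(name):
    """Stand-in for a real upgrade call; fails for one assumed pipeline name."""
    if name == "invoice-legacy":
        raise RuntimeError("incompatible runtime")


print(upgrade_all(["invoice", "invoice-legacy", "receipts"], demo_upgrade, pool_size=2))
```

A failure in one upgrade is recorded without stopping the others, and `pool_size` bounds how many upgrades run at once.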

Fixes

  • Fixed pipeline publication process starting but not completing in UI after several hours despite finishing in backend.
  • Fixed new runtime version not reflecting in UI after the upgrade.
  • Fixed long file names overflowing the text boxes in the UI by causing the text to wrap when it is too long.
  • Fixed labels not saving from one annotation set when concatenating it to another annotation set with the same type of label IDs.
  • Fixed concatenation of two separate document sets with OCR resulting in the wrong number of documents.
  • Fixed inability to concatenate an OCR-ed annotation set with a non-OCR-ed annotation set / annotation set without labels / annotation set with the same type of labels / with the same label.
  • Fixed auto-deletion's "less than 2 weeks ahead" time counter not reading properly, causing the banner to always be red.
  • Fixed the cleared token value reverting back after re-login.
  • Fixed "Edit Project" reverting your changes on the Edit Project panel if you stay there long enough without doing anything.
  • Fixed pipelines that were put to sleep due to them failing not being rewoken to fail further by feeding them new documents.
  • Fixed error in failing to fetch and score documents of large sizes or zipped documents of large sizes in Publisher.
  • Fixed previous document name being shown in Viewer when reviewing further documents though the labels were from the new document.
  • Fixed the inability to edit a published pipeline so that it contained the OCR-method only.
  • Fixed annotation set importation prompting you to reconcile duplicates when there are, in fact, no duplicates.
  • Fixed pipelines published without models (i.e. OCR-only pipelines) not being visible in the Project.
  • Fixed an issue where publishing a pipeline with no models under the same name as a previously unpublished pipeline that had models displayed the unpublished pipeline's models until the page was refreshed.
  • Fixed pipeline auto-upgrader getting stuck in the pending-upgrade state by performing a rollback to the last state if necessary.
  • Fixed OCR methods failing except for Tesseract.
  • Fixed inability to split an OCR-ed annotation set.
  • Fixed copied attributes in imported documents not being present in the annotation detail table.
  • Fixed misspelling on the add documents panel in the UI.
  • Fixed misprint in the error message about adding the names of attributes from deleting the names of the attributes.
  • Fixed inability to import a file larger than 5M.
  • Ensured that the rollback to previous pipeline version button overlays properly.
  • Addressed multiple security concerns.

v0.8.1 (July 2024)

Improvements

  • Implemented pipeline backward compatibility: upgrading Document AI to a new version (0.7 > 0.8) automatically upgrades all existing pipelines to the new version.
  • Updated Argus to v0.22.3.
  • Updated bulk scorer to v0.2.5.
  • Added a separate namespace for user workloads in Kubernetes.
  • Added SSL/SASL support for communication with Kafka.
  • Added gocloud.dev as the drop-in replacement for *sql.DB to handle DB connections from the API server.
  • Introduced pod disruption budgets for every deployment.
  • Added compatibility with pipelines published in previous versions of Document AI (v0.7.x).
  • Introduced the extra parameter to bulk scorer.

Fixes

  • Fixed pipelines with long names only being partially deployed.
  • Extended support exclusively to networking.k8s.io/v1 and dropped support for networking.k8s.io/v1beta1 for ingresses.
  • Removed ZIP files as a listed option when submitting documents to a pipeline to score in Publisher UI since ZIP is not supported for scoring.
  • Fixed pipelines with PaddleOCR-Latin failing to score documents due to file read permission issue.
  • Fixed pipelines failing to score large files with a high number of pages.

v0.8.0 (July 2024)

Improvements

  • Introduced new scoring infrastructure.
  • Migrated pipeline deployment to Viewer backend. Pipeline publishing still uses Helm charts, but now creates a new record in a PostgreSQL table, which is necessary for the endpoint listing the published pipelines (required for Viewer backend scalability).
  • Implemented a get Job endpoint that allows polling the request/job state.
  • Added capability for Viewer to parse various JSON outputs.
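
The get Job endpoint above is meant to be polled for the request/job state. A client-side polling loop might look like the sketch below. The state names, the `get_state` callable, and the interval are assumptions; the real endpoint path and response format are not given in these notes.

```python
import time


def poll_job(get_state, terminal_states=("succeeded", "failed"), interval=2.0,
             max_polls=30, sleep=time.sleep):
    """Call get_state() until it returns a terminal state or polls run out."""
    for _ in range(max_polls):
        state = get_state()
        if state in terminal_states:
            return state
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job did not finish within {max_polls} polls")


# Usage with a stand-in for the HTTP call to the job endpoint.
fake_states = iter(["pending", "running", "succeeded"])
print(poll_job(lambda: next(fake_states), sleep=lambda seconds: None))
```

In real use, `get_state` would wrap an HTTP request to the job endpoint and extract the state field from its response.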

Fixes

  • Fixed copied attributes from an annotation set not being reflected in the annotation detail table.
  • Fixed the inability to import a document set with copied attributes set.
  • Fixed the inability to delete an annotation set if it was exported before.
  • Fixed renaming an annotation set producing an empty string instead of the changed name.
  • Fixed not being able to publish a pipeline with the same name as an unpublished pipeline.
  • Fixed Helm release of a published pipeline having the wrong appVersion (appVersion is indicated as --version).
  • Removed misleading log messages.
  • Resolved critical vulnerabilities in the image h2oai-mlapi-worker-ocr.
  • Fixed auto deletion issue where a project was being deleted on [date -1].
  • Fixed the name of a document set disappearing after being edited.
  • Fixed the inability to delete documents when the corresponding annotation set was edited.
  • Fixed the pipeline controller ticker (which syncs pipelines) stopping working when it errored in the middle. It will now restart.
  • Fixed pipelines published in v0.7.x not showing correct page counts.
  • Fixed documents not being scored when pipeline is re-scaled.
  • Fixed the inability to unpublish pipelines.
  • Fixed pipeline autoscaling not working.
  • Fixed error when obtaining logs for scoring documents in Publisher.
  • Fixed failure to unzip ingested ZIP files.

v0.7.2 (Mar 14, 2024)

Fixes

  • Fixed a memory leak to stop out-of-memory pod failure after a certain amount of scored documents.
  • Fixed issue with page population.

v0.7.1 (Feb 12, 2024)

New features

  • Implemented RBAC for publishing pipelines.

Improvements

  • Added compatibility with pipelines published in previous versions of Document AI (v0.6 and v0.5).
  • Made Viewer work with pipelines that have custom V4 post-processor.

Fixes

  • Fixed archive extracting.
  • Fixed deletion of DocumentSet when corresponding AnnotationSet was edited.

v0.7.0 (Nov 5, 2023)

New features

  • Introduced the universal scoring pipeline.
  • Introduced ability to automatically purge training artifacts.
  • Introduced ability to schedule the deletion of your whole project and all of its resources.
  • Introduced new base models for training a model in Publisher.
  • Introduced learning rate for model training in Publisher.
  • Implemented role-based access control (RBAC) to Document AI on HAIC.

Improvements

  • Expanded file support for Viewer document import to include JPG and ZIP files.
  • Created a custom pipeline repository which includes fine-tuned BERT models and OCR-only pipelines.
  • Added the ability to skip pages from scoring.

Fixes

  • Removed support for Kubernetes versions earlier than 1.23.
  • Deleting now works on Project Level even if there are Job failures; this also deletes underlying Pipelines.

v0.6.2 (Aug 23, 2023)

New features

  • Added the ability to update or add new templates.

Fixes

  • Fixed an issue where the logic used to pull the feature store web proxy image did not work if the image registry was empty.
  • Included input_dir content in the request sent to the custom post-processor deployment.

v0.6.1 (Jul 28, 2023)

Improvements

  • Added support for Kubernetes 1.26.
  • Updated the telemetry implementation to retrieve scored documents more efficiently.

Known issues

  • Adding a file in edit in page view only supports image files.
  • Adding a file in edit in page view does not update page and document number on the annotation sets page.
  • After adding a file in edit in page view, leaving edit in page view, then returning to edit in page view, you cannot view said added file.

v0.6 (May 21, 2023)

New features

  • Introduced H2O Document AI - Viewer for business users to score documents on built pipelines. [MVP]
  • Introduced initial telemetry integration.
  • Added the ability to score PDFs with page ranges.

Improvements

  • Renamed original H2O Document AI to H2O Document AI - Publisher.
  • Added a button to Published Pipelines to retrieve the scoring URL.
  • Sped up CPU for EfficientNet OCR models by running with optimized OpenVINO.
  • Sped up PDF scoring with JPG instead of PNG.

Fixes

  • Fixed an issue where long names could not be used when publishing pipelines.
  • Fixed an issue where zip files could not be uploaded from Windows machines.
  • Fixed an issue where job status was still reported for cancelled jobs.

v0.5 (Apr 13, 2023)

Improvements

  • Upgraded the ML API to v0.4.0.
  • Refactored and improved the training user interface for better usability.
  • Added the ability to gate access to H2O Document AI based on a user's role.
