Release notes
v0.8.1 (July, 2024)
Improvements
- Implemented backward compatibility for pipelines: upgrading Document AI to a new version (for example, v0.7 to v0.8) automatically upgrades all existing pipelines to the new version.
- Updated Argus to v0.22.3.
- Updated bulk scorer to v0.2.5.
- Added a separate namespace for user workloads in Kubernetes.
- Added SSL/SASL support for communication with Kafka.
- Added gocloud.dev as the drop-in replacement for *sql.DB to handle DB connections from the API server.
- Introduced pod disruption budgets for every deployment.
- Added compatibility with pipelines published in previous versions of Document AI (v0.7.x).
- Introduced the extra parameter to bulk scorer.
Fixes
- Fixed pipelines with long names only being partially deployed.
- Extended support exclusively to networking.k8s.io/v1 and dropped support for networking.k8s.io/v1beta1 for ingresses.
- Removed ZIP files as a listed option when submitting documents to a pipeline for scoring in the Publisher UI, since ZIP is not supported for scoring.
- Fixed pipelines with PaddleOCR-Latin failing to score documents due to file read permission issue.
- Fixed pipelines failing to score large files with a high number of pages.
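For the ingress change listed above, the following is a minimal sketch of what moving a manifest from networking.k8s.io/v1beta1 to networking.k8s.io/v1 involves. It is not part of Document AI; the function name upgrade_ingress and the example resource names are hypothetical. It covers only rule paths (not a top-level default backend) and assumes the manifest is already loaded as a Python dictionary.

```python
import copy


def upgrade_ingress(manifest: dict) -> dict:
    """Return a copy of an Ingress manifest rewritten for networking.k8s.io/v1."""
    upgraded = copy.deepcopy(manifest)
    if upgraded.get("apiVersion") == "networking.k8s.io/v1":
        return upgraded  # already on the supported API version

    upgraded["apiVersion"] = "networking.k8s.io/v1"
    for rule in upgraded.get("spec", {}).get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            # v1 requires an explicit pathType; ImplementationSpecific leaves
            # path matching up to the ingress controller.
            path.setdefault("pathType", "ImplementationSpecific")
            backend = path["backend"]
            if "serviceName" in backend:
                port = backend.pop("servicePort")
                # Numeric ports become port.number, named ports port.name.
                port_key = "number" if isinstance(port, int) else "name"
                backend["service"] = {
                    "name": backend.pop("serviceName"),
                    "port": {port_key: port},
                }
    return upgraded


# Hypothetical v1beta1 manifest, used only for illustration.
legacy = {
    "apiVersion": "networking.k8s.io/v1beta1",
    "kind": "Ingress",
    "metadata": {"name": "example-pipeline"},
    "spec": {"rules": [{"http": {"paths": [
        {"path": "/", "backend": {"serviceName": "example-svc", "servicePort": 80}}
    ]}}]},
}
current = upgrade_ingress(legacy)
```

The input manifest is left unmodified, so the old and new forms can be compared before anything is applied to a cluster.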
v0.8.0 (July, 2024)
New features
- Added the ability to configure nodeSelector and tolerations for doc-proxy-scorer.
- Implemented table support.
- Implemented project collaboration between users using email invite.
- Introduced the hybrid OCR processor to read PDF characters directly from documents and to extract text contained in images.
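To illustrate what nodeSelector and tolerations control for a workload such as doc-proxy-scorer, here is a simplified sketch of the Kubernetes placement rules. It is an assumption-laden model, not the scheduler's implementation and not Document AI code: the function names, labels, and taints are hypothetical, and node affinity and other scheduling constraints are ignored.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Check whether a single toleration matches a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # An Exists toleration without a key matches every taint key.
        return not key or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_spec: dict, node_labels: dict, node_taints: list) -> bool:
    """Simplified check: does the node satisfy nodeSelector and tolerations?"""
    selector = pod_spec.get("nodeSelector", {})
    if any(node_labels.get(k) != v for k, v in selector.items()):
        return False
    tolerations = pod_spec.get("tolerations", [])
    # PreferNoSchedule taints only discourage placement, so they never block it.
    blocking = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# Hypothetical values, used only for illustration.
scorer_spec = {
    "nodeSelector": {"workload": "ocr"},
    "tolerations": [
        {"key": "dedicated", "operator": "Equal", "value": "document-ai", "effect": "NoSchedule"}
    ],
}
ocr_node_labels = {"workload": "ocr"}
ocr_node_taints = [{"key": "dedicated", "value": "document-ai", "effect": "NoSchedule"}]
```

In this model, nodeSelector restricts the scorer to nodes carrying matching labels, while tolerations allow it onto nodes whose taints would otherwise keep pods away.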
Improvements
- Introduced new scoring infrastructure.
- Migrated pipeline deployment to the Viewer backend. Pipeline publishing still uses Helm charts, but it now also creates a record in a PostgreSQL table, which the endpoint that lists published pipelines requires (needed for Viewer backend scalability).
- Implemented a Get Job endpoint for polling the state of a request or job.
- Added capability for Viewer to parse various JSON outputs.
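The job endpoint mentioned above is meant for polling. The following is a minimal client-side polling sketch. The endpoint URL, response shape, and state names are not given in these notes, so the state names below are assumptions and the HTTP call is abstracted behind a caller-supplied fetch_state function.

```python
import time
from typing import Callable

# Assumed terminal state names; the real API may use different ones.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_job(
    fetch_state: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_state until it returns a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} seconds")
        sleep(interval)  # wait before asking again to avoid hammering the endpoint
```

In practice, fetch_state would wrap an authenticated HTTP GET against the job endpoint and return the state field from the response.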
Fixes
- Fixed copied attributes from an annotation set not being reflected in the annotation detail table.
- Fixed the inability to import a document set with copied attributes set.
- Fixed the inability to delete an annotation set if it was exported before.
- Fixed renaming an annotation set producing an empty string instead of the changed name.
- Fixed not being able to publish a pipeline with the same name as an unpublished pipeline.
- Fixed Helm release of a published pipeline having the wrong appVersion (appVersion is indicated as --version).
- Removed misleading log messages.
- Resolved critical vulnerabilities in the image h2oai-mlapi-worker-ocr.
- Fixed auto deletion issue where a project was being deleted on [date -1].
- Fixed the name of a document set disappearing after being edited.
- Fixed the inability to delete documents when the corresponding annotation set was edited.
- Fixed the pipeline controller ticker (which syncs pipelines) stopping permanently when it hit an error mid-run. It now restarts.
- Fixed pipelines published in v0.7.x not showing correct page counts.
- Fixed documents not being scored when a pipeline is rescaled.
- Fixed the inability to unpublish pipelines.
- Fixed pipeline autoscaling not working.
- Fixed error when obtaining logs for scoring documents in Publisher.
- Fixed failure to unzip ingested ZIP files.
v0.7.2 (Mar 14, 2024)
Fixes
- Fixed a memory leak that caused out-of-memory pod failures after a certain number of documents had been scored.
- Fixed issue with page population.
v0.7.1 (Feb 12, 2024)
New features
- Implemented RBAC for publishing pipelines.
Improvements
- Added compatibility with pipelines published in previous versions of Document AI (v0.6 and v0.5).
- Made Viewer work with pipelines that have custom V4 post-processor.
Fixes
- Fixed archive extracting.
- Fixed deletion of DocumentSet when corresponding AnnotationSet was edited.
v0.7.0 (Nov 5, 2023)
New features
- Introduced the universal scoring pipeline.
- Introduced the ability to automatically purge training artifacts.
- Introduced the ability to schedule the deletion of your whole project and all of its resources.
- Introduced new base models for training a model in Publisher.
- Introduced learning rate for model training in Publisher.
- Implemented role-based access control (RBAC) for Document AI on HAIC.
Improvements
- Expanded file support for Viewer document import to include JPG and ZIP files.
- Created a custom pipeline repository which includes fine-tuned BERT models and OCR-only pipelines.
- Added the ability to skip pages from scoring.
Fixes
- Removed support for Kubernetes versions earlier than 1.23.
- Deleting at the project level now works even if there are job failures; this also deletes the underlying pipelines.
v0.6.2 (Aug 23, 2023)
New features
- Added the ability to update or add new templates.
Fixes
- Fixed an issue where the logic used to pull the feature store web proxy image did not work if the image registry is empty.
- Made an improvement to include input_dir content in the request going to the custom post-processor deployment.
Known issues
- After publishing a pipeline using a model trained on one file on H2O Document AI Viewer, the document will show zero results.
v0.6.1 (Jul 28, 2023)
Improvements
- Added support for Kubernetes 1.26.
- Updated the telemetry implementation so that scored documents are retrieved more efficiently.
Known issues
- Adding a file in edit in page view only supports image files.
- Adding a file in edit in page view does not update page and document number on the annotation sets page.
- After adding a file in edit in page view, leaving edit in page view, then returning to edit in page view, you cannot view said added file.
v0.6 (May 21, 2023)
New features
- Introduced H2O Document AI - Viewer for business users to score documents on built pipelines. [MVP]
- Introduced initial telemetry integration.
- Added the ability to score PDFs with page ranges.
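As an illustration of scoring with page ranges, the sketch below expands a page-range string into the list of pages to score. The "1-3,5" syntax and the function name parse_page_ranges are assumptions made for this example; these notes do not specify the format Document AI accepts.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a string such as "1-3,5" into sorted, 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start  # a single page is a one-page range
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"invalid page range {part!r} for a {page_count}-page document")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Overlapping ranges are merged, so each page is listed, and would be scored, only once.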
Improvements
- Renamed original H2O Document AI to H2O Document AI - Publisher.
- Added a button to Published Pipelines to retrieve the scoring URL.
- Sped up CPU for EfficientNet OCR models by running with optimized OpenVINO.
- Sped up PDF scoring with JPG instead of PNG.
Fixes
- Fixed an issue where long names could not be used when publishing pipelines.
- Fixed an issue where zip files could not be uploaded from Windows machines.
- Fixed an issue where job status was still reported for cancelled jobs.
v0.5 (Apr 13, 2023)
New features
- Added optical character recognition (OCR) language support for:
  - Latin (e.g. Spanish)
  - Arabic (e.g. Persian)
- Added Document Text Recognition (DocTR) EfficientNet models to better recognize handwritten documents.
- Added ability to set batch size and number of epochs for model training.
- Added a command-line bulk scorer to score a large number of documents (ships separately from the main product).
Improvements
- Upgraded the ML API to v0.4.0.
- Refactored and improved the training user interface for better usability.
- Added the ability to gate access to H2O Document AI based on a user's role.