Release notes
v0.7.2 (Mar 14, 2024)
Fixes
- Fixed a memory leak that caused pods to fail with out-of-memory errors after a certain number of documents had been scored.
- Fixed an issue with page population.
v0.7.1 (Feb 12, 2024)
New features
- Implemented RBAC for publishing pipelines.
Improvements
- Added compatibility with pipelines published in previous versions of Document AI (v0.6 and v0.5).
- Made Viewer work with pipelines that have a custom V4 post-processor.
Fixes
- Fixed archive extraction.
- Fixed deletion of a DocumentSet when the corresponding AnnotationSet was edited.
v0.7.0 (Nov 5, 2023)
New features
- Introduced the universal scoring pipeline.
- Introduced the ability to automatically purge training artifacts.
- Introduced the ability to schedule the deletion of your whole project and all of its resources.
- Introduced new base models for training a model in Publisher.
- Introduced a learning rate setting for model training in Publisher.
- Implemented role-based access control (RBAC) for Document AI on HAIC.
Improvements
- Expanded file support for Viewer document import to include JPG and ZIP files.
- Created a custom pipeline repository which includes fine-tuned BERT models and OCR-only pipelines.
- Added the ability to skip pages from scoring.
Fixes
- Removed support for Kubernetes versions earlier than 1.23.
- Deleting at the project level now works even if there are job failures; this also deletes the underlying pipelines.
v0.6.2 (Aug 23, 2023)
New features
- Added the ability to update or add new templates.
Fixes
- Fixed an issue where the logic used to pull the feature store web proxy image did not work if the image registry was empty.
- Made an improvement to include input_dir content in the request going to the custom post-processor deployment.
Known issues
- Deploying an H2O Document AI pipeline with a long name fails.
- After publishing a pipeline using a model trained on one file on H2O Document AI Viewer, the document will show zero results.
v0.6.1 (Jul 28, 2023)
Improvements
- Added support for Kubernetes 1.26.
- Updated the telemetry implementation so that scored documents are retrieved more efficiently.
Known issues
- Adding a file in edit in page view supports only image files.
- Adding a file in edit in page view does not update the page and document numbers on the annotation sets page.
- After you add a file in edit in page view, leave the view, and then return to it, you cannot view the added file.
v0.6 (May 21, 2023)
New features
- Introduced H2O Document AI - Viewer for business users to score documents on built pipelines. [MVP]
- Introduced initial telemetry integration.
- Added the ability to score PDFs with page ranges.
Improvements
- Renamed original H2O Document AI to H2O Document AI - Publisher.
- Added a button to Published Pipelines to retrieve the scoring URL.
- Sped up EfficientNet OCR models on CPU by running them with optimized OpenVINO.
- Sped up PDF scoring by using JPG instead of PNG.
Fixes
- Fixed an issue where long names could not be used when publishing pipelines.
- Fixed an issue where ZIP files could not be uploaded from Windows machines.
- Fixed an issue where job status was still reported for cancelled jobs.
v0.5 (Apr 13, 2023)
New features
- Added optical character recognition (OCR) language support for:
  - Latin (e.g. Spanish)
  - Arabic (e.g. Persian)
- Added Document Text Recognition (DocTR) EfficientNet models to better recognize handwritten documents.
- Added the ability to set batch size and number of epochs for model training.
- Added a command-line bulk scorer to score a large number of documents (ships separately from the main product).
Improvements
- Upgraded the ML API to v0.4.0.
- Refactored and improved the training user interface for better usability.
- Added the ability to gate access to H2O Document AI based on a user's role.