Model flow
The flow of an H2O Document AI model from creation to deployment and consumption can be summarized in the following sequential steps, each discussed in the sections below:
- Step 1: Ingest
- Step 2: Pre-process
- Step 3: Labeling
- Step 4: Train models
- Step 5: Post-process
- Step 6: Deploy models
- Step 7: Consume
Step 1: Ingest
Upload your documents to H2O Document AI using the Document AI web interface or API. H2O Document AI lets you handle a wide variety of documents, including:
- Image scans (faxes in PDF or other formats, pictures with text, and non-editable forms)
- Documents with embedded text that carry text and layout metadata (PDF docs, Word docs, HTML pages)
- Documents with regular text that reads left to right and top to bottom (CSVs, emails, editable forms)
Step 2: Pre-process
Pre-process documents before training using H2O Document AI's computer vision and natural language processing (NLP) features. Pre-processing includes support for:
- Recognizing and handling embedded text
- Recognizing and handling logos
- Page orientation resolution
- Deskewing
- Cropping
- Text formatting optimization
- Color binarization
- Addressing input PDF quality challenges
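To make one of the items above concrete, here is a minimal sketch of color binarization using a fixed global threshold. It is an illustration only and not H2O Document AI's implementation: the `binarize` function, the `Image` alias, and the default threshold of 128 are assumptions chosen for this example, and production systems typically use adaptive thresholds.

```python
from typing import List

# Grayscale image as rows of pixel intensities, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map every pixel to pure black (0) or pure white (255).

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]


# A tiny 2x3 "scan": dark text pixels on a light, slightly noisy background.
scan = [
    [250, 40, 245],
    [30, 235, 20],
]

print(binarize(scan))
```

Reducing a scan to two intensity levels removes background noise and shading, which generally makes the text easier for later recognition steps to pick out.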
Step 3: Labeling
Add, improve, and validate document labels:
- Integrates with common label formats
- Provides advanced options for validating labels against scored documents and determining labeling sufficiency
Step 4: Train models
Select the training dataset in H2O Document AI, which then automatically learns from the documents and creates models.
- Language understanding and layout recognition based on deep learning, transformer architectures, and machine learning
- AI-ML engine that uses multiple computer vision and NLP algorithms for diverse AI tasks
- Entity recognition
- Document and page classification
- Form understanding
- Grouping & set identification
Step 5: Post-process
Post-process to ensure consistency, accuracy, and organization of scored documents. H2O Document AI lets you perform a range of customized post-processing jobs that use AI algorithms rather than rules to ensure high-quality predictions and insights.
- Organizing prediction sets
- Confidence and probability measures
- Data type standardization: dates, times, currency codes, international numerical formats, and locations
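As an illustration of what standardizing a data type can look like, the sketch below normalizes dates written in several styles to ISO 8601 using only the Python standard library. It is a rule-based stand-in for explanation, not the AI-driven post-processing that H2O Document AI performs: the `standardize_date` function and the `KNOWN_FORMATS` list are hypothetical, and treating `05/03/2021` as day-first is an assumption.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted input formats; day-first for slashed dates
# is an assumption. Month names rely on the default (English) C locale.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y")


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    return None


for value in ["March 5, 2021", "05/03/2021", "5 Mar 2021", "not a date"]:
    print(value, "->", standardize_date(value))
```

Returning `None` for unrecognized input, instead of guessing, lets a downstream step route those values to manual review.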
Step 6: Deploy models
Publish models into your cloud or on-premises environment of choice. Integrate models into existing systems, processes, and applications via APIs or JSON documents.
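The following sketch shows how a downstream application might consume predictions delivered as a JSON document. The payload shape is hypothetical: the field names `document`, `predictions`, `label`, `text`, and `confidence`, the sample values, and the 0.8 threshold are all assumptions for illustration and do not describe the actual H2O Document AI response format.

```python
import json

# Hypothetical response body; every field name and value here is an
# assumption, not the real H2O Document AI schema.
response_body = """
{
  "document": "invoice_001.pdf",
  "predictions": [
    {"label": "invoice_number", "text": "INV-1042", "confidence": 0.97},
    {"label": "due_date", "text": "2021-03-05", "confidence": 0.91},
    {"label": "total", "text": "1,250.00", "confidence": 0.42}
  ]
}
"""


def split_by_confidence(body: str, min_confidence: float = 0.8):
    """Separate predictions into accepted fields and labels needing review."""
    payload = json.loads(body)
    accepted, review = {}, []
    for item in payload["predictions"]:
        if item["confidence"] >= min_confidence:
            accepted[item["label"]] = item["text"]
        else:
            review.append(item["label"])
    return accepted, review


accepted, review = split_by_confidence(response_body)
print("accepted:", accepted)
print("needs review:", review)
```

Splitting on a confidence threshold is one common integration pattern: high-confidence fields flow straight into the target system while the rest are queued for a person to check.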
Step 7: Consume
After deploying your models, you can:
- Consume the model through business apps
- Store data
- Score documents in batch or in real time to obtain predictions