Version: v1.7.3-10 🚧

Ingestion Methods

Ingestion methods are the different ways you can add documents to a collection in Enterprise h2oGPTe. Whether your content lives on your computer, on the web, in cloud storage, or in another collection, there's a method to bring it in.

This section explains each option and helps you pick the right one for your source.

Every ingestion method adds documents to the same collection and applies that collection's parsing and PII rules. They differ mainly in where the content comes from and how you authenticate to reach it: local uploads need no setup, while remote and enterprise connectors require credentials. Connectors such as S3, Azure Blob Storage, Google Cloud Storage, SharePoint Online, SharePoint On-Premise, and Confluence also support scheduled auto-sync.

Choosing a method

Use this matrix to quickly match your source to the right ingestion method.

| Method | Best for | Source type | Additional resource |
| --- | --- | --- | --- |
| Upload Documents | Quickly adding a few local files by drag-and-drop or selection | Local files | Upload Documents |
| Import from File System | Bulk imports from directories using glob patterns | Local or network directories | Import from File System |
| Import from URL | Web pages, articles, and crawled documentation sites | Web URLs | Import from URL |
| Upload Plain Text | Notes, transcripts, and code snippets entered or pasted directly | Pasted text | Upload Plain Text |
| Select a Document | Reusing a document from another collection, re-parsed with current rules | Existing document | Select a Document |
| Select a Collection | Merging or copying every document from another collection | Existing collection | Select a Collection |
| Import from S3 | Documents stored in Amazon S3 buckets | Amazon S3 | Import from S3 |
| Import from Azure Blob Storage | Documents stored in Azure Blob Storage containers | Azure Blob Storage | Import from Azure Blob Storage |
| Import from Google Cloud Storage | Documents stored in Google Cloud Storage buckets | Google Cloud Storage | Import from Google Cloud Storage |
| Import from SharePoint Online | SharePoint Online sites and document libraries | Microsoft 365 | Import from SharePoint Online |
| Import from SharePoint On-Premise | On-premises SharePoint installations | On-premises SharePoint | Import from SharePoint On-Premise |
| Import from Confluence | Confluence Cloud pages, spaces, and attachments | Atlassian Confluence | Import from Confluence |
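If you want to encode the matrix in a script (for example, to route files in an internal tool), a plain lookup table is usually enough. The sketch below is illustrative only: `METHOD_BY_SOURCE` and `choose_method` are hypothetical names, not part of any h2oGPTe API, and the keys simply restate the "Source type" column above.

```python
# Hypothetical helper: maps a source type from the matrix to its ingestion method.
# Nothing here calls h2oGPTe; it only restates the table as data.
METHOD_BY_SOURCE = {
    "local files": "Upload Documents",
    "local or network directories": "Import from File System",
    "web urls": "Import from URL",
    "pasted text": "Upload Plain Text",
    "existing document": "Select a Document",
    "existing collection": "Select a Collection",
    "amazon s3": "Import from S3",
    "azure blob storage": "Import from Azure Blob Storage",
    "google cloud storage": "Import from Google Cloud Storage",
    "microsoft 365": "Import from SharePoint Online",
    "on-premises sharepoint": "Import from SharePoint On-Premise",
    "atlassian confluence": "Import from Confluence",
}


def choose_method(source_type: str) -> str:
    """Return the ingestion method name for a source type (case-insensitive)."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(source_type.split()).lower()
    if key not in METHOD_BY_SOURCE:
        raise ValueError(f"No ingestion method listed for source type: {source_type!r}")
    return METHOD_BY_SOURCE[key]
```

For instance, `choose_method("Amazon S3")` returns `"Import from S3"`, and an unlisted source type raises `ValueError` rather than guessing.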

In this section

Upload Documents
Upload files directly from your computer with drag-and-drop or file selection. The most straightforward way to add a few local documents.
Import from File System
Import documents from local or network directories using glob patterns. Ideal for bulk imports from organized folder structures.
Import from URL
Import content from web URLs with configurable crawling. Crawl websites and extract articles, documentation, and other online content.
Upload Plain Text
Create documents by entering or pasting text directly. Perfect for notes, transcripts, code snippets, and content that isn't a file.
Select a Document
Reuse a document from another collection. The original file is re-parsed with the current collection's parsing and PII rules.
Select a Collection
Import every document from another collection into your current one. Useful for merging, copying, or templating collections.
Import from S3
Import documents from Amazon S3 buckets with configurable credentials and paths. Supports IAM roles and auto-sync.
Import from Azure Blob Storage
Import documents from Azure Blob Storage containers using account keys, SAS tokens, connection strings, or Managed Identity, with auto-sync.
Import from Google Cloud Storage
Import documents from Google Cloud Storage buckets using service accounts, Workload Identity, or ADC, with auto-sync support.
Import from SharePoint Online
Import from SharePoint Online sites and document libraries through the Microsoft Graph API with modern authentication and auto-sync.
Import from SharePoint On-Premise
Import from on-premises SharePoint installations using legacy authentication methods such as Windows and claims-based auth.
Import from Confluence
Import pages, spaces, sub-pages, and embedded images from Atlassian Confluence Cloud using OAuth or API token authentication.
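Because Import from File System selects files with glob patterns, it can help to preview what a pattern matches before running a bulk import. The sketch below uses Python's standard `pathlib` globbing on a throwaway directory. It assumes h2oGPTe's glob matching behaves like Python's, which this page does not confirm, so treat the results as a rough guide; `preview_glob` and `demo` are hypothetical helper names.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def preview_glob(root: Path, pattern: str) -> list[str]:
    """Return sorted, root-relative paths of the files that match a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


def demo() -> dict[str, list[str]]:
    """Build a small sample tree and show what three common patterns select."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sample_files = [
            "handbook.pdf",
            "notes.txt",
            "reports/q1.pdf",
            "reports/archive/q0.pdf",
        ]
        for name in sample_files:
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("sample")
        return {
            "*.pdf": preview_glob(root, "*.pdf"),  # top level only
            "**/*.pdf": preview_glob(root, "**/*.pdf"),  # any depth
            "reports/*.pdf": preview_glob(root, "reports/*.pdf"),  # one folder
        }
```

In Python's semantics, `*.pdf` matches only files directly under the root, while `**/*.pdf` also descends into subdirectories. Checking a pattern this way against a copy of your folder layout can catch an overly broad or overly narrow import before it runs.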
