Key terms
The Enterprise h2oGPTe documentation uses several key terms. This page explains each of them.
Collection
A Collection is a group of knowledge sources and configuration for RAG (Retrieval-Augmented Generation) and custom agents. You can use Collections to group related materials and then explore each Collection through Chats by asking questions about it.
Collections support RAG, which allows an LLM to use information from documents, audio transcriptions, and other data in a Collection to generate more contextual responses. However, Collections are not limited to RAG use cases. They can also function as custom agents defined by system prompts, agent settings, and guardrails, with or without ingested documents. When RAG is enabled, information is retrieved from the Collection’s knowledge sources and sent to the LLM as context. When RAG is not enabled, the Collection can still operate as a fully configured agent focused on task execution or reasoning.
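The difference between a RAG-enabled Collection and an agent-only Collection can be illustrated with a minimal sketch. Everything here is hypothetical: `ToyCollection`, `retrieve`, and `build_prompt` are illustrative names, not part of the h2oGPTe API, and the word-overlap ranking is a stand-in for whatever retrieval the product actually performs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a Collection; not the h2oGPTe data model."""
    system_prompt: str
    chunks: list[str] = field(default_factory=list)  # ingested knowledge sources
    rag_enabled: bool = True


def _words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(collection: ToyCollection, question: str, top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by the number of words shared with the question."""
    q_words = _words(question)
    scored = [(len(q_words & _words(chunk)), chunk) for chunk in collection.chunks]
    scored = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(collection: ToyCollection, question: str) -> str:
    """Assemble the text sent to the LLM, adding context only when RAG is enabled."""
    parts = [collection.system_prompt]
    if collection.rag_enabled:
        context = retrieve(collection, question)
        if context:
            parts.append("Context:\n" + "\n".join(context))
    parts.append("Question: " + question)
    return "\n\n".join(parts)
```

With `rag_enabled=True`, the matching chunks are placed in the prompt as context. With `rag_enabled=False`, the prompt contains only the system prompt and the question, which corresponds to a Collection acting as a configured agent.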
There are many strategies for importing and creating Collections so that you get the best responses for your use case. For more information, see Collections usage overview.
Job
A Job is a single crawling or indexing task, or a batch of such tasks. In particular, each of the following tasks is referred to as a Job:
- Ingest plain text
- Ingest a Document from the file system
- Ingest from cloud storage
- Ingest (add) a Document from upload
- Ingest (crawl) a website
- Convert files to PDF
- Index Document(s)
- Update a Collection's statistics
- Delete Document(s)
- Delete Document(s) from a Collection
- Delete Collection(s)
- Import a stored Document to a Collection
- Import all Document(s) from a Collection to another Collection
- Summarize a Document
- Process Document(s)
Document
A Document is a file that you have imported into Enterprise h2oGPTe (for example, a PDF or web page).
Chat
A Chat session is an interaction between you and Enterprise h2oGPTe that consists of a series of prompts and answers.
API Key
An application programming interface (API) key is a unique identifier used to authenticate with the h2oGPTe API.
Extractors
Extractors, defined by JSON schemas, play a crucial role in document AI by transforming unstructured document content into structured, actionable data. With Extractors, you can retrieve information from any document (for example, a CV, invoice, Form 10-K, or scanned image) without complex setup or annotations. Specify the data you need with the JSON schema builder in the UI, upload your documents, and receive structured data.
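To make the idea of a schema-defined Extractor concrete, the sketch below shows what a schema for an invoice might look like and how a structured result could be checked against it. The field names (`invoice_number`, `total_amount`, `line_items`) and the `check_against_schema` helper are assumptions made for illustration. The schema features that Enterprise h2oGPTe actually supports are not described on this page.

```python
import json

# Hypothetical schema for an invoice Extractor; field names are illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "line_items": {"type": "array"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Map JSON Schema type names to the Python types that json.loads produces.
JSON_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def check_against_schema(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits.

    Handles only flat 'required' and 'type' checks, not full JSON Schema.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in record and not isinstance(record[name], JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type for field: {name}")
    return problems


# Example of the kind of structured data an Extractor could return (made up).
extracted = json.loads(
    '{"invoice_number": "INV-001", "total_amount": 129.5, "line_items": []}'
)
print(check_against_schema(extracted, INVOICE_SCHEMA))
```

The schema plays two roles in this sketch: it tells the extraction step which fields to look for, and it gives downstream code a contract to validate results against.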
PII Detection
Personally Identifiable Information (PII) detection is the process of recognizing and classifying sensitive data within a dataset that can be used to identify a specific individual. Sensitive PII includes information like social security numbers, credit card numbers, bank account numbers, and passport numbers. Non-sensitive PII includes information like names, addresses, and phone numbers.
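As a rough illustration of recognizing and classifying PII, the sketch below uses a few regular expressions. This is not how Enterprise h2oGPTe detects PII. The patterns, labels, and function names are assumptions chosen for the example, they cover only US-style formats, and production detection needs much broader coverage and validation.

```python
import re

# Illustrative patterns only: label -> (sensitivity class, compiled pattern).
PII_PATTERNS = {
    "us_ssn": ("sensitive", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    "credit_card": ("sensitive", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    "phone_number": ("non-sensitive", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
}


def detect_pii(text: str) -> list[tuple[str, str, str]]:
    """Return (label, sensitivity, matched text) for every pattern match."""
    findings = []
    for label, (sensitivity, pattern) in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, sensitivity, match.group()))
    return findings


def redact_pii(text: str) -> str:
    """Replace each match with a placeholder such as [US_SSN]."""
    for label, (_, pattern) in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


sample = "SSN 123-45-6789, call 555-123-4567."
print(detect_pii(sample))
print(redact_pii(sample))
```

Detection and redaction are kept as separate steps so that findings can be reviewed or logged before any text is changed.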
Evaluators
Evaluators are tools and metrics used to assess the performance and quality of large language models (LLMs) and Retrieval-Augmented Generation (RAG) models. They also evaluate a Collection's performance, reliability, security, fairness, and effectiveness.