Version: v1.6.1 🚧

Key terms

The Enterprise h2oGPTe documentation uses several key terms. This page explains each of them.

Collection

A Collection is a group of related Documents gathered in one location. You can use Collections to organize particular sets of content and later explore each set through Chats (asking questions of a Collection).

Enterprise h2oGPTe supports Retrieval Augmented Generation (RAG) when getting responses from an LLM, which allows for contextualizing the question to the LLM with information from documents, audio transcriptions, and other data. Users can create one or more data Collections from which they want to get answers or generate new content. When a user interacts with an LLM, the user’s prompt is compared with the Collection of documents to find similar chunks of information. This information is then sent to the LLM.
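The retrieval step described above can be sketched in a few lines. This is a minimal illustration only, not how Enterprise h2oGPTe implements it: it assumes a simple bag-of-words cosine similarity in place of whatever similarity search the product uses, and the function names (`tokenize`, `retrieve`, `build_llm_input`) and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(prompt: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Score every chunk against the prompt and keep the best matches.
    query = Counter(tokenize(prompt))
    scored = [(cosine_similarity(query, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_llm_input(prompt: str, context: list[str]) -> str:
    # Combine the retrieved chunks with the user's question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {prompt}"


# Hypothetical chunks standing in for a Collection's indexed content.
collection_chunks = [
    "Invoices must be paid within 30 days of receipt.",
    "The cafeteria opens at 8 am on weekdays.",
    "Late invoices incur a 2 percent monthly fee.",
]
prompt = "When must invoices be paid?"
top_chunks = retrieve(prompt, collection_chunks)
llm_input = build_llm_input(prompt, top_chunks)
print(llm_input)
```

Only the chunks that share vocabulary with the prompt reach the LLM input; the unrelated cafeteria chunk is left out.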

There are many strategies for importing and creating Collections so that you get the best responses for your use case. For more information, see Collections usage overview.

Job

A Job is a single crawling or indexing task, or a batch of such tasks. In particular, the following tasks are referred to as Jobs:

  • Ingest plain text
  • Ingest a Document from the file system
  • Ingest from cloud storage
  • Ingest (add) a Document from upload
  • Ingest (crawl) a website
  • Convert files to PDF
  • Index Document(s)
  • Update a Collection's statistics
  • Delete one or more Documents
  • Delete one or more Documents from a Collection
  • Delete one or more Collections
  • Import a stored Document to a Collection
  • Import all Documents from one Collection to another Collection
  • Summarize a Document
  • Process one or more Documents

Document

A Document is a file that you have imported into Enterprise h2oGPTe (for example, a PDF or a web page).

Chat

A Chat session is an interaction between you and Enterprise h2oGPTe that consists of a series of prompts and answers.

API Key

An application programming interface (API) key is a unique identifier used to authenticate to the h2oGPTe API.
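As a rough illustration of how an API key is typically attached to a request, the sketch below builds (but does not send) an HTTP request carrying the key. The host, the path, the `Bearer` scheme, and the helper name `build_authenticated_request` are all assumptions made for this example; check the h2oGPTe API documentation for the actual endpoints and authentication scheme.

```python
import urllib.request


def build_authenticated_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    # Join the base URL and path, then attach the key as an Authorization header.
    # The "Bearer" scheme is an assumption, not confirmed by this page.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


# Hypothetical host, path, and key, used only to show the shape of the request.
request = build_authenticated_request(
    "https://h2ogpte.example.com", "/api/v1/collections", "sk-placeholder"
)
print(request.full_url)
print(request.get_header("Authorization"))
```

Keeping the key in a header, rather than in the URL, avoids leaking it through logs and browser history.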

Extractors

Extractors are defined by JSON schemas and transform unstructured document content into structured data. With Extractors, you can retrieve information from a document (for example, a CV, invoice, Form 10-K, or scanned image) without complex setup or annotations. Specify the data you need with the JSON schema builder (UI), upload your documents, and receive structured data.
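To make the idea of a schema-defined Extractor concrete, here is a small sketch of what such a JSON schema and a matching structured result might look like for an invoice. The field names, the sample record, and the `matches_schema` helper are hypothetical; the helper checks only required fields and basic types, and is not a full JSON Schema validator or part of the product.

```python
import json

# Hypothetical schema describing the fields to pull from an invoice.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Mapping from JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def matches_schema(record: dict, schema: dict) -> bool:
    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in record:
            return False
    # Every field described in the schema must have the declared type.
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue
        if spec["type"] == "number" and isinstance(value, bool):
            return False  # bool is a subclass of int in Python; reject it here
        if not isinstance(value, JSON_TYPES[spec["type"]]):
            return False
    return True


# Hypothetical structured output for one uploaded invoice.
extracted = json.loads(
    '{"invoice_number": "INV-001", "total_amount": 1250.5, "due_date": "2024-07-01"}'
)
print(matches_schema(extracted, invoice_schema))
```

The schema plays the role of a contract: downstream code can rely on the named fields and types instead of parsing free text.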

PII Detection

Personally Identifiable Information (PII) detection is the process of recognizing and classifying data within a dataset that can be used to identify a specific individual. Sensitive PII includes information like Social Security numbers, credit card numbers, bank account numbers, and passport numbers. Non-sensitive PII includes information like names, addresses, and phone numbers.
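The recognize-and-classify idea can be sketched with a few regular expressions. This is a deliberately simplified illustration, not the detection method used by Enterprise h2oGPTe: the patterns assume US-style formats, cover only three kinds of PII, and the names `PII_PATTERNS`, `SENSITIVITY`, `detect_pii`, and `redact` are hypothetical.

```python
import re

# Simplified, US-style patterns chosen for illustration only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Classification that follows the sensitive / non-sensitive split above.
SENSITIVITY = {"ssn": "sensitive", "credit_card": "sensitive", "phone": "non-sensitive"}


def detect_pii(text: str) -> list[tuple[str, str, str]]:
    # Return (label, sensitivity, matched text) for every pattern match.
    findings = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, SENSITIVITY[label], match.group()))
    return findings


def redact(text: str) -> str:
    # Replace each detected value with a placeholder naming its type.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, call 555-867-5309."
findings = detect_pii(sample)
redacted = redact(sample)
print(findings)
print(redacted)
```

Pattern matching like this misses PII without a fixed format, such as names and addresses, which is one reason real detectors usually combine it with other techniques.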

Evaluators

Evaluators are tools and metrics used to assess the performance and quality of large language models (LLMs) and Retrieval Augmented Generation (RAG) models. They also evaluate a Collection's performance, reliability, security, fairness, and effectiveness.
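As one example of the kind of metric an Evaluator might compute, the sketch below scores a generated answer against a reference answer using token-level F1. This metric is swapped in purely for illustration; the page does not say which metrics Enterprise h2oGPTe Evaluators use, and the function name `token_f1` is hypothetical.

```python
import re
from collections import Counter


def token_f1(answer: str, reference: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over shared tokens.
    answer_tokens = re.findall(r"\w+", answer.lower())
    reference_tokens = re.findall(r"\w+", reference.lower())
    if not answer_tokens or not reference_tokens:
        # Two empty strings count as a perfect match; one empty string as a miss.
        return float(answer_tokens == reference_tokens)
    overlap = sum((Counter(answer_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(answer_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("The capital is Paris", "Paris"))
```

A score near 1.0 means the answer and reference use almost the same words; it says nothing about factual correctness when the wording differs, so metrics like this are usually combined with others.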

