
Create a Collection

Overview​

To create a Collection, you only need to specify the following required setting: Collection name.

note

There are many strategies for importing and creating Collections so that you get the best responses for your use case. For guidance on how to use Collections, see Collections usage overview.

Instructions​

The following steps describe how to create a Collection.

caution

You can select an embedding model for the Collection only once, during the creation of the Collection: keep the default embedding model or change it to one of the available options. You cannot change this setting after the Collection is created.

  1. On the Enterprise h2oGPTe navigation menu, click Collections.
  2. Click + New collection.
  3. In the Collection name box, enter a name for the Collection.
  4. Click + Create.
note
  • You can define the other Collection settings when creating the Collection or modify them after its creation. For example, you can add documents to the Collection during or after its creation.
  • To learn about each of the Collection settings, see Collection settings.
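If you prefer to script this step, the h2ogpte Python client exposes an equivalent call. The following is a minimal sketch: the server address and API key are placeholders, and method names or signatures may vary between client versions.

```python
from h2ogpte import H2OGPTE

# Connect to your Enterprise h2oGPTe instance (placeholder address and key).
client = H2OGPTE(
    address="https://<your-h2ogpte-server>",
    api_key="<your-api-key>",
)

# The Collection name is the only required setting; the description is optional.
collection_id = client.create_collection(
    name="My Collection",
    description="Quarterly contract documents",
)
print(collection_id)
```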

Collection settings​

The Collection settings section includes the following settings:

General​

Collection name​

This setting defines the name of the Collection.

Description​

This setting defines the description of the Collection.

note

If the Description box is left empty, the system auto-generates a description based on the Collection's uploaded documents, configurable prompts, and number of chunks.

Configuration​

Embedding model​

This setting defines the embedding model for the Collection. You can select an embedding model only once, when creating a new Collection: keep the default embedding model or change it to one of the available options.

caution

You cannot change this setting after the Collection is created.

Number of tokens per chunk​

This setting defines the target size of document context chunks, in tokens. Larger values improve the retrieval of large, contiguous pieces of information, while smaller values improve the retrieval of fine-grained details. Text extracted from large images generally stays together in one chunk, regardless of this setting.

Chunk overlap tokens​

This setting controls the number of overlapping tokens between consecutive document context chunks. Increasing this value provides more context for challenging questions, at the cost of more duplicated data. The default (and recommended) value of 0 means consecutive chunks share no tokens.
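To make these two settings concrete, here is an illustrative sketch of token-window chunking (not h2oGPTe's actual chunker; the function name and token placeholders are invented for the example):

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split a token sequence into chunks of up to chunk_size tokens,
    where consecutive chunks share `overlap` tokens."""
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap  # how far the window advances each time
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(10)]

# overlap=0 (the default): disjoint chunks.
print(chunk_tokens(tokens, chunk_size=4, overlap=0))
# [['t0'..'t3'], ['t4'..'t7'], ['t8', 't9']]

# overlap=2: each chunk repeats the last 2 tokens of the previous one.
print(chunk_tokens(tokens, chunk_size=4, overlap=2))
# [['t0'..'t3'], ['t2'..'t5'], ['t4'..'t7'], ['t6'..'t9'], ['t8', 't9']]
```

Note the trade-off visible in the output: with overlap, tokens near chunk boundaries appear in two chunks, giving retrieval more context but duplicating data.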

Guardrails and PII Detection​

This setting establishes guardrails for prompts and the detection and redaction of personally identifiable information (PII). Options:

  • None

    This option applies neither guardrails for prompts nor the detection and redaction of PII. In other words, Enterprise h2oGPTe does not redact PII when it is detected in the document during ingestion, input to the LLM, or output from the LLM.

  • Enable guardrails, allow PII

    This option enables guardrails for prompts but does not address PII. In other words, Enterprise h2oGPTe does not redact PII when it is detected in the document during ingestion, input to the LLM, or output from the LLM.

  • Enable guardrails, redact sensitive PII

    This option enables guardrails for prompts and the detection and redaction of sensitive PII. In other words, Enterprise h2oGPTe redacts sensitive PII when it is detected in the document during ingestion, input to the LLM, or output from the LLM.

    Sensitive PII is data that, if improperly disclosed or accessed, could lead to substantial harm for an individual. Due to its sensitive nature, this data is highly susceptible to misuse such as identity theft, fraud, or discrimination. For example, Social Security Numbers (SSNs).

  • Enable guardrails, redact any PII

    This option enables guardrails for prompts and the detection and redaction of any PII. In other words, Enterprise h2oGPTe redacts any PII when it is detected in the document during ingestion, input to the LLM, or output from the LLM.

    Any PII refers to any information that can be used to identify an individual, either directly or indirectly. It includes both sensitive and non-sensitive information, for example, email addresses or Social Security Numbers (SSNs).

  • Customize guardrails and PII settings

    This option lets you view and edit all settings for prompt guardrails and PII detection and redaction.

Prompt guard​

This setting specifies the entities that Enterprise h2oGPTe should identify in all user prompts, including prompt templates and queries. The Prompt Guard model determines the available options for this setting. If a prompt template triggers a JAILBREAK detection, adjust it as necessary. Jailbreaks are harmful instructions intended to bypass the safety and security mechanisms of the model.

note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

Guardrails​

This setting specifies the entities to flag in all user prompts. The available options are based on the Llama Guard 3 model. If no custom guardrails are configured, the same LLM used to perform the query will also handle the guardrails task.

note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

Disallowed Regex patterns​

This setting specifies regular expression patterns that are prohibited in user inputs. Inputs matching these patterns are filtered out and blocked, which helps prevent unwanted, inappropriate, or dangerous content from being processed.

note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.
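As a conceptual illustration of how such a filter behaves (not h2oGPTe's implementation; the patterns and helper function below are hypothetical):

```python
import re

# Hypothetical disallowed patterns: an SSN-shaped string and an internal hostname.
DISALLOWED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # SSN-like strings
    re.compile(r"\binternal\.example\.com\b"),  # internal hostnames
]

def is_blocked(user_input: str) -> bool:
    """Return True if the input matches any disallowed pattern."""
    return any(p.search(user_input) for p in DISALLOWED_PATTERNS)

print(is_blocked("My SSN is 123-45-6789"))       # True  -> input is rejected
print(is_blocked("What are interest rates?"))    # False -> input proceeds
```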

Presidio labels​

This setting defines the entities to label as personally identifiable information (PII). The available choices are based on the Presidio model.

Presidio labels refer to the classification tags used by Microsoft's Presidio, a privacy and data protection tool. Presidio helps in identifying and protecting sensitive information within text data by applying various labels. These labels are used to classify types of sensitive data such as PII.

note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

PII Labels​

This setting defines the entities to label as personally identifiable information (PII). The available options are based on a DeBERTa-based classifier model fine-tuned for PII detection.

note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

Parse Action​

This toggle defines what Enterprise h2oGPTe should do when personally identifiable information (PII) is detected in the document at the time of ingestion.

  • "Allow" does nothing.
  • "Redact" will redact the document and put censor bars over detected PII in the resulting document, and the original PII content will not be visible to any parts of the system.
  • "Fail" will abort the document ingestion process with an error message.
note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

LLM Input Action​

This toggle defines what Enterprise h2oGPTe should do when personally identifiable information (PII) is detected in the input to the LLM. This can be either document context or user prompts, including prompt templates.

  • "Allow" does nothing.
  • "Redact" will redact the input to the LLM. For example, it replaces PII with either "XXXXXXX" or <US_SSN>, effectively removing PII.
  • "Fail" will abort the generation process with an error message, before the context is sent to the LLM.
note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.
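To illustrate what a "Redact" action does conceptually, the sketch below replaces one kind of detected PII with a typed placeholder such as <US_SSN>. It is a simplified stand-in: in the actual product, PII spans come from the configured detection models (Presidio or the DeBERTa-based classifier), not a single regex.

```python
import re

# Simplified detector for SSN-shaped spans (stand-in for the real PII models).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace SSN-shaped spans with a typed placeholder."""
    return SSN_RE.sub("<US_SSN>", text)

print(redact("Applicant SSN: 123-45-6789, income: 50k"))
# -> "Applicant SSN: <US_SSN>, income: 50k"
```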

LLM Output Action​

This toggle defines what Enterprise h2oGPTe should do when personally identifiable information (PII) is detected in the output coming from the LLM.

  • "Allow" does nothing.
  • "Redact" will redact the LLM output. For example, it replaces PII with either "XXXXXXX" or <US_SSN>, effectively removing PII from the generated output.
  • "Fail" will abort the output generation process with an error message.
note

This setting is displayed when you select the Customize guardrails and PII settings option for the Guardrails and PII Detection setting.

Default chat settings​

Default prompt template​

This setting defines the prompt template used to customize the prompts within the Collection. You can create your prompt template on the Prompts page and apply it to your Collection.

Default generation approach​

This setting defines the generation approach for responses. Enterprise h2oGPTe offers the following methods to generate responses to users' queries (Chats):

  • Automatic

    This option automatically selects the generation approach. The LLM Only (no RAG) approach is not considered for Chats with Collections.

  • LLM Only

    This option generates a response to answer the user's query solely based on the Large Language Model (LLM) without considering supporting Document contexts from the Collection.

  • RAG (Retrieval Augmented Generation)

    This option utilizes a neural/lexical hybrid search approach to find relevant contexts from the Collection based on the user's query for generating a response. Applicable when the prompt is easily understood and the context contains enough information to come up with a correct answer.

    RAG first performs a hybrid search for similar chunks, sorted by distance metric and limited to a fixed number of chunks. By default, Enterprise h2oGPTe chooses the top 25 chunks by lexical distance and the top 25 by neural distance. The distance metric is calculated by the cross-entropy loss from the BAAI/bge-reranker-large model. These chunks are then passed to the selected LLM to answer the user's query. Note that Enterprise h2oGPTe lets you view the exact prompt passed to the LLM.

  • LLM Only + RAG composite

    This option extends RAG with neural/lexical hybrid search by utilizing the user's query and the LLM response to find relevant contexts from the Collection to generate a response. It requires two LLM calls. Applicable when the prompt is somewhat ambiguous or the context does not contain enough information to come up with a correct answer.

    HyDE (Hypothetical Document Embeddings) is essentially the same as RAG except that it does not simply search for the embeddings with the smallest distance to the query. Instead, it first asks an LLM to try to answer the question. It then uses the question and the hypothetical answer to search for the nearest chunks.

    Example question: What are the implications of high interest rates?

    • RAG: Searches for chunks in the document with a small distance to the embedding of the question: "What are the implications of high interest rates?"

    • LLM Only + RAG composite:

      1. Asks an LLM: "What are the implications of high interest rates?"
      2. LLM answers: "High interest rates can have several implications, including: higher borrowing cost, slower economic growth, increased savings rate, higher returns on investment, exchange rate fluctuation, ..."
      3. RAG searches for chunks in the document with a small distance to the embedding of the question AND the answer from step 2. This effectively widens the set of potentially relevant chunks (see the sketch after this list).
  • HyDE + RAG composite

    This option utilizes RAG with neural/lexical hybrid search, using both the user's query and the HyDE RAG response to find relevant contexts from the Collection to generate a response. It requires three LLM calls. Applicable when the prompt is very ambiguous or the context contains conflicting information, making it very difficult to come up with a correct answer.

  • Summary RAG

    This option utilizes RAG (Retrieval Augmented Generation) with neural/lexical hybrid search using the user's query to find relevant contexts from the Collection to generate a response. It uses the recursive summarization technique to overcome the LLM's context limitations. The process requires multiple LLM calls. Applicable when the prompt is asking for a summary of the context or a lengthy answer such as a procedure that might require multiple large pieces of information to process.

    The vector search is repeated as in RAG, but this time k neighboring chunks are added to the retrieved chunks. The returned chunks are then sorted in the order they appear in the document so that neighboring chunks stay together. The expanded set of chunks is essentially a filtered sub-document of the original document, but more pertinent to the user's question. Enterprise h2oGPTe then summarizes this sub-document while trying to answer the user's question. This step uses the summary API, which applies the prompt to each context-filling chunk of the sub-document, then joins two or more answers at a time and applies the same prompt to the joined text, recursively reducing until only one answer remains.

    The benefit of this additional complexity is that if the answer is spread throughout the document, this mode can include more information from the original document, as well as neighboring chunks, for additional context.

  • All Data RAG

    This option is similar to Summary RAG but includes all document chunks, no matter how large the Collection. It uses the recursive summarization technique to overcome the LLM's context limitations. The process requires multiple LLM calls and can be very computationally expensive, but it guarantees that no part of the document is excluded.
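As referenced above, the following is a minimal sketch of the two-step retrieval idea behind the composite approaches (LLM Only + RAG composite and HyDE + RAG composite). It is illustrative only: `embed` and `ask_llm` are hypothetical stand-ins for the real embedding model and LLM calls, and plain cosine similarity stands in for the hybrid lexical/neural scoring described above.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def composite_retrieve(question: str, chunks: list[str], embed, ask_llm, k: int = 25):
    """Rank chunks against the question PLUS a hypothetical answer.

    `embed` maps text -> vector; `ask_llm` maps a prompt -> text.
    Both are hypothetical stand-ins for the real model calls.
    """
    hypothetical = ask_llm(question)                   # first LLM call: draft an answer
    query_vec = embed(question + "\n" + hypothetical)  # search with question + answer
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]  # context for the second LLM call that writes the final reply
```

Because the hypothetical answer mentions terms the question alone does not (for example, "borrowing cost" or "savings rate"), chunks about those topics score higher than they would against the bare question.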

