Version: Next

Collections usage overview

Overview

In Enterprise h2oGPTe, Collections are groups of related documents.

Enterprise h2oGPTe supports Retrieval Augmented Generation (RAG) when getting responses from an LLM, which lets you contextualize questions with information from documents, audio transcriptions, and other data. You can create one or more collections of data to ask questions about or to generate new content from. When you interact with an LLM, your prompt is compared with the documents in the collection to find similar chunks of information. These chunks are then sent to the LLM as context.
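To make the retrieval step concrete, here is a minimal sketch of comparing a prompt with chunks and keeping the most similar ones. It is an illustration only: it scores similarity with plain word counts, whereas a real RAG system such as Enterprise h2oGPTe would typically use an embedding model. The function names, the sample chunks, and the "Harbor Grill" restaurant are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(prompt, chunks, k=2):
    """Return up to k chunks most similar to the prompt, best first."""
    query = Counter(tokenize(prompt))
    scored = [(cosine(query, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the prompt.
    return [c for score, c in scored[:k] if score > 0]


# Made-up chunks standing in for pieces of menu documents.
chunks = [
    "Mesita menu: grilled chicken tacos with salsa verde",
    "Mesita menu: churros with chocolate sauce",
    "Harbor Grill menu: ribeye steak with fries",
]
retrieved = top_chunks("Which chicken dishes are on the menu?", chunks, k=2)
for chunk in retrieved:
    print(chunk)
```

Only the chunks in `retrieved` would be passed to the LLM, which is why the contents of a collection determine what context an answer can draw on.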

There are many strategies for importing and creating collections to get the best responses for your use case. This page describes common use cases and how to set up your data effectively.

Info: For more information on supported file types for collections, see Supported file types for a Collection.

Types of questions

At a high level, questions generally fall into three types. The examples below assume a collection of menus from different restaurants, with one example for each type of question.

  1. Single Document Questions: Questions that only use context from a specific document to be answered.

    • Example: "What chicken dishes are served at Restaurant: Mesita?"
    • In this case, you only want the LLM to use the menu document from Restaurant: Mesita. Information from any other menu should not be used.
  2. Aggregating Questions: Questions that aggregate information across multiple documents.

    • Example: "Based on the menus, what are some examples of healthy chicken dishes?"
    • In this case, Enterprise h2oGPTe must combine information from the various menus into a single response. It is not comparing the menus but rather finding examples of healthy chicken dishes across them.
  3. Compare/Contrast Questions: Questions that compare or contrast information from multiple documents.

    • Example: "What is the cost of a steak at each of the restaurants?"
    • In this case, Enterprise h2oGPTe needs to determine the cost of steak from each menu document and compare the prices in the response.

The following table gives the recommended approach for each type of question:

| Type of Question | What to Do | Notes |
| --- | --- | --- |
| Single Document Question | Create a collection with the single document you want to use to answer the question. It is not recommended to ask the question in a collection with multiple documents, since RAG may use chunks from documents that the user does not consider to be relevant. | If you have your document already loaded in a collection with multiple documents, you do not need to re-upload your document into a new collection. Instead, you can add your document to a new collection. This prevents the document from being duplicated in the system. |
| Aggregating Question | Create a collection with the documents you want the LLM to use to create its answer. Using RAG+ is recommended, as this provides more context to the LLM. | For collections with more documents, consider increasing the number of neighboring chunks in RAG+ to 1 or 2. This increases the context that is passed to the LLM. For more information, see Additional note context. |
| Compare/Contrast Question | To compare or contrast all documents in the collection, you have two options. You can follow the steps for Aggregating Questions listed above. However, it is not guaranteed to pass chunks from each document to the LLM. If you must have information from each document in the response, you can instead ask the question to each document separately (document in its own collection) and collect the responses. | N/A |
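The second Compare/Contrast option, asking each document separately and collecting the responses, can be sketched as a simple loop. This is a hedged illustration, not the product's API: `ask_collection` is a hypothetical callable standing in for whatever code sends a question to one single-document collection, and the stub, collection IDs, and prices below are made up so that the sketch runs offline.

```python
def ask_each_document(ask_collection, collection_ids, question):
    """Ask the same question in every single-document collection.

    ask_collection is a hypothetical callable (collection_id, question) -> str.
    Returns a dict mapping each collection ID to its answer.
    """
    return {cid: ask_collection(cid, question) for cid in collection_ids}


# Stub standing in for a real query; the IDs and answers are invented.
fake_answers = {
    "mesita-menu": "Steak: $24",
    "harbor-grill-menu": "Steak: $31",
}


def fake_ask(collection_id, question):
    """Pretend to query one collection by looking up a canned answer."""
    return fake_answers[collection_id]


answers = ask_each_document(
    fake_ask, list(fake_answers), "What is the cost of a steak?"
)
for cid, answer in answers.items():
    print(f"{cid}: {answer}")
```

Because every collection is queried on its own, each document is guaranteed to contribute an answer, and the comparison can then be made over the collected responses.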
Aggregating Question: Additional note context

The following example shows how to set the number of neighboring chunks in RAG+ (here, to 2) when querying a chat session.

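# Assumes `client` is a connected h2oGPTe Python client and `chat_session_id`
# identifies a chat session on the collection of menus.
with client.connect(chat_session_id) as session:
    reply = session.query(
        "Based on the menus, what are some examples of healthy chicken dishes?",
        timeout=60,
        # Use RAG+ and include 2 neighboring chunks for extra context.
        rag_config={"rag_type": "rag+", "num_neighbor_chunks_to_include": 2},
    )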
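To illustrate why a larger neighbor count increases the context passed to the LLM, the following sketch expands a set of retrieved chunk positions with their neighbors. It is a conceptual illustration under stated assumptions, not h2oGPTe's implementation: the `with_neighbors` function is hypothetical, and whether the product counts neighbors on both sides of a match or clips at document boundaries in this way is assumed.

```python
def with_neighbors(hit_indices, total_chunks, num_neighbors):
    """Expand retrieved chunk indices by num_neighbors on each side.

    Indices are chunk positions within one document. The result is
    sorted, de-duplicated, and clipped to the document's bounds.
    """
    expanded = set()
    for i in hit_indices:
        start = max(0, i - num_neighbors)
        stop = min(total_chunks, i + num_neighbors + 1)
        expanded.update(range(start, stop))
    return sorted(expanded)


# A 10-chunk document where retrieval matched chunks 0 and 5.
no_neighbors = with_neighbors([0, 5], total_chunks=10, num_neighbors=0)
two_neighbors = with_neighbors([0, 5], total_chunks=10, num_neighbors=2)
print(no_neighbors)
print(two_neighbors)
```

With no neighbors only the two matched chunks are used, while with two neighbors the surrounding chunks are included as well, so the LLM sees more of the text around each match.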

Further reading

To learn more about Collections, refer to the following pages:

