
Create your own evaluation datasets

Overview

The Custom Eval feature in H2O LLM DataStudio enables you to create your own evaluation datasets from various document formats (such as PDF and DOC files), from audio and video files, and from existing datasets. These evaluation datasets can be downloaded in JSON format for each evaluation type, allowing seamless integration with H2O Eval Studio.

note

Custom Eval supports English only.

Instructions

To create your own evaluation dataset, follow these steps:

note

Before starting the process, you must integrate h2oGPTe by providing the required credentials. You cannot create a new evaluation dataset without configuring h2oGPTe. For more information, see Settings.

  1. On the H2O LLM DataStudio left navigation menu, click Custom Eval.

  2. On the Create Your Own Eval Datasets page, click New.

  3. In the Project name text box, enter a name for the project.

  4. In the Description text box, enter a description for the project.

  5. In the Dataset type drop-down menu, select the evaluation dataset type. The available dataset types are:

    • Question type:

      In the dataset, each entry includes a specific question, the correct answer to that question, and a label indicating the type of question. You can configure the question type distribution after uploading the document.

      • Simple question: Simplifies the question and makes it easier to understand.
      • Conditional question: Increases question complexity by incorporating scenarios or conditions affecting the context.
      • Compress question: Makes the question more indirect and shorter while retaining its essence.
      • Multihop reasoning question: Increases question complexity by requiring multiple logical connections or inferences.
    • Multi-Choice: In this dataset, each entry consists of a question followed by multiple answer choices, with one correct (or most appropriate) answer and three incorrect choices.

    • Token presence: In this dataset, each entry includes a question, the correct answer, and a list of key tokens that are relevant to the answer.
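    To make the three dataset types concrete, here is a sketch of what a single downloaded entry of each might look like. The field names below are illustrative assumptions, not the exact JSON schema that H2O LLM DataStudio exports.

    ```python
    import json

    # Illustrative entries only -- field names are assumptions,
    # not the exact schema exported by H2O LLM DataStudio.
    question_type_entry = {
        "question": "Under what condition does the warranty become void?",
        "answer": "When the device is opened by an unauthorized technician.",
        "question_type": "Conditional question",
    }

    multi_choice_entry = {
        "question": "What voids the warranty?",
        "choices": [
            "Unauthorized repair",        # the one correct choice
            "Registering the product",
            "Using the provided charger",
            "Updating the firmware",
        ],
        "answer": "Unauthorized repair",
    }

    token_presence_entry = {
        "question": "What voids the warranty?",
        "answer": "Repair by an unauthorized technician voids the warranty.",
        "key_tokens": ["unauthorized", "technician", "warranty"],
    }

    print(json.dumps(question_type_entry, indent=2))
    ```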

  6. In the Do you already have a QA Dataset? drop-down menu, select Yes if you have a Q&A dataset in CSV format. Otherwise, select No.

    If you select No, H2O LLM DataStudio will generate Q&A pairs from the uploaded document for you.

  7. In the Ingestion pipeline selection section, toggle the Use h2oGPTe's ingestion pipeline option to choose between h2oGPTe's ingestion pipeline and the default LLM DataStudio pipeline.

    If this option is turned on, the following process is initiated:

    1. A new collection is created in h2oGPTe.
    2. The document is uploaded directly to this collection.
    3. h2oGPTe generates and returns content chunks from the uploaded document.

    If this option is turned off, LLM DataStudio's own ingestion pipeline is used.

  8. Click Next.

  9. If you selected Yes in the Do you already have a QA Dataset? drop-down menu, follow these steps:

    1. Click Browse to upload the Q&A dataset in CSV file format.
    2. Click Upload.
    3. In the LLM selection section, select your preferred H2OGPTE LLM from the available LLMs.
    4. In the Configure columns section, select the columns that contain the context, question, and answer from the given options.
    5. In the Question type dataset section, configure the question type distribution for the resulting evaluation dataset.
    6. Click Run pipeline.
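
    If you are preparing your own CSV for this path, the Configure columns step above expects the file to hold context, question, and answer data. A minimal sketch of such a file follows; the column names are an assumption, since the Configure columns step lets you map whichever names your file uses.

    ```python
    import csv
    import io

    # Hypothetical CSV -- column names are an assumption; the
    # Configure columns step lets you map your own names.
    csv_text = """context,question,answer
    "The warranty lasts two years.","How long is the warranty?","Two years."
    "Returns are accepted within 30 days.","What is the return window?","30 days."
    """

    rows = list(csv.DictReader(io.StringIO(csv_text)))
    print(rows[0]["question"])  # -> How long is the warranty?
    ```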

    If you selected No in the Do you already have a QA Dataset? drop-down menu, follow these steps:

    1. Click Browse to upload the document, or enter the URL if you are generating question-answer pairs from a webpage or a web-hosted PDF.
    2. Click Upload.
    3. In the LLM selection section, select your preferred H2OGPTE LLM from the available LLMs.
    4. Use the slider labeled Number of tokens per chunk to adjust the tokenization settings. This controls the maximum number of tokens per chunk of text processed by the model. The default is set to 1000 tokens.
    5. In the Question type dataset section, configure the question type distribution for the resulting evaluation dataset.
    6. Click Run pipeline.
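The Number of tokens per chunk slider in the steps above bounds how much text the model processes at once. As a rough illustration of the idea only (DataStudio's actual tokenizer and chunking logic are not documented here), here is a greedy chunker that treats whitespace-separated words as tokens:

```python
def chunk_by_tokens(text: str, max_tokens: int = 1000) -> list[str]:
    """Greedy whitespace chunking -- a stand-in for the real tokenizer."""
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

doc = "word " * 2500          # a 2500-token toy document
chunks = chunk_by_tokens(doc, max_tokens=1000)
print(len(chunks))            # -> 3 chunks: 1000 + 1000 + 500 tokens
```

Lowering the slider yields more, smaller chunks (and typically more Q&A pairs per document); raising it gives the model more context per chunk.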
