
Tutorial 1C: Create question-type and robust-type evaluation datasets

Overview

In this tutorial, you'll learn how to create a question-type evaluation dataset using H2O LLM DataStudio and then generate a robust evaluation dataset from it. By following the steps, you'll be able to configure and customize a dataset that tests various question types, including simple, conditional, compressed, and multihop reasoning questions. Additionally, you'll transform this dataset into a robust evaluation type, ensuring it can effectively challenge and validate your model's performance under diverse and complex conditions.

Prerequisites

Before starting this tutorial, make sure you have the following:

  1. Access to H2O LLM DataStudio via H2O AI Managed Cloud (HAMC).
  2. A pre-existing question-answer dataset. This tutorial uses a question-answer dataset based on Tweets. Download the Tweet QA CSV file to your machine (a quick way to check the file before uploading is sketched after this list).
  3. Familiarity with the data curation flow.
  4. Credentials for h2oGPTe integration.
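
If you'd like to confirm that the downloaded file looks right before uploading it, the sketch below loads it with pandas and prints its shape and columns. The file name tweet_qa.csv and the context, question, and answer column names are assumptions; adjust them to match your actual download.

```python
# Sanity-check the downloaded file before uploading it. The file name and the
# expected column names below are assumptions; adjust them to your actual CSV.
import pandas as pd

df = pd.read_csv("tweet_qa.csv")
print(df.shape)                  # (number of QA pairs, number of columns)
print(df.columns.tolist())

expected = {"context", "question", "answer"}   # assumed column names
missing_cols = expected - set(df.columns)
if missing_cols:
    print("Columns not found (rename or map them in the UI):", missing_cols)

# Rows with empty fields tend to produce low-quality generated questions.
empty_rows = df.isna().any(axis=1).sum()
print("Rows with empty fields:", empty_rows)
```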

Step 1: Create a new evaluation dataset

To begin the process of data curation, let's follow these steps to create a new evaluation dataset:

  1. On the H2O LLM DataStudio left navigation menu, click Custom Eval.
  2. On the Create Your Own Eval Datasets (Beta) page, click New.
    Note: If this is your first time creating a new evaluation dataset, you must integrate h2oGPTe by providing the required credentials. For more information, see Settings.

  3. In the Project name text box, enter tweets-eval-dataset.
  4. In the Description text box, enter evaluation dataset on tweets qa.
  5. From the Dataset type drop-down menu, select Question type as the evaluation dataset type.
  6. From the Do you already have a QA Dataset? drop-down menu, select Yes.
  7. Keep the Use h2oGPTe's ingestion pipeline toggle enabled (the default). This option lets you choose between h2oGPTe's ingestion pipeline and the default LLM DataStudio pipeline.
  8. Click Next.

Step 2: Upload question answering (QA) dataset

Once you've configured the basic project settings, the next step is to upload your question answering (QA) dataset. Follow these steps:

  1. Click Browse to open the file selection dialog, and then locate and select the downloaded tweet_qa.csv file from your machine. Alternatively, you can drag and drop the file into the designated area.
  2. Click Upload.

Step 3: Configure settings

After uploading your dataset, configure the following settings. For this tutorial, let's keep the default settings as specified in each step:

  1. In the LLM selection section, select your preferred h2oGPTe LLM. For this tutorial, keep the default option.
  2. In the Configure columns section:
    1. Select the context column of the dataset for context or background information relevant to the questions.
    2. Select the question column of the dataset that contains the questions.
    3. Select the answer column of the dataset that contains the answers.
  3. In the Question type dataset section, configure the distribution of each question type (simple, conditional, compressed, multihop). For this tutorial, we will use the default settings.
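
The distribution fields and their defaults live in the LLM DataStudio UI, so the snippet below is purely illustrative: it shows one way to think about a question-type split that totals 100%. The percentages and the one-line descriptions in the comments are assumptions, not LLM DataStudio's definitions.

```python
# Illustrative only: the question-type distribution is configured in the
# LLM DataStudio UI, not through code. The values and descriptions below are
# placeholders showing a split across the four types that sums to 100%.
question_type_distribution = {
    "simple": 40,       # straightforward questions answered from the context
    "conditional": 20,  # questions that attach a condition to the answer
    "compressed": 20,   # several related questions combined into one
    "multihop": 20,     # questions requiring reasoning over multiple facts
}

assert sum(question_type_distribution.values()) == 100, "shares should total 100%"
```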

Step 4: Run pipeline

Now that you've configured all the necessary settings, it's time to run the pipeline and begin creating the evaluation dataset.

  1. Click Run pipeline.

Step 5: View the project

To view and interact with your new project:

  1. In the H2O LLM DataStudio left navigation menu, click Custom Eval.
  2. Select your project by clicking its name.

You will see a table of question-answer pairs, along with details such as question types, status, and the number of pairs. For a complete list of what you can view, see View an evaluation dataset.
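
If you export the generated dataset for use outside the UI, a quick summary like the one below can confirm the question-type breakdown. The file name tweets_eval_dataset.csv and the question_type column are assumptions; match them to the columns in your export.

```python
# A minimal sketch of summarizing an exported evaluation dataset outside the
# UI. The file name and column name are assumptions; adjust to your export.
import pandas as pd

eval_df = pd.read_csv("tweets_eval_dataset.csv")

# Count the generated pairs by question type and report the total.
print(eval_df["question_type"].value_counts())
print("Total pairs:", len(eval_df))
```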


Step 6: Generate a robust evaluation dataset

To generate a robust evaluation dataset from your newly created evaluation dataset:

  1. In the Output section, select Generate robust eval dataset from the drop-down menu.
  2. Choose one or more entries from the evaluation dataset.
  3. Click Execute.


The robust evaluation dataset contains new questions generated from each original question, its answer, and its context.
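
To review the robust dataset outside the UI, you can export it and compare each rewritten question with its original. The file name and column names in this sketch are assumptions; adjust them to your export.

```python
# A minimal sketch of reviewing a robust evaluation dataset after exporting it
# from LLM DataStudio. The file name and column names are assumptions; match
# them to the columns in your actual export.
import pandas as pd

robust = pd.read_csv("tweets_robust_eval_dataset.csv")

for _, row in robust.head(5).iterrows():
    print("Original question :", row["original_question"])
    print("Rewritten question:", row["generated_question"])
    print("Answer            :", row["answer"])
    print("-" * 60)
```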


Summary

In this tutorial, we learned how to create a question-type evaluation dataset using H2O LLM DataStudio and generate a robust evaluation dataset. We walked through the process of creating a new evaluation dataset, uploading a question-answer dataset, and configuring the necessary settings for different question types. Finally, we explored how to transform the dataset into a robust evaluation type to effectively test and validate your model's performance across a range of complex question types, ensuring a more comprehensive evaluation process.

