View an evaluation dataset

Overview

After creating your own evaluation dataset through the H2O LLM DataStudio user interface, you can view a summary table with the following information about your current evaluation datasets:

  • ID: The unique ID of the custom Eval project.
  • Name: The name of the custom Eval project.
  • Description: A detailed summary highlighting the custom Eval project’s purpose and objectives.
  • Filename: The name of the uploaded document file.
  • Eval dataset type: The type of the evaluation dataset (question-answer, multi-choice, or token presence).
  • Created: The date when the project was initially created.
  • Number of entries: The number of entries (rows) in the dataset.
  • Status: The status of each project, indicating its progress (for example, Running or Complete).
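
Conceptually, each row of this summary table can be pictured as a simple record. The sketch below is illustrative only: the field names mirror the columns above and are not an official H2O LLM DataStudio schema.

```python
# Hypothetical representation of one row in the evaluation dataset
# summary table. Field names mirror the columns described above;
# this is not an official H2O LLM DataStudio schema.
summary_row = {
    "id": 42,                                 # unique ID of the custom Eval project
    "name": "policy-docs-eval",
    "description": "Evaluate QA quality on internal policy PDFs",
    "filename": "policy_handbook.pdf",
    "eval_dataset_type": "question-answer",   # or "multi-choice", "token presence"
    "created": "2024-01-15",
    "number_of_entries": 120,                 # rows in the dataset
    "status": "Complete",                     # e.g., "Running", "Complete"
}
```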

View a specific Custom Eval project

The following steps describe how to view a specific Custom Eval project in H2O LLM DataStudio.

  1. On the H2O LLM DataStudio left navigation menu, click Custom Eval.
  2. To open a specific project, click the project name.

Inside the Custom Eval project you selected, you can view the following details.

  • Status: The status of the project, indicating its progress (for example, Running or Complete).
  • Pairs: The number of question-answer pairs generated in the evaluation dataset.
  • Reload/logs: Click to review the events and actions that occurred while the evaluation dataset was being created.
  • Label:
    • Select the question-answer pairs from the table and click to mark them as irrelevant.
    • Select the question-answer pairs from the table and click to mark them as relevant.
  • Edit Q:A pairs: Select a question-answer pair and click to edit its entries, then click Update records to save the changes.
  • Input:
    • View document: Click to view the uploaded PDF document.
  • Output:
    • Click to download the curated question-answer pairs in JSON or CSV format (see the example after this list).
  • Generate robust eval dataset: Select one or more rows from the generated question-answer pairs and click to generate a robust evaluation dataset. Each entry in the robust evaluation dataset contains a new question generated from an original question, along with the answer and the original question it was derived from.
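
As a rough illustration of what a JSON export and a robust-dataset entry might look like, here is a minimal Python sketch. The field names question, answer, and original_question are assumptions for this example, not a documented export schema; inspect a downloaded file for the actual structure.

```python
import json

# Hypothetical JSON export of curated question-answer pairs.
# Field names are assumptions for illustration; check your
# downloaded file for the actual schema.
raw = """
[
  {"question": "What is the refund window?",
   "answer": "30 days from the date of purchase."},
  {"question": "How long do customers have to request a refund?",
   "answer": "30 days from the date of purchase.",
   "original_question": "What is the refund window?"}
]
"""

pairs = json.loads(raw)
for pair in pairs:
    # Entries produced by "Generate robust eval dataset" also carry
    # the original question the new question was derived from.
    origin = pair.get("original_question", "n/a")
    print(pair["question"], "->", pair["answer"], f"(original: {origin})")
```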
