View a Curate project

Overview

After creating Curate projects through the H2O LLM DataStudio user interface, you can view a summary table with the following information about your current projects:

  • ID: The unique ID of the project.
  • Name: The name of the project.
  • Description: A detailed summary highlighting the project’s purpose and objectives.
  • Filename: The name of the uploaded document file.
  • Created: The date when the project was initially created.
  • Q:A pairs: The number of question-answer pairs generated from the uploaded document.
  • Number of documents: The number of documents uploaded to the project.
  • Number of pages: The total number of pages across all documents in the project.
  • Status: The current progress of the project (for example, Running or Completed).

View a specific Curate project

The following steps describe how to view a specific Curate project in H2O LLM DataStudio.

  1. On the H2O LLM DataStudio left navigation menu, click Curate.
  2. To interact with a specific project, click on the project name.

Inside the Curate project you selected, you can view the following details.

  • Status: The current progress of the project (for example, Running or Complete).

  • Project details: Click to view project details.

    • The Project details tab shows the project name, project ID, project description, and the date and time when the project was initially created.
    • FastQA mode indicates whether smart chunking was activated. Chunk sampling shows the sampling ratio that was used when converting documents into question-answer pairs.
  • Pairs: The number of dataset entries (question-answer pairs) generated from the uploaded document.

  • Reload/Logs: Click to review the events and actions that occurred during the data curation process.

  • View reference: Select one or more entries from the table and click to view all the references for the selected entries.

  • Label:

    • Select the irrelevant entries from the table and click to mark them as irrelevant.
    • Select the relevant entries from the table and click to mark them as relevant.
  • Edit Q:A pairs: Select a dataset entry and click to edit its generated question and answer.

  • Input:

    • View document: Click to view the uploaded PDF documents.
  • Output:

    • Select the desired file type (JSON or CSV) from the dropdown menu. Click Execute to generate the question-answer pairs in the selected format.

      Note

      You can download the generated curation pairs even if the project fails or terminates during the curation process.

    • Select Publish as Preparation Project from the dropdown menu to use the generated question-answer pairs in the Data preparation flow. Once you click Execute, the dataset generated from the Data curation process will be ingested into the Data preparation flow.

    • Select Publish as Custom Eval Project from the dropdown menu to use the generated question-answer pairs to create your own evaluation dataset. Once you click Execute, the dataset generated from the Data curation process will be ingested into the Custom Eval flow.

  • Export to H2O Drive: Click to export the project to H2O Drive.

  • Use the search bar to search for specific questions.

  • The table of question-answer pairs includes the following details:

    • Prompt: The question
    • Answer: The corresponding answer
    • Relevance: A similarity score between the answer and the context it was generated from. Each question-answer pair is assigned a relevance score, calculated as the ratio of matching sequences between the generated answer and the original context. A score of 1 indicates that the answer is quoted directly from the context. A score of 0 means that the answer shares no words with the context. Use the relevance score to filter for the most relevant question-answer pairs.
    • Filename: The name of the document from which the question-answer pair was generated. You can filter the question-answer pairs by filename.
    • Flag: Indicates the question-answer pairs that have been marked as irrelevant. You can filter the question-answer pairs by flag.
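
To make the relevance score described above more concrete, the following Python sketch shows one way such a score could be computed. It is an illustration only and is not taken from H2O LLM DataStudio: the function name relevance_score, the word-level tokenization, and the use of Python's difflib module are all assumptions. The sketch divides the number of answer words that fall in matching sequences by the total number of answer words, so a directly quoted answer scores 1 and an answer with no overlapping words scores 0.

```python
from difflib import SequenceMatcher


def relevance_score(answer: str, context: str) -> float:
    """Hypothetical relevance score: share of answer words found in
    matching word sequences of the context (assumed formula)."""
    # Assumption: compare lowercase, whitespace-separated words.
    answer_words = answer.lower().split()
    context_words = context.lower().split()
    if not answer_words:
        return 0.0

    # autojunk=False keeps frequent words from being ignored in long contexts.
    matcher = SequenceMatcher(None, answer_words, context_words, autojunk=False)

    # Sum the sizes of all matching blocks (the final dummy block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(answer_words)


# Illustrative data only; these pairs are not from a real project.
context = "Yesterday the cat sat on the mat"
pairs = [
    {"prompt": "What did the cat do?", "answer": "the cat sat"},
    {"prompt": "What did the cat do?", "answer": "the cat slept"},
    {"prompt": "What do dogs do?", "answer": "dogs bark loudly"},
]

# Keep only pairs at or above a hypothetical relevance threshold.
threshold = 0.5
relevant = [p for p in pairs if relevance_score(p["answer"], context) >= threshold]
```

A filter like this mirrors what the Relevance column makes possible in the table: pairs whose answers stay close to the source text are kept, and pairs with little or no overlap can be reviewed or flagged as irrelevant.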