View a Curate project
Overview
After creating Curate projects through the H2O LLM DataStudio user interface, you can view a summary table with the following information about your current projects:
- ID: The unique ID of the project.
- Name: The name of the project.
- Description: A detailed summary highlighting the project’s purpose and objectives.
- Filename: The name of the uploaded document file.
- Created: The date when the project was initially created.
- Q:A pairs: The number of question-answer pairs generated from the uploaded document.
- Number of documents: The number of documents uploaded to the project.
- Number of pages: The total number of pages across all documents in the project.
- Status: The status of the project, which indicates its progress (for example, Running or Completed).
View a specific Curate project
The following steps describe how to view a specific Curate project in H2O LLM DataStudio.
- On the H2O LLM DataStudio left navigation menu, click Curate.
- To interact with a specific project, click the project name.
Inside the Curate project you selected, you can view the following details:
- Status: The status of the project, which indicates its progress (for example, Running or Complete).
- Project details: Click the Project details icon to view project details.
  - The Project details tab shows the project name, project ID, project description, and the date and time when the project was created.
  - FastQA mode indicates whether smart chunking mode was activated. Chunk sampling shows the sampling ratio used to convert documents into question-answer pairs.
- Pairs: The number of dataset entries generated from the uploaded document.
- Reload/Logs: Click the Reload/Logs icon to review the events and actions that occurred during the data curation process.
- View reference: Select one or more entries from the table and click the View reference icon to view all the references for the selected entries.
- Label:
  - Select the irrelevant entries from the table and click the corresponding icon to mark them as irrelevant.
  - Select the relevant entries from the table and click the corresponding icon to mark them as relevant.
- Edit Q:A pairs: Select a dataset entry and click the Edit Q:A pairs icon to edit the generated entry.
- Input:
  - View document: Click the View document icon to view the uploaded PDF documents.
- Output:
  - Select the desired file type (JSON or CSV) from the dropdown menu. Click Execute to generate the question-answer pairs in the selected format.
    Note: You can download the generated curation pairs even if the project fails or terminates during the curation process.
  - Select Publish as Preparation Project from the dropdown menu to use the generated question-answer pairs in the Data preparation flow. After you click Execute, the dataset generated by the Data curation process is ingested into the Data preparation flow.
  - Select Publish as Custom Eval Project from the dropdown menu to use the generated question-answer pairs to create your own evaluation dataset. After you click Execute, the dataset generated by the Data curation process is ingested into the Custom Eval flow.
- Export to H2O Drive: Click the Export to H2O Drive icon to export the project to H2O Drive.
- Use the search bar to search for specific questions.
- The table of question-answer pairs includes the following details:
  - Prompt: The question.
  - Answer: The corresponding answer.
  - Relevance: A similarity score between the answer and the context it was generated from. Each question-answer pair is assigned a relevance score, calculated as the ratio of matching sequences between the generated answer and the original context. A score of 1 indicates that the answer is quoted directly from the context. A score of 0 means that the answer shares no words with the context. Use the relevance score to filter for the most relevant questions.
  - Filename: The name of the document from which the question-answer pair was generated. You can filter the question-answer pairs by filename.
  - Flag: Indicates the question-answer pairs that have been marked as irrelevant. You can filter the question-answer pairs by flag.
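To make the Relevance column easier to interpret, the following minimal sketch computes a score with the same end points as the one described above: 1 for an answer quoted directly from the context and 0 for an answer that shares no words with it. It is an illustration only. The exact algorithm that H2O LLM DataStudio uses is not specified on this page, so the use of Python's `difflib.SequenceMatcher`, the word-level tokenization, and the function name `relevance_score` are all assumptions.

```python
from difflib import SequenceMatcher


def relevance_score(answer: str, context: str) -> float:
    """Hypothetical helper: fraction of the answer's words that fall inside
    word sequences shared with the context (assumed, not DataStudio's code)."""
    answer_words = answer.lower().split()
    context_words = context.lower().split()
    if not answer_words:
        return 0.0
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent tokens in long sequences.
    matcher = SequenceMatcher(None, answer_words, context_words, autojunk=False)
    # Matching blocks never overlap, so their sizes sum to at most
    # the number of words in the answer.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(answer_words)


context = "H2O LLM DataStudio converts uploaded documents into question-answer pairs."
print(relevance_score("converts uploaded documents", context))  # 1.0
print(relevance_score("nothing shared here", context))  # 0.0
```

Under these assumptions, an answer that only partly overlaps the context receives a score between 0 and 1, which is the range you would filter on when reviewing pairs.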