
Data ingestion

Overview

Data preparation involves a series of steps to transform and clean your dataset. The first step is data ingestion.

Instructions

To ingest data, follow these steps:

  1. On the H2O LLM DataStudio left navigation menu, click Projects.
  2. Click the name of the project that you created earlier.
  3. On the H2O LLM DataStudio left navigation menu inside the project, click Ingestion.
  4. On the Ingestion tab that appears, go to the Add new dataset section.
  5. In the Dataset name text box, enter the name of the dataset.
  6. In the Description text box, enter a description for the dataset.
  7. In the Upload file section, click Browse to select and upload the dataset file in a supported file format.
    note
    • You can upload a single file, multiple files, or multiple files compressed in a zip file.
    • The supported file formats are CSV, JSON, JSONL, and Parquet.
    • The file must be smaller than 1 GB.
    • There are required columns for each workflow type. For more information, see Supported problem types.
    • To find sample datasets, see Example datasets in the H2O LLM DataStudio GitHub repository.
  8. Click Ingest to ingest the dataset into the data preparation flow.
    Ingest data
  9. After you add the dataset to the project, the available datasets are listed on the right side of the page. Select the dataset and click Dataset preview to preview it.
  10. Under Configure columns, select the relevant columns for question, answer, and context from the given options.
  11. Click Save.
    Ingest data
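
Before uploading, it can help to check a file locally against the constraints in the note above (supported format, size under 1 GB) and to look at its column names ahead of the Configure columns step. The following is a minimal sketch, not part of H2O LLM DataStudio: the helper names `check_upload` and `csv_columns` are hypothetical, and reading "1GB" as 1024**3 bytes is an assumption, since the exact limit applied by the application is not stated here.

```python
import csv
import zipfile
from pathlib import Path

# Formats listed in the note above.
SUPPORTED_SUFFIXES = {".csv", ".json", ".jsonl", ".parquet"}
# Assumption: "1GB" is read as 1024**3 bytes; the limit enforced by the UI may differ.
MAX_BYTES = 1024 ** 3


def check_upload(path):
    """Return a list of problems found; an empty list means none were found."""
    path = Path(path)
    problems = []
    suffix = path.suffix.lower()

    if suffix == ".zip":
        # A zip file may bundle several dataset files; check each member's format.
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                if Path(name).suffix.lower() not in SUPPORTED_SUFFIXES:
                    problems.append(f"unsupported file inside zip: {name}")
    elif suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file format: {suffix or '(none)'}")

    if path.stat().st_size >= MAX_BYTES:
        problems.append("file is not smaller than 1 GB")
    return problems


def csv_columns(path):
    """Return the header row of a CSV file (an empty list if the file is empty)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

For example, `check_upload("my_dataset.csv")` (a hypothetical file name) returns an empty list when the file passes both checks, and `csv_columns("my_dataset.csv")` lists the header names from which the question, answer, and context columns can be chosen. The required columns for each workflow type are described under Supported problem types, so this sketch does not try to enforce them.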
