Datasets

Overview

The Datasets page lets you manage all datasets within your current project. From here, you can view, search, filter, and take actions on individual datasets — including launching experiments directly from the dataset view.

You can access Datasets from:

The top navigation tabs, or
The Datasets card on the homepage of a project

List datasets

When you open the Datasets page, you’ll see a table showing all datasets in the current project.

Table columns

Each row represents one dataset and includes:

Name
Number of rows
File size (e.g. KB, MB)
File type (e.g. CSV, Parquet)
Created date

You can:

Search by name
Search by file type
Customize columns using the column visibility toggle (top-right of the table)

Edit dataset

Click the Edit button (top-left) to:

Select multiple datasets
Bulk delete selected datasets

Use the Cancel button (bottom of the screen) to exit edit mode.

Row actions

Each dataset row includes a dropdown menu with:

Rename — opens a rename dialog
Delete — removes the dataset from the project

Add new dataset

Click the ➕ New Dataset button (top of the page) to upload your own data.

You’ll be taken to a new screen where you can:

Name your dataset (optional — a default name will be auto-generated if left blank)
Upload a file via drag-and-drop or file picker

Supported file formats

.csv — Comma-separated values
.parquet or .pq — Parquet format
.json — JSON structured data
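If you prepare files with a script, a quick pre-upload check can catch an unsupported extension before you reach the upload screen. The sketch below only compares the file name against the extensions listed above. It does not open the file, and treating upper-case extensions such as `.CSV` as valid is an assumption, since this page does not say.

```python
from pathlib import Path

# Extensions listed on this page. Lower-casing the suffix first is an
# assumption: the page does not say whether ".CSV" or ".JSON" are accepted.
SUPPORTED_EXTENSIONS = {".csv", ".parquet", ".pq", ".json"}


def is_supported(path: str) -> bool:
    """Return True if the file name ends in one of the supported dataset extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["train.csv", "data.pq", "notes.txt"]:
    print(name, is_supported(name))
```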

Info

Datasets should contain at least one column of text data. The platform is designed for fine-tuning large language models, so it expects text input.
If your dataset contains only input text (without target outputs), you can generate outputs with the Data Generation feature before fine-tuning.
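As a minimal sketch of the expected shape, the following writes a small CSV with one text input column and one output column using Python's standard `csv` module. The file name `my_dataset.csv` and the column names `prompt` and `response` are hypothetical; this page does not prescribe column names. A file with only the `prompt` column would still meet the text requirement, with outputs generated later through Data Generation.

```python
import csv

# Hypothetical column names: the page requires at least one text column
# but does not prescribe names such as "prompt" or "response".
FIELDNAMES = ["prompt", "response"]


def write_dataset(path: str, rows: list[dict[str, str]]) -> None:
    """Write prompt/response pairs to a UTF-8 CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


write_dataset(
    "my_dataset.csv",
    [
        {"prompt": "Translate to French: Good morning", "response": "Bonjour"},
        {"prompt": "What is 2 + 2?", "response": "4"},
    ],
)
```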

Once uploaded, you’ll be returned to the Datasets page and your new dataset will appear in the list.

Use a demo dataset

If you don’t have a dataset handy, you can start with one of our pre-configured demo datasets.

On the New Dataset screen, scroll down to find:

“…or get started quickly with our demo datasets”

Clicking any of the demos will immediately import that dataset into your project.

These demos cover a variety of tasks, including:

Topics for Data Generation
Text-to-SQL
Classification and other NLP use cases

Note

The specific demos available may change over time — they’re designed to help you quickly test and explore LLM fine-tuning in the platform.

After importing a demo, you’ll be redirected back to the Datasets list where you can explore and use it like any other dataset.

Dataset overview page

Clicking on a dataset opens its Overview page.

Header actions

At the top of the page:

Rename the dataset via the edit icon next to the name
Create Experiment to kick off a fine-tuning run using this dataset
Delete the dataset

Metadata summary

Right below the header, you’ll see metadata about the dataset:

File size
Row and column count
Token count (total tokens in the dataset)
Created date
Number of experiments launched using this dataset
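To sanity-check these figures before uploading, you can compute rough equivalents locally. The sketch below counts the rows and columns of CSV text and approximates the token count by splitting each cell on whitespace. That split is only a stand-in: this page does not say which tokenizer the platform uses, so expect its Token count to differ from this estimate.

```python
import csv
import io


def approx_tokens(text: str) -> int:
    # Assumption: a whitespace split as a rough proxy; the platform's tokenizer is not specified.
    return len(text.split())


def summarize(csv_text: str) -> dict[str, int]:
    """Return row count, column count, and an approximate total token count for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # Sum tokens over every data cell; the header row is not counted.
    total = sum(approx_tokens(row[name] or "") for row in rows for name in columns)
    return {"rows": len(rows), "columns": len(columns), "approx_tokens": total}


sample = "prompt,response\nWhat is 2 + 2?,4\nName a primary color,Red\n"
print(summarize(sample))  # {'rows': 2, 'columns': 2, 'approx_tokens': 11}
```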

Column details

Scroll down to view the Column Details section:

Each column is listed with:
- Column name
- Max token length for that column (useful for prompt planning)
You can collapse/expand this section as needed
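The per-column maximum is what tells you how long your longest prompt or response is. A rough local equivalent is sketched below, again using a whitespace split as a hypothetical proxy for the platform's tokenizer, which this page does not specify.

```python
import csv
import io


def approx_tokens(text: str) -> int:
    # Assumption: a whitespace split stands in for the platform's tokenizer.
    return len(text.split())


def max_tokens_per_column(csv_text: str) -> dict[str, int]:
    """Return the largest approximate token length found in each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    maxima = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in maxima:
            # Missing cells (None) are treated as empty text.
            maxima[name] = max(maxima[name], approx_tokens(row[name] or ""))
    return maxima


sample = "prompt,response\nSummarize: the cat sat on the mat,A cat sat.\nHi,Hello there\n"
print(max_tokens_per_column(sample))  # {'prompt': 7, 'response': 3}
```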

Dataset sample

Below the column details is the Dataset Sample table:

Shows up to 20 preview rows
You can expand individual rows to view full text if it’s truncated
Especially helpful for verifying the formatting and text content of data intended for LLM tasks
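If you want to eyeball rows the same way before uploading, the following sketch reads up to 20 rows from CSV text and truncates long cells, similar to the collapsed view of the Dataset Sample table. The 40-character width and the `...` marker are arbitrary choices for illustration, not the platform's actual truncation rule.

```python
import csv
import io
from itertools import islice

PREVIEW_ROWS = 20  # the Dataset Sample table shows up to 20 rows


def truncate(text: str, width: int) -> str:
    """Cut text to at most `width` characters, marking the cut with '...'."""
    return text if len(text) <= width else text[: width - 3] + "..."


def preview(csv_text: str, width: int = 40) -> list[dict[str, str]]:
    """Return up to PREVIEW_ROWS rows with each cell truncated to `width` characters."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    return [
        {name: truncate(row[name] or "", width) for name in columns}
        for row in islice(reader, PREVIEW_ROWS)
    ]


# 25 rows of long text: only the first 20 are kept, each cut to 40 characters.
sample = "text\n" + "\n".join(f"row {i}: " + "word " * 20 for i in range(25)) + "\n"
rows = preview(sample)
print(len(rows))
print(rows[0]["text"])
```

Unlike some text-shortening helpers, plain slicing keeps whitespace intact, which matters when the point of the preview is to check formatting.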

Feedback

Send feedback about H2O Enterprise LLM Studio to cloud-feedback@h2o.ai
