Datasets
Overview
The Datasets page lets you manage all datasets within your current project. From here, you can view, search, filter, and take actions on individual datasets — including launching experiments directly from the dataset view.
You can access Datasets from:
- The top navigation tabs, or
- The Datasets card on the homepage of a project
List datasets
When you open the Datasets page, you’ll see a table showing all datasets in the current project.
Table columns
Each row represents one dataset and includes:
- Name
- Number of rows
- File size (e.g. KB, MB)
- File type (e.g. CSV, Parquet)
- Created date
You can:
- Search by name
- Search by file type
- Customize columns using the column visibility toggle (top-right of the table)
Edit dataset
Click the Edit button (top-left) to:
- Select multiple datasets
- Bulk delete selected datasets
Use the Cancel button (bottom of the screen) to exit edit mode.
Row actions
Each dataset row includes a dropdown menu with:
- Rename — opens a rename dialog
- Delete — removes the dataset from the project
Add new dataset
Click the ➕ New Dataset button (top of the page) to upload your own data.
You’ll be taken to a new screen where you can:
- Name your dataset (optional — a default name will be auto-generated if left blank)
- Upload a file via drag-and-drop or file picker
Supported file formats
.csv
— Comma-separated values.parquet
or.pq
— Parquet format.json
— JSON structured data
- Datasets should contain at least one column of text data — this platform is designed for fine-tuning large language models, so text input is expected.
- If your dataset contains only input text (without labels), you can generate outputs using the Data Generation feature before fine-tuning.
Once uploaded, you’ll be returned to the Datasets page and your new dataset will appear in the list.
Use a demo dataset
If you don’t have a dataset handy, you can start with one of our pre-configured demo datasets.
On the New Dataset screen, scroll down to find:
“…or get started quickly with our demo datasets”
Clicking any of the demos will immediately import that dataset into your project.
These demos cover a variety of tasks, including:
- Topics for Data Generation
- Text-to-SQL
- Classification and other NLP use cases
The specific demos available may change over time — they’re designed to help you quickly test and explore LLM fine-tuning in the platform.
After importing a demo, you’ll be redirected back to the Datasets list where you can explore and use it like any other dataset.
Dataset overview page
Clicking on a dataset opens its Overview page.
Header actions
At the top of the page:
- Rename the dataset via the edit icon next to the name
- Create Experiment to kick off a fine-tuning run using this dataset
- Delete the dataset
Metadata summary
Right below the header, you’ll see metadata about the dataset:
- File size
- Row and column count
- Token count (total tokens in the dataset)
- Created date
- Number of experiments launched using this dataset
Column details
Scroll down to view the Column Details section:
- Each column is listed with:
- Column name
- Max token length for that column (useful for prompt planning)
- You can collapse/expand this section as needed
Dataset sample
Below the column details is the Dataset Sample table:
- Shows up to 20 preview rows
- You can expand individual rows to view full text if it’s truncated
- Especially helpful for verifying formatting and text content in large-language tasks
- Submit and view feedback for this page
- Send feedback about H2O Enterprise LLM Studio to cloud-feedback@h2o.ai