
Datasets

Overview

The Datasets page lets you manage all datasets within your current project. From here, you can view, search, filter, and take actions on individual datasets — including launching experiments directly from the dataset view.

You can access Datasets from:

  • The top navigation tabs, or
  • The Datasets card on the homepage of a project


List datasets

When you open the Datasets page, you’ll see a table showing all datasets in the current project.

Table columns

Each row represents one dataset and includes:

  • Name
  • Number of rows
  • File size (e.g. KB, MB)
  • File type (e.g. CSV, Parquet)
  • Created date

You can:

  • Search by name
  • Search by file type
  • Customize columns using the column visibility toggle (top-right of the table)

Edit dataset

Click the Edit button (top-left) to:

  • Select multiple datasets
  • Bulk delete selected datasets


Use the Cancel button (bottom of the screen) to exit edit mode.

Row actions

Each dataset row includes a dropdown menu with:

  • Rename — opens a rename dialog
  • Delete — removes the dataset from the project

Add new dataset

Click the ➕ New Dataset button (top of the page) to upload your own data.

You’ll be taken to a new screen where you can:

  • Name your dataset (optional — a default name will be auto-generated if left blank)
  • Upload a file via drag-and-drop or file picker

Supported file formats

  • .csv — Comma-separated values
  • .parquet or .pq — Parquet format
  • .json — JSON structured data

Info

  • Datasets should contain at least one column of text data — this platform is designed for fine-tuning large language models, so text input is expected.
  • If your dataset contains only input text (without labels), you can generate outputs using the Data Generation feature before fine-tuning.
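
Before uploading, it can help to check a file locally against the requirements above. The following is a minimal sketch, not part of the platform: the helper names `is_supported_file` and `text_columns` are hypothetical, and the "text column" test (any non-empty value that does not parse as a number) is an assumption about what counts as text data.

```python
import csv
import io
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".csv", ".parquet", ".pq", ".json"}


def is_supported_file(filename: str) -> bool:
    """Return True if the file extension is one of the documented formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def _is_number(value: str) -> bool:
    """Return True if the value parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def text_columns(csv_text: str) -> list[str]:
    """Return names of columns with at least one non-empty, non-numeric value.

    Assumption: "text" means "not parseable as a number". The platform may
    apply a different rule.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found: list[str] = []
    for row in reader:
        for name, value in row.items():
            if name is None or value is None:
                continue  # ragged row: extra or missing cells
            value = value.strip()
            if value and not _is_number(value) and name not in found:
                found.append(name)
    return found


sample = "id,question,score\n1,What is SQL?,0.9\n2,Explain joins,0.7\n"
print(is_supported_file("train.PQ"))  # extension check is case-insensitive
print(text_columns(sample))           # only "question" holds text
```

If `text_columns` returns an empty list for a CSV file, the dataset probably lacks the text column that fine-tuning expects.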

Once uploaded, you’ll be returned to the Datasets page and your new dataset will appear in the list.

Use a demo dataset

If you don’t have a dataset handy, you can start with one of our pre-configured demo datasets.

On the New Dataset screen, scroll down to find:

“…or get started quickly with our demo datasets”

Clicking any of the demos will immediately import that dataset into your project.


These demos cover a variety of tasks, including:

  • Topics for Data Generation
  • Text-to-SQL
  • Classification and other NLP use cases

Note

The specific demos available may change over time — they’re designed to help you quickly test and explore LLM fine-tuning in the platform.

After importing a demo, you’ll be redirected back to the Datasets list where you can explore and use it like any other dataset.

Dataset overview page

Clicking on a dataset opens its Overview page.

Header actions

At the top of the page:

  • Rename the dataset via the edit icon next to the name
  • Create Experiment to kick off a fine-tuning run using this dataset
  • Delete the dataset

Metadata summary

Right below the header, you’ll see metadata about the dataset:

  • File size
  • Row and column count
  • Token count (total tokens in the dataset)
  • Created date
  • Number of experiments launched using this dataset

Column details

Scroll down to view the Column Details section:

  • Each column is listed with:
    • Column name
    • Max token length for that column (useful for prompt planning)
  • You can collapse/expand this section as needed
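
To illustrate what a per-column maximum token length means for prompt planning, here is a rough sketch. It is not the platform's implementation: the tokenizer the platform uses is not stated here, so `count_tokens` falls back to a whitespace split as a stand-in, and the function names and the `context_window` parameter are hypothetical. Real LLM tokenizers usually produce more tokens than a whitespace split.

```python
def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): count whitespace-separated words."""
    return len(text.split())


def max_tokens_per_column(rows: list[dict[str, str]]) -> dict[str, int]:
    """Return the largest token count seen in each column."""
    result: dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            result[column] = max(result.get(column, 0), count_tokens(str(value)))
    return result


def fits_context(max_lengths: dict[str, int], context_window: int) -> bool:
    """Check whether the longest value of every column fits in one prompt."""
    return sum(max_lengths.values()) <= context_window


rows = [
    {"prompt": "List all users", "completion": "SELECT * FROM users;"},
    {"prompt": "Count the orders placed today",
     "completion": "SELECT COUNT(*) FROM orders WHERE day = CURRENT_DATE;"},
]
lengths = max_tokens_per_column(rows)
print(lengths)                    # {'prompt': 5, 'completion': 8}
print(fits_context(lengths, 16))  # True: 5 + 8 = 13 tokens
```

Summing the per-column maxima gives a worst-case row length, which is a conservative figure to compare against a model's context window.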

Dataset sample

Below the column details is the Dataset Sample table:

  • Shows up to 20 preview rows
  • You can expand individual rows to view full text if it’s truncated
  • Especially helpful for verifying formatting and text content before fine-tuning a large language model
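
A similar preview can be produced locally before upload. The sketch below assumes a CSV file and uses only the Python standard library. The 20-row limit mirrors the sample table described above, while the `preview` helper and the 60-character cell width are hypothetical choices made for illustration.

```python
import csv
import io
import textwrap
from itertools import islice

PREVIEW_ROWS = 20  # matches the "up to 20 preview rows" described above
CELL_WIDTH = 60    # hypothetical truncation width, chosen for illustration


def preview(csv_text: str, max_rows: int = PREVIEW_ROWS,
            width: int = CELL_WIDTH) -> list[dict[str, str]]:
    """Return the first rows of a CSV with long cells shortened."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in islice(reader, max_rows):
        rows.append({
            # shorten() collapses whitespace and cuts on word boundaries
            name: textwrap.shorten(value or "", width=width, placeholder="...")
            for name, value in row.items()
            if name is not None  # ignore extra cells in ragged rows
        })
    return rows


sample = "id,text\n1,short value\n2," + "word " * 50 + "\n"
for row in preview(sample):
    print(row)
```

Shortened cells end with "..." so truncated text is easy to spot. Read the original file when the full value is needed.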
