Version: v0.3.0

Explore a dataset


H2O Label Genie enables you to explore an image or text dataset through a clustering task. A clustering task refers to finding and exploring groups in a dataset.


To learn about supported clustering tasks, see Supported clustering tasks.


To explore a dataset through a clustering task, follow these steps:

  1. On the H2O Label Genie navigation menu, click Data exploration.
  2. Click New clustering task.
  3. In the Task name box, enter a name for the clustering task.
  4. In the Task description box, enter a description for the clustering task.
  5. In the Select dataset list, select an image or text dataset (the dataset you want to explore).
    • If the selected dataset is a text dataset, also complete the following step:
      1. In the Select text column list, select the dataset column that contains the text data.
  6. In the Number of clusters box, enter the number of clusters to be used by the clustering algorithm.
  7. In the Type list, select a clustering algorithm for the clustering task.

    H2O Label Genie supports Gaussian mixture and K-means clustering for both image and text datasets. Clustering is performed on embeddings of the data generated with the OpenCLIP model. OpenCLIP is an open-source implementation of OpenAI's Contrastive Language-Image Pre-training (CLIP). To learn more about OpenCLIP, see OpenCLIP.

  8. Click Start clustering.
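
To give a rough sense of what step 7 means by clustering on embeddings, the following minimal sketch runs K-means (Lloyd's algorithm) over a handful of vectors using only the Python standard library. It is an illustration under stated assumptions, not H2O Label Genie's implementation: the 2-D `embeddings` list is made-up toy data standing in for OpenCLIP embeddings (which have many more dimensions), the `kmeans` and `squared_distance` helpers are hypothetical names, and the Gaussian mixture option is not covered. The `n_clusters` argument plays the role of the value entered in the Number of clusters box.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iters=20, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to vectors[i].
    """
    rng = random.Random(seed)
    # Initialize centroids as copies of randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(n_clusters),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return labels, centroids


# Toy 2-D "embeddings" (assumption: real OpenCLIP embeddings are
# high-dimensional and come from the model, not from hand-typed values).
embeddings = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # first visible group
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],   # second visible group
]

labels, centroids = kmeans(embeddings, n_clusters=2)
for vector, label in zip(embeddings, labels):
    print(f"cluster {label}: {vector}")
```

The cluster indices themselves are arbitrary; what matters is which items share an index. Choosing a different number of clusters changes how finely the dataset is split, which is why that value is worth experimenting with when exploring a new dataset.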