Version: v0.3.0

Demo datasets


In H2O Label Genie, you can use demo datasets to explore supported annotation tasks.

Access demo datasets

To access a demo dataset in H2O Label Genie, follow these steps:

  1. On the H2O Label Genie navigation menu, click Datasets.
  2. In the datasets table, select one of the demo datasets.
  3. Click New annotation task to annotate the dataset.
    • To learn how to annotate your dataset, see Tutorials.

Demo datasets in H2O Label Genie

Amazon reviews demo

  • Dataset name: amazon-reviews-demo
  • Description: The dataset contains user reviews of Amazon products (in text format) along with their ratings (from 0 to 5).
  • Dataset columns: stars, comment
  • Problem type: Text classification, text regression, text-entity recognition
  • License: CC0 1.0 Universal (CC0 1.0)

Car or coffee demo

  • Dataset name: car-or-coffee-demo
  • Description: The dataset contains images of cars and coffee.
  • Problem type: Image classification, object detection
  • License: Pexels license

Twitter demo

  • Dataset name: twitter-demo
  • Description: The dataset contains tweets that can be used to analyze sentiment and recognize emotions in text.
  • Dataset columns: text, sentiment
  • Problem type: Text classification, text-entity recognition
  • License: Attribution 4.0 International (CC BY 4.0)

Text readability demo

  • Dataset name: text-readability-demo
  • Description: This dataset contains text excerpts and is part of the CLEAR Corpus.
  • Dataset columns: id, excerpt
  • Problem type: Text regression, text-entity recognition
  • License: MIT license

CNN Daily Mail sample

  • Dataset name: cnn-dailymail-sample
  • Description: The dataset contains news stories published on the CNN and Daily Mail websites, along with human-generated abstractive summaries of those stories.
  • Dataset columns: id, text, summary
  • Problem type: Text summarization, text classification
  • License: MIT license

Plant pathology demo

  • Dataset name: plant-pathology-demo
  • Description: This dataset contains images of healthy and diseased apple leaves for plant disease recognition.
  • Problem type: Image classification, image regression, object detection
  • License: Attribution 4.0 International (CC BY 4.0)

ESC10 audio demo

  • Dataset name: esc10-audio-demo
  • Description: This dataset contains 5-second-long recordings of environmental sounds organized into ten classes, with 40 examples per class. The clips were manually extracted from public field recordings gathered by the Freesound.org project.
  • Problem type: Audio classification
  • License: Attribution 3.0 Unported (CC BY 3.0)

Amnist demo

  • Dataset name: amnist-demo
  • Description: The dataset contains 600 audio samples of spoken digits (0-9) from 60 different speakers.
  • Problem type: Audio regression
  • License: MIT license