Version: v0.3.0

Demo datasets

Overview

In H2O Label Genie, you can use demo datasets to explore supported annotation tasks.

Access demo datasets

To access a demo dataset in H2O Label Genie, follow these steps:

  1. On the H2O Label Genie navigation menu, click Datasets.
  2. In the datasets table, select a demo dataset.
Note
  • After selecting a demo dataset, click New annotation task to annotate the dataset.
  • To learn how to annotate your dataset, see Tutorials.

Demo datasets in H2O Label Genie

Amazon reviews demo

  • Dataset name: amazon-reviews-demo
  • Description: The dataset contains user reviews (in text format) and ratings (from 0 to 5) of Amazon products.
  • Dataset columns: stars, comment
  • Problem type: Text classification, text regression, text-entity recognition
  • License: CC0 1.0 Universal (CC0 1.0)

Car or coffee demo

  • Dataset name: car-or-coffee-demo
  • Description: The dataset contains images of cars and coffee.
  • Problem type: Image classification, object detection
  • License: Pexels license

Twitter demo

  • Dataset name: twitter-demo
  • Description: The dataset contains tweets that can be used for sentiment analysis and for recognizing emotion in text.
  • Dataset columns: text, sentiment
  • Problem type: Text classification, text-entity recognition
  • License: Attribution 4.0 International (CC BY 4.0)

Text readability demo

  • Dataset name: text-readability-demo
  • Description: This dataset contains text excerpts and is part of the CLEAR Corpus.
  • Dataset columns: id, excerpt
  • Problem type: Text regression, text-entity recognition
  • License: MIT license

CNN Daily Mail sample

  • Dataset name: cnn-dailymail-sample
  • Description: The dataset contains news stories published on the CNN and Daily Mail websites, paired with human-written abstractive summaries.
  • Dataset columns: id, text, summary
  • Problem type: Text summarization, text classification
  • License: MIT license

Plant pathology demo

  • Dataset name: plant-pathology-demo
  • Description: This dataset contains images of healthy and diseased apple leaves for plant disease recognition.
  • Problem type: Image classification, image regression, object detection
  • License: Attribution 4.0 International (CC BY 4.0)

ESC10 audio demo

  • Dataset name: esc10-audio-demo
  • Description: This dataset contains 5-second-long recordings of environmental sounds organized into ten classes (with 40 examples per class). Clips in this dataset have been manually extracted from public field recordings gathered by the Freesound.org project.
  • Problem type: Audio classification
  • License: Attribution 3.0 Unported (CC BY 3.0)

Amnist demo

  • Dataset name: amnist-demo
  • Description: The dataset contains 600 audio samples of spoken digits (0-9) from sixty different speakers.
  • Problem type: Audio regression
  • License: MIT license
