Import a dataset
Overview
Before you can import your dataset to H2O Label Genie, the dataset needs to meet the following requirements:
- The dataset data type needs to be text, image, or audio.
- Dataset format:
  - The dataset (data) for an image or audio annotation task must be in a zip file containing the image or audio files.
    Note:
    - You can have any nested folder structure inside the zip file.
    - All image files need an image extension. The zip file can contain a mix of supported image extensions. To learn about supported image extensions, see Supported image extensions.
    - All audio files need an audio extension. The zip file can contain a mix of supported audio extensions. To learn about supported audio extensions, see Supported audio extensions.
  - The dataset (data) for a text annotation task must be in a CSV file.
    - One column needs to hold the text data.
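As a rough illustration of the text requirement, the following Python sketch writes a CSV file with a single column that holds the text data. The function name `write_text_dataset`, the file name `reviews.csv`, and the column name `text` are assumptions made for this example; the requirements above only state that one column needs to hold the text.

```python
import csv
from pathlib import Path


def write_text_dataset(rows, path, text_column="text"):
    """Write an iterable of strings to a CSV file with one text column."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row; the column name is an assumption, not a product requirement.
        writer.writerow([text_column])
        for text in rows:
            # csv.writer quotes values that contain commas or line breaks.
            writer.writerow([text])
    return path


if __name__ == "__main__":
    write_text_dataset(
        ["The battery lasts all day.", "Shipping was slow, but support helped."],
        "reviews.csv",
    )
```

Using the `csv` module instead of joining strings by hand keeps text that contains commas or quotation marks in a single column.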
- To learn how to annotate your dataset, see Annotate a dataset.
- To learn how to download an annotated dataset, see Download an annotated dataset.
- To learn how to import an already annotated dataset to H2O Label Genie to review further or edit, see Import an annotated dataset.
Instructions
To import your dataset (data) to H2O Label Genie, follow these steps:
On the H2O Label Genie navigation menu, click Datasets.
Click Import data.
In the Name box, enter a name for the dataset.
(Optional) In the Description box, enter a description for the dataset.
For Data type, choose an option.
- If the data type of the dataset you are importing is text: Select Text.
- If the data type of the dataset you are importing is image: Select Image.
- If the data type of the dataset you are importing is audio: Select Audio.
In the Source list, select the source (data connector) that you want to use (for example, S3), and then complete the fields for that source.
- Upload:
  - Click Browse..., or drag and drop the file (dataset).
- S3:
  - In the S3 bucket name box, enter the name of the S3 bucket.
  - In the AWS access key box, enter the AWS access key. You don't need to enter the AWS access key if the S3 bucket is public.
  - In the AWS secret key box, enter the AWS secret key. You don't need to enter the AWS secret key if the S3 bucket is public.
  - In the File name list, select the file you want to use.
- H2O Drive:
  - In the File name list, select the file you want to use.
Click Import.
Supported image extensions
The following is a list of supported image extensions for image annotation tasks in H2O Label Genie:
- Windows bitmaps: .bmp
- JPEG files: .jpeg, .jpg, .jpe
- JPEG 2000 files: .jp2
- Portable Network Graphics: .png
- WebP: .webp
- Portable image format: .pbm, .pgm, .ppm, .pnm
- TIFF files: .tiff, .tif
- Radiance HDR: .hdr
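Before importing, it can help to check a zip file locally against this list. The following Python sketch is one possible way to do that, not part of H2O Label Genie: the function name `find_unsupported_files` is hypothetical, the extension set is copied from the list above, and lowercasing the extension before comparing is an assumption, because this page does not say whether matching is case-sensitive.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions copied from the supported image extensions list.
SUPPORTED_IMAGE_EXTENSIONS = {
    ".bmp", ".jpeg", ".jpg", ".jpe", ".jp2", ".png", ".webp",
    ".pbm", ".pgm", ".ppm", ".pnm", ".tiff", ".tif", ".hdr",
}


def find_unsupported_files(zip_path, supported=SUPPORTED_IMAGE_EXTENSIONS):
    """Return the names of zip entries whose extension is not in `supported`."""
    unsupported = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # nested folders are allowed; only files are checked
            # Assumption: extensions are compared case-insensitively.
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix not in supported:
                unsupported.append(info.filename)
    return unsupported
```

An empty result means every file in the archive has one of the listed extensions; any returned names are candidates to remove or convert before importing.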
Supported audio extensions
The following is a list of supported audio extensions for audio annotation tasks in H2O Label Genie:
- Uncompressed: .wav, .aiff
- Lossless compressed: .flac
- Lossy compressed: .mp3, .ogg
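To prepare an audio dataset, the files need to end up in a single zip file. The following Python sketch shows one way to package a local folder while keeping its nested structure and skipping files whose extension is not in the list above. The function name `zip_audio_folder` is hypothetical, and the case-insensitive extension check is an assumption rather than documented product behavior.

```python
import zipfile
from pathlib import Path

# Extensions copied from the supported audio extensions list.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".aiff", ".flac", ".mp3", ".ogg"}


def zip_audio_folder(folder, zip_path):
    """Zip every supported audio file under `folder`, keeping nested folders."""
    folder = Path(folder)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Assumption: extensions are compared case-insensitively.
            if path.is_file() and path.suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS:
                arcname = path.relative_to(folder).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

The function returns the archive-relative names it added, which gives a quick way to confirm that nothing expected was skipped.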