Import a dataset
Overview
Before you can import your dataset to H2O Label Genie, the dataset needs to meet the following requirements:
- The dataset data type needs to be text, image, or audio.
- Dataset format:
  - The dataset (data) for an image or audio annotation task must be in a zip file containing the image or audio files.
    Note:
    - You can have any nested folder structure inside the zip file.
    - All image files need to have an image extension. A dataset can contain a mix of supported image extensions. To learn about supported image extensions, see Supported image extensions.
    - All audio files need to have an audio extension. A dataset can contain a mix of supported audio extensions. To learn about supported audio extensions, see Supported audio extensions.
  - The dataset (data) for a text annotation task must be in a CSV file or a zip file containing a single CSV file.
    - One column needs to hold the text data.
- To learn how to annotate your dataset, see Create an annotation task.
- To learn how to download an annotated dataset, see Download an annotated dataset.
- To learn how to import an already annotated dataset to H2O Label Genie to review further or edit, see Import an annotated dataset.
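The text dataset requirement described above (a CSV file, or a zip file containing a single CSV file, with one column holding the text) can be checked locally before importing. The following is a minimal sketch and not part of H2O Label Genie: the function names and the column name passed in are hypothetical, and it assumes the CSV is UTF-8 encoded with a header row.

```python
import csv
import io
import zipfile


def read_csv_header(path):
    """Return the header row of a CSV file, or of the single CSV inside a zip."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if len(csv_names) != 1:
                raise ValueError(
                    f"expected exactly one CSV in the zip, found {len(csv_names)}"
                )
            with archive.open(csv_names[0]) as raw:
                # Assumption: the CSV is UTF-8 encoded.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                return next(csv.reader(text), [])
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def has_text_column(path, text_column):
    """True if the named column (a hypothetical name you choose) is in the header."""
    return text_column in read_csv_header(path)
```

For example, `has_text_column("reviews.zip", "text")` would report whether a hypothetical `reviews.zip` holds exactly one CSV whose header includes a `text` column.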
Instructions
To import your dataset (data) into H2O Label Genie, follow these steps:
On the H2O Label Genie navigation menu, click Datasets.
Click Import data.
In the Name box, enter a name for the dataset.
(Optional) In the Description box, enter a description for the dataset.
For Data type, choose an option.
- If the data type of the dataset you are importing is text: Select Text.
- If the data type of the dataset you are importing is image: Select Image.
- If the data type of the dataset you are importing is audio: Select Audio.
In the Source list, select the source (data connector) that you want to use (for example, S3), and then complete the fields for that source:
- Upload
  - Click Browse..., or drag and drop the file (dataset).
- S3
  - In the S3 bucket name box, enter the name of the S3 bucket.
  - In the AWS access key box, enter the AWS access key.
    Note: You don't need to enter the AWS access key if the S3 bucket is public.
  - In the AWS secret key box, enter the AWS secret key.
    Note: You don't need to enter the AWS secret key if the S3 bucket is public.
  - In the File name list, select the file you want to use.
- H2O Drive
  - In the File name list, select the file you want to use.
Click Import.
Dataset connectors
Overview
H2O Label Genie provides three dataset connectors for accessing your data: Upload, S3, and H2O Drive.
Supported connectors
Upload (Standard upload feature)
The following parameter is required:
- File
S3 (Amazon AWS S3)
The following parameters are required:
- S3 bucket name
- AWS access key
- AWS secret key
- File name
H2O Drive (H2O.ai's data storage)
The following parameter is required:
- File name
To learn more about H2O Drive, see H2O Drive | Docs.
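The required parameters listed above could be captured in a small lookup table, for example to check a set of values before filling in the import form. This is a sketch only: the dictionary, the function name, and the `public_bucket` flag are hypothetical and not part of any H2O Label Genie API. The flag reflects the note in the import instructions that the AWS keys are not needed when the S3 bucket is public.

```python
# Required parameters per connector, as listed in this section.
REQUIRED_PARAMETERS = {
    "Upload": ["File"],
    "S3": ["S3 bucket name", "AWS access key", "AWS secret key", "File name"],
    "H2O Drive": ["File name"],
}

# Parameters that can be left empty for a public S3 bucket.
S3_KEY_PARAMETERS = ("AWS access key", "AWS secret key")


def missing_parameters(connector, values, public_bucket=False):
    """Return the required parameters that are absent or empty in `values`."""
    required = REQUIRED_PARAMETERS[connector]
    if connector == "S3" and public_bucket:
        required = [name for name in required if name not in S3_KEY_PARAMETERS]
    return [name for name in required if not values.get(name)]
```

Calling `missing_parameters("S3", {"S3 bucket name": "my-bucket", "File name": "images.zip"})` with these hypothetical values would list the two AWS keys as missing, while passing `public_bucket=True` would return an empty list.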
Supported image extensions
The following is a list of supported image extensions for image annotation tasks in H2O Label Genie:
- Windows bitmaps:
  - .bmp (Bitmap Image)
- JPEG files:
  - .jpeg (JPEG Image)
  - .jpg (JPEG Image)
  - .jpe (JPEG Image)
- JPEG 2000 files:
  - .jp2 (JPEG 2000 Core Image)
- Portable Network Graphics:
  - .png (Portable Network Graphic)
- WebP:
  - .webp (WebP Image)
- Portable image format:
  - .pbm (Portable Bitmap Image)
  - .pgm (Portable Gray Map Image)
  - .ppm (Portable Pixmap Image)
  - .pnm (Portable Any Map Image)
- TIFF files:
  - .tiff (Tagged Image File Format)
  - .tif (Tagged Image File)
- Radiance HDR:
  - .hdr (High Dynamic Range Image File)
Supported audio extensions
The following is a list of supported audio extensions for audio annotation tasks in H2O Label Genie:
- Uncompressed:
  - .wav (WAVE Audio)
  - .aiff (Audio Interchange File Format)
- Lossless compressed:
  - .flac (Free Lossless Audio Codec File)
- Lossy compressed:
  - .mp3 (MP3 Audio)
  - .ogg (Ogg Vorbis Audio)
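Before importing an image or audio dataset, it can help to scan the zip file for entries whose extension is not in the lists above. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the extension sets are copied from this page, and matching is done case-insensitively, which may or may not be how H2O Label Genie itself treats extensions.

```python
import zipfile
from pathlib import PurePosixPath

# Extension sets copied from the supported-extension lists on this page.
IMAGE_EXTENSIONS = {
    ".bmp", ".jpeg", ".jpg", ".jpe", ".jp2", ".png", ".webp",
    ".pbm", ".pgm", ".ppm", ".pnm", ".tiff", ".tif", ".hdr",
}
AUDIO_EXTENSIONS = {".wav", ".aiff", ".flac", ".mp3", ".ogg"}


def unsupported_files(zip_path, allowed_extensions):
    """List archive entries (at any folder depth) with an extension outside the set."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            # Assumption: extensions are compared case-insensitively.
            and PurePosixPath(info.filename).suffix.lower() not in allowed_extensions
        ]
```

An empty result from `unsupported_files("images.zip", IMAGE_EXTENSIONS)` (with a hypothetical `images.zip`) suggests every file in the archive carries a supported image extension; pass `AUDIO_EXTENSIONS` instead for an audio dataset.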