Import dataset settings: Image object detection
Dataset name
Name of the dataset.
Problem type
Defines the problem type of the experiment, which in turn determines the settings H2O Hydrogen Torch displays for the experiment.
- The selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment
- The From experiment option allows you to use the settings from a previously run experiment
Data format
Specifies the data format of the training, validation, and test dataframes.
- Image object detection
  - Supported formats for an image object detection experiment are as follows:
    - Hydrogen Torch format
    - Individual boxes format
    - COCO format
    - Pascal VOC format
- Image semantic segmentation | Image instance segmentation
  - Supported formats for an image semantic segmentation and image instance segmentation experiment are as follows:
    - Hydrogen Torch format
    - COCO format
Train dataframe
Defines a .csv or .pq file containing a dataframe with training records that H2O Hydrogen Torch will use to train the model.
- The records will be combined into mini-batches when training the model.
- If a validation dataframe is provided, a fold column is not needed in the train dataframe.
Train json
A .json file in COCO format with training records that H2O Hydrogen Torch uses to train the model.
This setting is available when COCO is selected in the Data format setting.
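As an illustration of what a COCO-style annotation file contains, the following minimal sketch assembles the three top-level sections commonly found in COCO detection files (`images`, `categories`, `annotations`). The helper name `build_coco`, the file name, and the class names are hypothetical, and H2O Hydrogen Torch may expect additional fields that are not shown here.

```python
import json


def build_coco(images, class_names, boxes):
    """Assemble a minimal COCO-style detection dictionary.

    images:      list of (file_name, width, height)
    class_names: list of category names
    boxes:       list of (image_id, category_id, x_min, y_min, box_width, box_height)
                 where both ids are 1-based positions in the lists above
    """
    return {
        "images": [
            {"id": i, "file_name": name, "width": w, "height": h}
            for i, (name, w, h) in enumerate(images, start=1)
        ],
        "categories": [
            {"id": c, "name": name} for c, name in enumerate(class_names, start=1)
        ],
        "annotations": [
            {
                "id": a,
                "image_id": image_id,
                "category_id": category_id,
                # COCO stores a box as [x_min, y_min, width, height] in pixels.
                "bbox": [x, y, bw, bh],
                "area": bw * bh,
                "iscrowd": 0,
            }
            for a, (image_id, category_id, x, y, bw, bh) in enumerate(boxes, start=1)
        ],
    }


# Hypothetical example data: one image with two labeled objects.
coco = build_coco(
    images=[("street_001.jpg", 640, 480)],
    class_names=["car", "person"],
    boxes=[(1, 1, 100, 120, 50, 80), (1, 2, 300, 200, 40, 60)],
)
train_json_text = json.dumps(coco, indent=2)
```

Writing `train_json_text` to a .json file would give a file of the kind this setting refers to.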
Train labels folder
Specifies a directory (the train labels folder) that contains .xml files with labels in Pascal VOC format for the training records that H2O Hydrogen Torch will use to train the model. The records will be combined into mini-batches when training the model.
This setting is available when Pascal VOC is selected in the Data format setting.
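For orientation, a Pascal VOC label file describes one image and lists each object with a class name and a `bndbox` holding corner coordinates. The sketch below generates such a file with the Python standard library; the helper name `voc_annotation` and the example values are hypothetical, and only the most commonly used elements are included.

```python
import xml.etree.ElementTree as ET


def voc_annotation(file_name, width, height, objects):
    """Return a Pascal VOC annotation as an XML string.

    objects: list of (class_name, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = file_name

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)

    for class_name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (
            ("xmin", xmin),
            ("ymin", ymin),
            ("xmax", xmax),
            ("ymax", ymax),
        ):
            ET.SubElement(box, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")


# Hypothetical example: one image with a single labeled car.
xml_text = voc_annotation("street_001.jpg", 640, 480, [("car", 100, 120, 150, 200)])
```

Each image would get its own .xml file of this shape inside the labels folder.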
Data folder
Defines the folder location of the assets (e.g., images or audio clips) the model utilizes for training. H2O Hydrogen Torch loads assets from this folder during training.
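Because assets are loaded from this folder by name, it can help to confirm before importing that every file referenced in the dataframe is actually present. The following is a small sketch of such a check; the function name `missing_assets` and the file names are hypothetical and not part of H2O Hydrogen Torch.

```python
import tempfile
from pathlib import Path


def missing_assets(image_names, data_folder):
    """Return the names that have no matching file inside data_folder."""
    folder = Path(data_folder)
    return [name for name in image_names if not (folder / name).is_file()]


# Demonstration with a temporary folder that holds only one of two images.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "img_001.jpg").write_bytes(b"")
    missing = missing_assets(["img_001.jpg", "img_002.jpg"], tmp)
```

An empty result means every referenced asset was found in the folder.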
Validation dataframe
Defines a .csv or .pq file containing a dataframe with validation records that H2O Hydrogen Torch will use to evaluate the model during training.
- Setting a Validation dataframe requires the Validation strategy to be set to Custom holdout validation. In this case, H2O Hydrogen Torch fully respects the choice of a separate validation dataframe and does not perform any internal cross-validation. In other words, the model is trained on the full provided train dataframe, and model performance is evaluated on the provided validation dataframe.
- The validation dataframe should have the same format as the train dataframe but does not require a fold column.
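One way to prepare a separate validation dataframe is to split the available records yourself before importing them. The sketch below shows a deterministic holdout split in plain Python; the function name `holdout_split`, the `image` key, and the 20% fraction are illustrative assumptions, not values prescribed by H2O Hydrogen Torch.

```python
import random


def holdout_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows deterministically and split them into (train, validation)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one record for validation.
    n_valid = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical records: one dictionary per image.
rows = [{"image": f"img_{i:03d}.jpg"} for i in range(10)]
train_rows, valid_rows = holdout_split(rows, validation_fraction=0.2)
```

Each of the two lists would then be saved as its own .csv or .pq file with the same columns, and neither needs a fold column.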
Validation json
A .json file in COCO format with validation records that H2O Hydrogen Torch will use to validate the model during training.
This setting is available when COCO is selected in the Data format setting.
Validation labels folder
Specifies a directory (the validation labels folder) that contains .xml files with labels in Pascal VOC format for the validation records that H2O Hydrogen Torch will use to validate the model during training.
This setting is available when Pascal VOC is selected in the Data format setting.
Test dataframe
Defines a .csv or .pq file containing a dataframe with test records that H2O Hydrogen Torch will use to test the model.
The test dataframe should have the same format as the train dataframe but does not require a label column.
Test json
A .json file in COCO format with test records that H2O Hydrogen Torch uses to test the model.
This setting is available when COCO is selected in the Data format setting.
Test labels folder
Specifies a directory (the test labels folder) that contains .xml files with labels in Pascal VOC format for the test records that H2O Hydrogen Torch will use to test the model.
This setting is available when Pascal VOC is selected in the Data format setting.
Data folder test
Defines the folder location of the assets (e.g., images) H2O Hydrogen Torch will use to test the model. H2O Hydrogen Torch will load the assets from this folder when testing the model.
The Data folder test setting appears only when you specify a test dataframe in the Test dataframe setting.
Unlabeled dataframe
Defines a separate .csv or .pq file containing a dataframe with unlabeled records that H2O Hydrogen Torch uses to generate pseudo labels. H2O Hydrogen Torch first trains the model with the provided labeled data (Train dataframe). The model then predicts pseudo labels for the data in the provided unlabeled dataframe, and a second training run combines the original labels with the pseudo labels.
- Image regression | Image classification | Image object detection
  - The unlabeled dataframe just needs to contain a single image column
- Text regression | Text classification
  - The unlabeled dataframe just needs to contain a single text column
- Audio regression | Audio classification | Speech recognition
  - The unlabeled dataframe just needs to contain a single audio column
- Image regression | Image classification | Image object detection | Audio regression | Audio classification | Speech recognition
  - Assets (e.g., images or audio) need to be located in the Data folder (setting)
- The training time can significantly increase depending on the size of the unlabeled data
Because labeling can be expensive, additional unlabeled data is often available. Providing this unlabeled data lets H2O Hydrogen Torch train the model in a semi-supervised manner, which can improve model quality compared with training on labeled data only.
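The three steps described for the unlabeled dataframe (train on labeled data, predict pseudo labels, train again on both) can be illustrated with a deliberately tiny stand-in model. This is a toy sketch only: the one-dimensional threshold "model" and all function names are hypothetical and say nothing about how H2O Hydrogen Torch implements pseudo-labeling internally.

```python
def fit_threshold(samples):
    """Toy 'model': the midpoint between the two class means of 1-D features."""
    zeros = [x for x, y in samples if y == 0]
    ones = [x for x, y in samples if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, features):
    """Label a feature 1 if it lies above the threshold, otherwise 0."""
    return [1 if x > threshold else 0 for x in features]


def train_with_pseudo_labels(labeled, unlabeled):
    # Step 1: train on the labeled data only.
    first_model = fit_threshold(labeled)
    # Step 2: predict pseudo labels for the unlabeled records.
    pseudo = list(zip(unlabeled, predict(first_model, unlabeled)))
    # Step 3: train again on the original labels plus the pseudo labels.
    return fit_threshold(labeled + pseudo)


labeled = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
unlabeled = [0.5, 1.5, 3.0, 6.0]
final_threshold = train_with_pseudo_labels(labeled, unlabeled)
```

The second training run sees twice as many records as the first, which mirrors why training time grows with the size of the unlabeled data.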
Class name column
Defines the dataset column containing a list of class names that H2O Hydrogen Torch will use for each bounding box.
X min column
Defines the dataset column containing a list of minimum X positions H2O Hydrogen Torch will use for each bounding box.
Y min column
Defines the dataset column containing a list of minimum Y positions H2O Hydrogen Torch will use for each bounding box.
X max column
Defines the dataset column containing a list of maximum X positions H2O Hydrogen Torch will use for each bounding box.
Y max column
Defines the dataset column containing a list of maximum Y positions H2O Hydrogen Torch will use for each bounding box.
Image column
Defines the dataframe column storing the names of images that H2O Hydrogen Torch will load from the data folder and data folder test when training and testing the model.
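To show how the column settings above fit together, the following sketch builds one dataframe row per image, where each box-related column holds a list with one entry per bounding box. The column names (`image`, `class_name`, `x_min`, and so on) are hypothetical, and encoding the lists as JSON text inside a .csv cell is an assumption made for illustration; check the format description for your H2O Hydrogen Torch version before relying on it. The helper also converts COCO-style `[x_min, y_min, width, height]` boxes into corner coordinates.

```python
import csv
import io
import json


def coco_to_corners(bbox):
    """Convert [x_min, y_min, width, height] to (x_min, y_min, x_max, y_max)."""
    x_min, y_min, width, height = bbox
    return x_min, y_min, x_min + width, y_min + height


def image_row(image_name, labeled_boxes):
    """Build one row per image; labeled_boxes is a list of (class_name, coco_bbox)."""
    corners = [coco_to_corners(bbox) for _, bbox in labeled_boxes]
    return {
        "image": image_name,
        # Assumption: list-valued cells are serialized as JSON text.
        "class_name": json.dumps([name for name, _ in labeled_boxes]),
        "x_min": json.dumps([c[0] for c in corners]),
        "y_min": json.dumps([c[1] for c in corners]),
        "x_max": json.dumps([c[2] for c in corners]),
        "y_max": json.dumps([c[3] for c in corners]),
    }


# Hypothetical example: one image with two labeled objects.
row = image_row(
    "street_001.jpg",
    [("car", [100, 120, 50, 80]), ("person", [300, 200, 40, 60])],
)

# Write the row to an in-memory CSV to show the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

In this layout, the i-th entries of the class name, X min, Y min, X max, and Y max lists all describe the same bounding box.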