Version: v1.4.0

Predictions download formats: Image semantic segmentation

Overview

Predictions in H2O Hydrogen Torch are downloaded as a zip file. The format and content of that file depend first on the problem type of the predictions and then on how you generated them. There are two scenarios for how predictions are generated.

Scenario 1: Predictions from a completed experiment

Predictions downloaded from a completed experiment on the View experiments card are packaged in a zip file. This zip file contains the following files:
1. validation_predictions.csv: A structured dataframe in CSV format that contains the final predictions for the provided validation dataframe.
2. validation_raw_predictions.pkl: A Pickle file (a pickled Python dictionary) that contains the raw predictions for the provided validation dataframe.

If the experiment included a test dataframe, H2O Hydrogen Torch also includes two additional files in the same zip file:
1. test_predictions.csv: A structured dataframe in CSV format that contains the final predictions for the provided test dataframe.
2. test_raw_predictions.pkl: A Pickle file (a pickled Python dictionary) that contains the raw predictions for the provided test dataframe.

Scenario 2: Predictions generated by scoring on new data

Predictions generated by scoring on new data through the H2O Hydrogen Torch UI (on the Predict data card) are downloaded in a zip file. This zip file includes the following files:
1. test_predictions.csv: This is a structured dataframe in CSV format, showing the final predictions for the provided test dataframe.
2. test_raw_predictions.pkl: This is a Pickle file, a pickled Python dictionary containing raw predictions for the provided test dataframe.
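
In either scenario, the files can be read straight from the downloaded zip without extracting it first. The following is a minimal sketch, not part of H2O Hydrogen Torch: the helper name load_predictions_zip is hypothetical, and the demo archive is built in memory only so the example runs on its own. Only unpickle files that come from a source you trust.

```python
import io
import pickle
import zipfile

import pandas as pd


def load_predictions_zip(zip_source):
    """Load every .csv and .pkl member of a predictions zip into a dict.

    zip_source can be a path to the downloaded zip or a file-like object.
    Keys of the returned dict are the member names found in the archive.
    """
    loaded = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".csv"):
                with archive.open(name) as f:
                    loaded[name] = pd.read_csv(f)
            elif name.endswith(".pkl"):
                with archive.open(name) as f:
                    loaded[name] = pickle.load(f)
    return loaded


# Demo: a tiny in-memory archive that stands in for a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test_predictions.csv", "image,pred_mask\na.png,\n")
    archive.writestr(
        "test_raw_predictions.pkl", pickle.dumps({"class_names": ["cat"]})
    )
buffer.seek(0)

demo = load_predictions_zip(buffer)
print(sorted(demo))
```

With a real download, pass the path of the zip file instead of the in-memory buffer.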

Formats

The Pickle file contains the following keys:

masks
- A 4-dimensional NumPy array that contains pixel-wise probabilities. The shape of the array is as follows: number_of_observations x number_of_classes x {image_height} x {image_width}.
Note
You can define the {image_height} and {image_width} under the Image settings section when building an image semantic segmentation experiment.

Caution
Masks are available only if the Save prediction probability setting was set to True when running the image semantic segmentation experiment.

original_image_shapes
- A 2-dimensional NumPy array that contains shapes of the original input images. The shape of the array is as follows: number_of_observations x 2, where the 2nd dimension contains original_image_height and original_image_width of the corresponding input image.
rle_predictions
- A 2-dimensional NumPy array that contains RLE-encoded predictions for each class. The shape of the array is as follows: number_of_observations x number_of_classes. You can use RLE predictions with corresponding original_image_shapes to decode RLE-encoded strings to binary masks.
class_names
- A list containing all the class names. The class names follow the order of the classes in the 4-dimensional NumPy masks array.
[image_column]
- A 1-dimensional NumPy array that contains input image names. The name of the key is [image_column] where image_column refers to the name of the image column in the train dataframe.
Note
You can define the [image_column] under the Dataset settings section when building an image semantic segmentation experiment.
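
As an illustration of how these keys fit together, the sketch below works on a synthetic dictionary that mimics the documented layout; the values are random and the class names are made up. It assumes the masks key is present (Save prediction probability set to True). Whether a per-pixel argmax or a per-class threshold is the right way to turn probabilities into labels depends on how the probabilities were produced, which this page does not specify, so both are shown and the 0.5 threshold is an arbitrary assumption.

```python
import numpy as np

# Synthetic stand-in for the dictionary loaded from a raw predictions Pickle file.
rng = np.random.default_rng(0)
out = {
    # number_of_observations x number_of_classes x image_height x image_width
    "masks": rng.random((2, 3, 4, 5)),
    "class_names": ["background", "road", "car"],  # made-up names
    # number_of_observations x 2: original height and width of each image
    "original_image_shapes": np.array([[8, 10], [6, 6]]),
}

masks = out["masks"]
n_obs, n_classes, height, width = masks.shape

# The class axis of masks follows the order of class_names.
assert n_classes == len(out["class_names"])

# Option 1: most probable class per pixel, shape (n_obs, height, width).
class_map = masks.argmax(axis=1)

# Option 2: one binary mask per class using a threshold (0.5 is an assumption).
binary = masks > 0.5

for i in range(n_obs):
    orig_h, orig_w = out["original_image_shapes"][i]
    print(f"observation {i}: mask {height}x{width}, original image {orig_h}x{orig_w}")
```

Note that masks are stored at the {image_height} x {image_width} resolution, which can differ from the original image size recorded in original_image_shapes.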

The csv file contains the following columns:

All the N columns in the train dataframe
Note
The csv file repeats each original row of the train dataframe X times, where X refers to the {number_of_classes}. Each repeated row contains the run-length-encoded mask prediction for a different class.
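
To make the one-row-per-class layout concrete, the following sketch builds a small dataframe by hand and reshapes it to one row per image. The column names image, class_id, and pred_mask and all values are hypothetical stand-ins; a real file uses the column names from your own dataset.

```python
import pandas as pd

# Hand-made stand-in for a predictions csv: 2 images x 2 classes = 4 rows.
df = pd.DataFrame(
    {
        "image": ["a.png", "a.png", "b.png", "b.png"],
        "class_id": ["car", "road", "car", "road"],
        "pred_mask": ["1 3", "", "", "5 2"],
    }
)

# Each image appears once per class.
rows_per_image = df.groupby("image").size()
print(rows_per_image)

# Reshape to one row per image and one column per class.
wide = df.pivot(index="image", columns="class_id", values="pred_mask")
print(wide)
```

When reading a real file with pd.read_csv, empty fields become NaN by default; passing keep_default_na=False keeps them as empty strings.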

In the case that the train dataframe contains a {class_name_column} and {rle_mask_column}:

A column named {class_name_column} containing the input class names, where class_name_column refers to the name of the column in the train dataframe that holds the class names
A column named {rle_mask_column} containing all the true Run-length encodings (RLEs) in the train dataframe

Note

You can define the {class_name_column} and {rle_mask_column} under the Dataset settings section when building an image semantic segmentation experiment.

In the case that the test dataframe does not contain a {class_name_column} or {rle_mask_column} or both:

The first column in the csv file is named class_id, and there is no column with true Run-length encodings (RLEs)

A column with the prefix pred_ followed by the suffix {rle_mask_column} that contains the predicted Run-length encodings (RLEs) of all the predictions, where rle_mask_column refers to the name of the Run-length encodings mask column in the train dataframe

Note

If there is no {rle_mask_column} in the train dataframe, this column is named pred_mask.
If no mask is predicted, the column value is an empty string.
You can define the {rle_mask_column} under the Dataset settings section when building an image semantic segmentation experiment.
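
A predicted RLE string can be turned back into a binary mask once the image height and width are known, for example from original_image_shapes in the Pickle file. This page does not spell out the RLE convention, so the sketch below assumes the widely used one: space-separated "start length" pairs, 1-indexed starts, and pixels numbered top to bottom, then left to right (column-major). Verify this assumption against your own data before relying on it. The function name rle_decode is hypothetical.

```python
import numpy as np


def rle_decode(rle, height, width):
    """Decode an RLE string into a (height, width) binary mask.

    ASSUMPTION: "start length" pairs, 1-indexed starts, column-major order.
    An empty string or a missing value (NaN) yields an all-zero mask.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    if not isinstance(rle, str) or not rle.strip():
        return flat.reshape((height, width), order="F")

    values = np.array(rle.split(), dtype=int)
    starts = values[0::2] - 1  # convert 1-indexed starts to 0-indexed
    lengths = values[1::2]
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1

    # Column-major reshape matches the assumed pixel numbering.
    return flat.reshape((height, width), order="F")


mask = rle_decode("1 3 10 2", height=3, width=4)
print(mask)

empty = rle_decode("", height=3, width=4)
print(int(empty.sum()))
```

The empty-string branch covers the case described above in which no mask is predicted.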

Note

The i-th sample of each item in the output dictionary matches the i-th row of the dataframe.
To learn how to open the csv and Pickle files, see Open csv and Pickle files with Python.

Open CSV and Pickle files with Python

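Using Python, a csv or Pickle file containing predictions can be opened as follows. The example reads the output of a text classification experiment, so its keys and columns differ from the ones described above; the same calls apply to image semantic segmentation predictions.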

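import pickle
import pandas as pd

# Read the final predictions into a pandas DataFrame
df = pd.read_csv('text_classification/validation_predictions.csv')

# Load the raw predictions (a pickled Python dictionary)
with open('text_classification/validation_raw_predictions.pkl', 'rb') as f:
    out = pickle.load(f)

# List the keys of the raw predictions dictionary
print(out.keys())

dict_keys(['predictions', 'comment_text', 'labels'])

# Show the first row of the final predictions
print(df.head(1))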

id	comment_text	label_toxic	label_severe_toxic	label_obscene	label_threat	label_insult	label_identity_hate	fold	pred_label_toxic	pred_label_severe_toxic	pred_label_obscene	pred_label_threat	pred_label_insult	pred_label_identity_hate
000103f0d9cfb60f	D'aww! He matches this background colour I'm s...	0	0	0	0	0	0	0	0.00041	0.000168	0.000328	0.000142	0.000247	0.000155

