Predictions download formats: Image instance segmentation
Overview
When you download predictions in H2O Hydrogen Torch, they arrive in a zip file. The format and content of that file depend first on the problem type of the predictions and then on how you generated them. Predictions can be generated in one of two ways, described in the following scenarios.
Scenario 1: Predictions from a completed experiment
Predictions downloaded from a completed experiment on the View experiments card are packaged in a zip file. This zip file contains the following files:
- validation_predictions.csv: A structured dataframe in CSV format that presents the final predictions for the provided validation dataframe.
- validation_raw_predictions.pkl: A Pickle file (a pickled Python dictionary) that contains the raw predictions for the provided validation dataframe.
If the experiment included a test dataframe, H2O Hydrogen Torch also includes two additional files in the same zip file:
- test_predictions.csv: A structured dataframe in CSV format that presents the final predictions for the provided test dataframe.
- test_raw_predictions.pkl: A Pickle file (a pickled Python dictionary) that contains the raw predictions for the provided test dataframe.
Scenario 2: Predictions generated by scoring on new data
Predictions generated by scoring on new data through the H2O Hydrogen Torch UI (on the Predict data card) are downloaded in a zip file. This zip file includes the following files:
- test_predictions.csv: A structured dataframe in CSV format that presents the final predictions for the provided test dataframe.
- test_raw_predictions.pkl: A Pickle file (a pickled Python dictionary) that contains the raw predictions for the provided test dataframe.
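As a minimal sketch of working with the downloaded archive, the snippet below extracts a zip file and lists its contents using only the Python standard library. The archive name `predictions.zip`, the helper `extract_predictions`, and the assumption that the files sit at the top level of the archive are all illustrative and not taken from H2O Hydrogen Torch; the stand-in archive exists only so the sketch runs on its own.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_predictions(zip_path, target_dir):
    """Extract a predictions archive and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Stand-in archive so the sketch is self-contained; replace it with your download.
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "predictions.zip"  # hypothetical file name
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("test_predictions.csv", "placeholder")
    archive.writestr("test_raw_predictions.pkl", "placeholder")

extracted = extract_predictions(demo_zip, work_dir / "predictions")

# Scenario 1 archives always hold validation_* files; test_* files are optional there.
has_test_files = "test_predictions.csv" in extracted
print(extracted, has_test_files)
```

Checking for the presence of the `test_*` files before reading them avoids errors when an experiment was run without a test dataframe.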
Formats
- `.pkl` file keys
- `.csv` file columns
The Pickle file contains the following keys:
- raw_probabilities
  - A 4-dimensional NumPy array that contains pixel-wise probabilities. The shape of the array is number_of_observations x (number_of_classes + 2) x {image_height} x {image_width}. The two additional channels (+ 2) correspond to individual instance borders and borders between instances.
  Note: You can define the {image_height} and {image_width} under the Image settings section when building an image instance segmentation experiment.
  Caution: Raw probabilities are available only if the Save prediction probabilities setting was set to True when running the image instance segmentation experiment.
- instance_predictions
  - A list of 3-dimensional NumPy arrays containing instance predictions, where each instance is represented as a separate integer starting from 1 for each class. The length of the list is number_of_observations, and the shape of each array is original_image_height x original_image_width x number_of_classes, where original_image_height and original_image_width are the height and width of the corresponding input image.
- confidences
  - A list of dictionaries containing prediction confidences for each instance. The length of the list is number_of_observations. Each element of the list is a dictionary whose keys are the class names and whose values are the confidences for each instance ID (starting from 1).
- class_names
  - A list containing all the class names. The class names follow the order of the class channels in the 4-dimensional raw_probabilities array and in each 3-dimensional instance_predictions array.
- [image_column]
  - A 1-dimensional NumPy array that contains input image names. The name of the key is [image_column], where image_column refers to the name of the image column in the train dataframe.
  Note: You can define the [image_column] under the Dataset settings section when building an image instance segmentation experiment.
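The structure described above can be explored with a short sketch. The dictionary below is synthetic and only mimics the documented shapes; the class names `cell` and `nucleus`, the helper functions, and the assumption that 0 marks background pixels (the keys above only state that instance IDs start from 1) are illustrative. The `confidences` key is left out because the exact layout of its values is not specified here.

```python
import numpy as np

class_names = ["cell", "nucleus"]  # hypothetical class names
n_obs, height, width = 2, 4, 5     # hypothetical model input size

# First image: two "cell" instances and one "nucleus" instance.
first = np.zeros((height, width, len(class_names)), dtype=np.int32)
first[0:2, 0:2, 0] = 1
first[2:4, 3:5, 0] = 2
first[1:3, 1:3, 1] = 1
# Second image: a different original size and no predicted instances.
second = np.zeros((3, 6, len(class_names)), dtype=np.int32)

rng = np.random.default_rng(0)
out = {
    # number_of_observations x (number_of_classes + 2) x image_height x image_width
    "raw_probabilities": rng.random((n_obs, len(class_names) + 2, height, width)),
    "instance_predictions": [first, second],
    "class_names": class_names,
}


def count_instances(instance_map, names):
    """Count distinct non-zero instance IDs in each class channel."""
    counts = {}
    for channel, name in enumerate(names):
        ids = np.unique(instance_map[:, :, channel])
        counts[name] = int(np.count_nonzero(ids))  # assumes 0 is background
    return counts


def instance_mask(instance_map, names, class_name, instance_id):
    """Return a boolean mask for one instance of one class."""
    channel = names.index(class_name)
    return instance_map[:, :, channel] == instance_id


counts = [count_instances(m, out["class_names"]) for m in out["instance_predictions"]]
mask = instance_mask(first, class_names, "cell", 2)
print(out["raw_probabilities"].shape, counts, int(mask.sum()))
```

Because each entry of `instance_predictions` keeps the original image size, the arrays in the list can differ in height and width, which is why the key holds a list rather than a single stacked array.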
The CSV file contains the following columns:
- A column named {image_column_name}, where image_column_name refers to the image column name in the train dataframe
  Note: You can define the image_column_name under the Dataset settings section when building an image instance segmentation experiment.
- A column named {class_name_column} containing the class name for each predicted instance, where class_name_column refers to the name of the column in the train dataframe that holds the class names
- A column named instance_rle that contains the run-length encoded (RLE) mask for each instance
- A column named confidence containing the confidence score for each instance
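Since the CSV file holds one row per predicted instance, a common first step is to filter by confidence and count instances per image. The sketch below does this with pandas on a small inline table. The column names `image_path` and `class_name`, the threshold of 0.5, and the `rle_*` placeholder strings are assumptions for illustration only; real files use the column names from your train dataframe and actual RLE strings.

```python
import io

import pandas as pd

# Synthetic stand-in for test_predictions.csv; the RLE values are placeholders.
csv_text = """image_path,class_name,instance_rle,confidence
img_1.png,cell,rle_a,0.91
img_1.png,cell,rle_b,0.35
img_1.png,nucleus,rle_c,0.77
img_2.png,cell,rle_d,0.64
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only instances at or above a hypothetical confidence threshold.
threshold = 0.5
confident = df[df["confidence"] >= threshold]

# Count the remaining instances per image and class.
per_image = (
    confident.groupby(["image_path", "class_name"])
    .size()
    .rename("n_instances")
    .reset_index()
)
print(per_image)
```

The threshold is a design choice: a higher value trades missed instances for fewer false positives, so it is worth tuning on the validation predictions.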
To learn how to open the CSV and Pickle files, see Open CSV and Pickle files with Python.
Open CSV and Pickle files with Python
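Using Python, a CSV or Pickle file containing predictions can be opened as follows. The example uses files from a text classification experiment, so its keys and columns differ from those listed under Formats; the loading steps are the same for image instance segmentation.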
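import pickle
import pandas as pd

# Load the final predictions from the CSV file.
df = pd.read_csv('text_classification/validation_predictions.csv')
# Load the raw predictions (a pickled Python dictionary).
with open('text_classification/validation_raw_predictions.pkl', 'rb') as f:
    out = pickle.load(f)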
print(out.keys())
dict_keys(['predictions', 'comment_text', 'labels'])
print(df.head(1))
| id | comment_text | label_toxic | label_severe_toxic | label_obscene | label_threat | label_insult | label_identity_hate | fold | pred_label_toxic | pred_label_severe_toxic | pred_label_obscene | pred_label_threat | pred_label_insult | pred_label_identity_hate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 000103f0d9cfb60f | D'aww! He matches this background colour I'm s... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00041 | 0.000168 | 0.000328 | 0.000142 | 0.000247 | 0.000155 |