Prediction settings: Text sequence-to-sequence
Overview
To score (predict) new data through the H2O Hydrogen Torch UI (with a built model), you need to specify certain settings referred to as prediction settings, which comprise dataset, prediction, and environment settings similar to those used when creating an experiment. The sections below describe the prediction settings for a text sequence-to-sequence model.
General settings
Experiment
This setting defines the model (experiment) H2O Hydrogen Torch utilizes to score new data.
Prediction name
This setting defines the name of the prediction.
Dataset settings
Dataset
This setting specifies the dataset to score.
Test dataframe
This setting defines the file containing the test dataset that H2O Hydrogen Torch scores.
- Defines a CSV or Parquet file containing the test dataset that H2O Hydrogen Torch uses for scoring.
Note: The test dataset should have the same format as the train dataset but does not require label columns.
Text column
Defines the column name with the input text that H2O Hydrogen Torch uses during scoring.
Prediction settings
Metric
This setting defines the evaluation metric H2O Hydrogen Torch uses to evaluate the model's accuracy on the generated predictions.
Batch size inference
This setting defines the batch size of examples to use for inference.
Selecting 0 sets the batch size for inference to the same value used for the Batch size setting during training.
Max length inference
Defines the maximum length H2O Hydrogen Torch uses for the generated text.
- Similar to the Max length setting in the Tokenizer settings section (when defining the settings of the experiment), this setting specifies the maximum number of tokens to predict for a given prediction sample.
- This setting affects predictions and the evaluation metrics; set it based on the dataset and the average output sequence length the model is expected to predict.
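The effect of this limit can be sketched with a toy generation loop (an assumed illustration, not H2O Hydrogen Torch internals): generation stops once the maximum number of tokens has been produced, even if the model has not yet emitted an end-of-sequence token.

```python
EOS = "<eos>"

def toy_next_token(prefix):
    # Hypothetical stand-in for a model's next-token prediction.
    vocab = ["the", "cat", "sat", EOS]
    return vocab[len(prefix) % len(vocab)]

def generate(max_length_inference):
    tokens = []
    while len(tokens) < max_length_inference:
        token = toy_next_token(tokens)
        if token == EOS:
            break  # end-of-sequence reached before the length limit
        tokens.append(token)
    return tokens

print(generate(2))   # truncated after 2 tokens: ['the', 'cat']
print(generate(10))  # stops at EOS first: ['the', 'cat', 'sat']
```

A value that is too small truncates otherwise valid outputs, which lowers the evaluation metric; a value far larger than needed only increases prediction runtime.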
Do sample
Determines whether to sample from the next token distribution instead of choosing the token with the highest probability. If turned On, the next token in a predicted sequence is sampled based on the probabilities. If turned Off, the highest probability is always chosen.
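The difference between the two modes can be sketched with a toy next-token distribution (an assumed illustration, not H2O Hydrogen Torch internals):

```python
import random

# Hypothetical next-token probabilities for the current position.
probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}

def next_token(do_sample):
    if do_sample:
        # Do sample = On: draw the next token according to its probability.
        tokens, weights = zip(*probs.items())
        return random.choices(tokens, weights=weights, k=1)[0]
    # Do sample = Off: always pick the highest-probability token (greedy).
    return max(probs, key=probs.get)

print(next_token(do_sample=False))  # always "cat"
print(next_token(do_sample=True))   # usually "cat", sometimes "dog" or "fish"
```

Greedy decoding is deterministic, while sampling produces more varied (and potentially more natural) outputs at the cost of reproducibility.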
Num beams
Defines the number of beams to use for beam search. The default value is 1 (a single beam), which means no beam search.
Increasing the number of beams increases prediction runtime while potentially improving accuracy.
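The idea behind beam search can be sketched with a toy example (an assumed illustration, not H2O Hydrogen Torch's actual decoder): at each step, the `num_beams` highest-scoring partial sequences are kept, so a wider beam can recover a sequence that greedy decoding misses.

```python
import math

def next_probs(prefix):
    # Hypothetical context-dependent next-token probabilities.
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"x": 0.5, "y": 0.5}
    return {"x": 0.9, "y": 0.1}

def beam_search(num_beams, steps=2):
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in next_probs(seq).items()
        ]
        # Keep only the `num_beams` highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]

print(beam_search(num_beams=1))  # greedy starts with "a": ['a', 'x'] (prob 0.30)
print(beam_search(num_beams=2))  # wider search finds ['b', 'x'] (prob 0.36)
```

Each extra beam multiplies the number of candidate sequences scored per step, which is why runtime grows with the beam count.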
Temperature
Defines the temperature to use for sampling from the next token distribution during validation and inference. In other words, the defined temperature controls the randomness of predictions by scaling the logits before applying softmax. A higher temperature makes the distribution more random.
- The temperature value only takes effect when the Do sample setting is enabled (On).
- To learn more about this setting, refer to the following article: How to generate text: using different decoding methods for language generation with Transformers.
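Temperature scaling can be sketched as follows (an assumed illustration): logits are divided by the temperature before softmax, so higher temperatures flatten the next-token distribution and make sampling more random.

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))   # peaked: top token dominates
print(softmax_with_temperature(logits, 10.0))  # nearly uniform
```

With a low temperature, sampling behaves almost like greedy decoding; with a high temperature, low-probability tokens are chosen much more often.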
Environment settings
GPUs
This setting specifies the list of GPUs H2O Hydrogen Torch can use for scoring. GPUs are listed by name, referring to their system ID (starting from 1). If no GPUs are selected, H2O Hydrogen Torch utilizes CPUs for model scoring.