Prediction settings: Text sequence to sequence
To score (predict) new data using a built model through the H2O Hydrogen Torch UI, H2O Hydrogen Torch requires the specification of certain settings referred to as prediction settings (comprising dataset, prediction, and environment settings similar to those used when creating an experiment). The prediction settings for a text sequence to sequence model are described below.
General settings
Experiment
Defines the model (experiment) H2O Hydrogen Torch uses to score new data.
Prediction name
Defines the name of the prediction.
Dataset settings
Dataset
Specifies the dataset to use for scoring.
Test dataframe
Defines the file(s) containing the test dataframe that H2O Hydrogen Torch will use for scoring.
- Image regression | Image classification | Image metric learning | Text regression | Text classification | Text sequence to sequence | Text span prediction | Text token classification | Text metric learning | Audio regression | Audio classification
  - Defines a .csv or .pq file containing the test dataframe that H2O Hydrogen Torch will use for scoring.
  - Note: The test dataframe should have the same format as the train dataframe but does not require label columns.
- Image object detection | Image semantic segmentation | Image instance segmentation
  - Defines a .pq file containing the test dataframe that H2O Hydrogen Torch will use for scoring.
Text column
Defines the column name with the input text that H2O Hydrogen Torch will use during scoring.
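As a sketch of what a minimal test dataframe might look like, the snippet below builds a small .csv with an input text column and no label columns, using only the Python standard library. The column name input_text is a hypothetical placeholder; use whatever column name you select under the Text column setting.

```python
import csv
import io

# Hypothetical test rows: an id and the input text column only.
# No label column is required for scoring.
rows = [
    {"id": 1, "input_text": "Summarize: H2O Hydrogen Torch trains deep learning models."},
    {"id": 2, "input_text": "Summarize: Prediction settings mirror experiment settings."},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "input_text"])
writer.writeheader()
writer.writerows(rows)
test_csv = buf.getvalue()
print(test_csv)
```

Writing the same rows to a .pq (Parquet) file instead requires a Parquet library such as pyarrow, but the column layout is identical.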
Prediction settings
Metric
Specifies the evaluation metric to use to evaluate the model's accuracy.
Usually, the evaluation metric should reflect the quantitative way of assessing the model's value for the corresponding use case.
Max length
Defines the maximum length value H2O Hydrogen Torch will use for the generated text.
- Similar to the Max length setting in the Tokenizer Settings section, this setting specifies the maximum number of tokens to predict for a given prediction sample.
- This setting impacts predictions and the evaluation metrics; its value should be based on the dataset and the average output sequence length expected to be predicted.
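To illustrate how a maximum length bounds generation, here is a toy decoding loop (not H2O Hydrogen Torch internals) that stops either at an end-of-sequence token or once max_length tokens have been produced:

```python
def generate(next_token_fn, max_length, eos="<eos>"):
    """Generate tokens until EOS is emitted or max_length tokens are produced."""
    tokens = []
    while len(tokens) < max_length:
        tok = next_token_fn(tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

# Toy model that never emits EOS: the output is capped by max_length.
out = generate(lambda toks: "word", max_length=5)
print(out)  # ['word', 'word', 'word', 'word', 'word']
```

If the model would naturally stop earlier (by emitting EOS), a larger Max length costs nothing; if it would not, generation is truncated at exactly max_length tokens.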
Do sample
Determines whether to sample from the next token distribution instead of choosing the token with the highest probability. If turned On, the next token in a predicted sequence is sampled based on the probabilities. If turned Off, the highest probability is always chosen.
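The difference between the two modes can be sketched with a toy next-token distribution (purely illustrative, not H2O Hydrogen Torch internals): sampling draws a token in proportion to its probability, while greedy decoding always returns the most probable token.

```python
import random

# Toy next-token distribution: token -> probability.
probs = {"the": 0.5, "a": 0.3, "cat": 0.2}

def next_token(probs, do_sample):
    """Pick the next token: sample when Do sample is On, else greedy argmax."""
    if do_sample:
        tokens, weights = zip(*probs.items())
        return random.choices(tokens, weights=weights, k=1)[0]
    return max(probs, key=probs.get)

print(next_token(probs, do_sample=False))  # always "the"
print(next_token(probs, do_sample=True))   # "the", "a", or "cat"
```

With Do sample Off the output sequence is deterministic; with it On, repeated predictions on the same input can differ.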
Num beams
Defines the number of beams to use for beam search. The default value is 1 (a single beam), which means no beam search is performed.
Selecting a higher number of beams increases prediction runtime while potentially improving accuracy.
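As a minimal sketch of the idea (using a made-up transition table in place of a real model), beam search keeps the num_beams highest-scoring partial sequences at each step instead of committing to a single greedy choice:

```python
import math

# Toy conditional next-token probabilities: last token -> {next token: prob}.
# Purely illustrative; a real model produces these from a neural network.
model = {
    "": {"A": 0.6, "B": 0.4},
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.5, "B": 0.5},
}

def beam_search(model, num_beams, steps):
    """Keep the num_beams highest-scoring partial sequences at each step.

    num_beams=1 degenerates to greedy decoding (no beam search).
    """
    beams = [("", 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            last = seq[-1] if seq else ""
            for tok, p in model[last].items():
                candidates.append((seq + tok, score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams

print(beam_search(model, num_beams=1, steps=2))  # greedy: best is "AB"
print(beam_search(model, num_beams=2, steps=2))  # keeps two hypotheses
```

The runtime cost grows with the number of beams because each step expands and scores candidates for every kept hypothesis.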
Temperature
Defines the temperature to use for sampling from the next token distribution during validation and inference. In other words, the defined temperature controls the randomness of predictions by scaling the logits before applying softmax. A higher temperature makes the distribution more random.
- The temperature value only takes effect when the Do Sample setting is enabled (On).
- To learn more about this setting, refer to the following article: How to generate text: using different decoding methods for language generation with Transformers.
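The scaling described above can be shown with a small self-contained sketch: dividing the logits by the temperature before the softmax sharpens the distribution for low temperatures and flattens it for high ones.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax.

    Higher temperature -> flatter distribution -> more random sampling.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, 0.5)   # sharper: near-greedy sampling
high = softmax_with_temperature(logits, 2.0)  # flatter: more random sampling
print(low)
print(high)
```

At low temperature the top token dominates and sampling behaves almost like greedy decoding; at high temperature the probabilities approach uniform.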
Environment settings
GPUs
Specifies the list of GPUs H2O Hydrogen Torch can use for scoring. GPUs are listed by name, referring to their system ID (starting from 1).