Version: v1.2.0

Import dataset settings: Text span prediction

Dataset name

Name of the dataset.

Problem type

Defines the problem type of the experiment, which in turn determines the settings H2O Hydrogen Torch displays for the experiment.

Note
  • The selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment.
  • The From experiment option allows you to use the settings from a previously run experiment.

Train dataframe

Defines a .csv or .pq file containing a dataframe with training records that H2O Hydrogen Torch will use to train the model.

Note
  • The records will be combined into mini-batches when training the model.
  • If a validation dataframe is provided, a fold column is not needed in the train dataframe.
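
As a rough illustration of what such a training file might look like, the sketch below writes a small .csv with one column each for the question, context, and answer text. The column names `question`, `context`, and `answer` and the helper `to_csv_text` are hypothetical; any names can be used as long as they are mapped in the column settings described below.

```python
import csv
import io

# Hypothetical column names; map whatever names you choose in the
# Question column, Context column, and Answer column settings.
FIELDNAMES = ["question", "context", "answer"]

rows = [
    {
        "question": "What color is the sky?",
        "context": "On a clear day, the sky is blue.",
        "answer": "blue",
    },
    {
        "question": "Where is the Louvre?",
        "context": "The Louvre is a museum in Paris, France.",
        "answer": "Paris",
    },
]


def to_csv_text(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)

# Read the text back to confirm the records survive a round trip,
# including the context that contains a comma.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

To produce an actual file, the same text can be written to disk, for example with `open("train.csv", "w", newline="")`.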

Validation dataframe

Defines a .csv or .pq file containing a dataframe with validation records that H2O Hydrogen Torch will use to evaluate the model during training.

Note
  • Setting a Validation dataframe requires the Validation strategy to be set to Custom holdout validation. In this case, H2O Hydrogen Torch uses the separate validation dataframe as provided and does not perform any internal cross-validation. In other words, the model is trained on the full train dataframe, and model performance is evaluated on the validation dataframe.
  • The validation dataframe should have the same format as the train dataframe but does not require a fold column.
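
One possible way to prepare separate train and validation records before uploading them is a seeded random holdout split. This is only a sketch of the data preparation step, not something H2O Hydrogen Torch does for you; the function `holdout_split`, its parameters, and the record fields are hypothetical.

```python
import random


def holdout_split(records, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical records with the same fields as the train dataframe.
records = [
    {"question": f"q{i}", "context": f"c{i}", "answer": f"a{i}"}
    for i in range(10)
]
train_records, validation_records = holdout_split(records)
```

Because both lists come from the same records, they share the same format, and neither needs a fold column.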

Test dataframe

Defines a .csv or .pq file containing a dataframe with test records that H2O Hydrogen Torch will use to test the model.

Note
  • The test dataframe should have the same format as the train dataframe but does not require a label column.
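
As a small sketch, test records could be derived from labeled records by dropping the label fields. It assumes, for illustration only, that the labels for this problem type are stored in columns named `answer` and `answer_start`; both names and the helper `strip_labels` are hypothetical.

```python
# Hypothetical names for the label columns of a text span prediction dataset.
LABEL_COLUMNS = {"answer", "answer_start"}


def strip_labels(records):
    """Return copies of the records without the label columns."""
    return [
        {key: value for key, value in record.items() if key not in LABEL_COLUMNS}
        for record in records
    ]


labeled = [
    {
        "question": "Where is the Louvre?",
        "context": "The Louvre is a museum in Paris, France.",
        "answer": "Paris",
    },
]
test_records = strip_labels(labeled)
```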

Question column

Defines the dataset column containing the question text H2O Hydrogen Torch will use during model training.

Context column

Defines the dataset column containing the context text in which the answer to the question in the question column is found. H2O Hydrogen Torch will use the context column during model training.

Answer column

Defines the dataset column containing the answer text that H2O Hydrogen Torch will use during model training.

Answer start column

Defines the dataset column that specifies where the answer text starts in the context text. If not set, H2O Hydrogen Torch will use the first occurrence of the answer text in the context text as the start of the answer.
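
The first-occurrence fallback can be pictured with a plain string search. This is a minimal sketch, not H2O Hydrogen Torch's internal code: it assumes the start is measured as a character index into the context text, and the function name `first_occurrence_start` is hypothetical.

```python
def first_occurrence_start(context: str, answer: str) -> int:
    """Return the character index of the first occurrence of answer in
    context, or -1 if the answer text does not appear (assumed convention)."""
    return context.find(answer)


context = "Paris is the capital of France. Paris hosts the Louvre."
answer = "Paris"

# With no explicit start, the first match is used.
default_start = first_occurrence_start(context, answer)  # 0

# The same answer text also occurs later in the context; an explicit
# start value is what would point at this occurrence instead.
later_start = context.find(answer, default_start + 1)
```

When the answer text appears more than once in the context, as above, providing an answer start column is the way to indicate which occurrence is the intended one.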

