Version: v1.4.0

Dataset format: Text span prediction

The data for a text span prediction experiment can be formatted following format 1 or 2.

A CSV file.

csv_name.csv (1)(2)
  1. The available dataset connectors require the data for a text span prediction experiment to be in a zip or CSV file.
    Note

    To learn how to upload your zip or CSV file as your dataset in H2O Hydrogen Torch, see Dataset connectors.

  2. A CSV file containing the following columns:
    • A context column containing the input texts
    • A question column containing the questions that the input context text can answer
    • An answer column containing the substrings of the context column that answer the questions in the question column
    • An optional answer-start column containing the start position of each answer substring in the context column
      Note
      • The start of the substring answers needs to be specified by integers representing the index where the answer starts in the context.
      • If you do not provide an answer-start column, H2O Hydrogen Torch selects the first occurrence of the answer in the context.
    • An optional fold column containing cross-validation fold indexes
      Note

      The fold column can include integers (0, 1, 2, …, N-1 or 1, 2, 3, …, N) or categorical values.
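The column layout described above can be sketched in Python as follows. This is a minimal illustration, not an official H2O Hydrogen Torch utility: the column names (context, question, answer, answer_start, fold), the helper names, and the sample rows are all hypothetical, and the sketch assumes the answer-start index is a zero-based character offset, which the text above does not state explicitly. The helper mirrors the documented default of taking the first occurrence of the answer in the context.

```python
import csv
import io

# Hypothetical column names; choose whatever names suit your dataset.
FIELDNAMES = ["context", "question", "answer", "answer_start", "fold"]


def find_answer_start(context: str, answer: str) -> int:
    """Return the index of the first occurrence of `answer` in `context`.

    Assumption: the index is a zero-based character offset.
    """
    start = context.find(answer)
    if start == -1:
        raise ValueError(f"answer {answer!r} is not a substring of the context")
    return start


def build_csv(rows, n_folds: int = 2) -> str:
    """Serialize rows to CSV text, adding answer_start and fold columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for i, row in enumerate(rows):
        writer.writerow(
            {
                **row,
                "answer_start": find_answer_start(row["context"], row["answer"]),
                # Integer fold indexes 0, 1, ..., N-1, assigned round-robin.
                "fold": i % n_folds,
            }
        )
    return buffer.getvalue()


# Made-up sample rows for illustration only.
rows = [
    {
        "context": "The Eiffel Tower is located in Paris.",
        "question": "Where is the Eiffel Tower located?",
        "answer": "Paris",
    },
    {
        "context": "Water boils at 100 degrees Celsius at sea level.",
        "question": "At what temperature does water boil at sea level?",
        "answer": "100 degrees Celsius",
    },
]

csv_text = build_csv(rows)
print(csv_text)
```

Raising an error when the answer is not found in the context catches rows where the answer is not an exact substring, which the format above requires.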

