Version: v1.4.0

Dataset format: Text span prediction

Formats
Example

The data for a text span prediction experiment can follow either of two formats: Format 1 or Format 2.

Format 1

A CSV file.

csv_name.csv (1)(2)

Format 2

A zip file containing a CSV file.

folder_name.zip (1)
│   └───csv_name.csv (2)

Note

You can have multiple CSV files in the zip file that you can use as train, validation, and test dataframes:

- A train CSV file needs to follow the format described above.
- A validation CSV file needs to follow the same format as a train CSV file.
- A test CSV file needs to follow the same format as a train CSV file, but does not require an answer column.
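
As an illustration of the note above, the following is a minimal sketch of packaging train, validation, and test CSV files into one zip file using only the Python standard library. The file names (`train.csv`, `validation.csv`, `test.csv`), the column names (`context`, `question`, `answer`), and the helper functions are assumptions made for this example; they are not names required by H2O Hydrogen Torch.

```python
import csv
import io
import zipfile


def rows_to_csv_text(fieldnames, rows):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_dataset_zip(target, train_rows, validation_rows, test_rows):
    """Write three CSV files into one zip archive.

    `target` can be a file path or a writable binary file-like object.
    Column and file names are hypothetical choices for this sketch.
    """
    labeled = ["context", "question", "answer"]
    unlabeled = ["context", "question"]  # the test file omits the answer column
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("train.csv", rows_to_csv_text(labeled, train_rows))
        archive.writestr("validation.csv", rows_to_csv_text(labeled, validation_rows))
        archive.writestr("test.csv", rows_to_csv_text(unlabeled, test_rows))
```

Calling `build_dataset_zip("folder_name.zip", train_rows, validation_rows, test_rows)` with lists of dictionaries would produce an archive that holds all three CSV files.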

(1) The available dataset connectors require the data for a text span prediction experiment to be in a zip or CSV file.

Note

To learn how to upload your zip or CSV file as your dataset in H2O Hydrogen Torch, see Dataset connectors.

(2) A CSV file containing the following columns:
- A context column containing the input texts
- A question column containing the questions that the corresponding context can answer
- An answer column containing the substrings of the context column that answer the questions in the question column
- An optional answer-start column containing the start position of each answer substring in the context column
  Note
  - Specify the start of each answer as an integer: the index in the context where the answer begins.
  - If you do not provide an answer-start column, H2O Hydrogen Torch selects the first occurrence of the answer in the context.
- An optional fold column containing cross-validation fold indexes
  Note
  The fold column can include integers (0, 1, 2, …, N-1 or 1, 2, 3, …, N) or categorical values.
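
To make the optional columns concrete, here is a small sketch that fills in an answer-start value and a fold index for each row. It assumes the answer start is a 0-based character index and mimics the "first occurrence" rule with Python's `str.find`; it does not show how H2O Hydrogen Torch computes the value internally. The column names `answer_start` and `fold` and both function names are hypothetical.

```python
def first_answer_start(context, answer):
    """Return the index of the first occurrence of answer in context, or -1."""
    return context.find(answer)


def add_answer_start_and_fold(rows, n_folds):
    """Return copies of rows with hypothetical answer_start and fold columns.

    Folds are assigned round-robin as integers 0 .. n_folds - 1.
    """
    enriched = []
    for index, row in enumerate(rows):
        start = first_answer_start(row["context"], row["answer"])
        if start < 0:
            # An answer must be a substring of its context.
            raise ValueError(f"row {index}: answer not found in context")
        enriched.append({**row, "answer_start": start, "fold": index % n_folds})
    return enriched
```

Round-robin assignment is only one simple way to produce fold indexes; any scheme that yields integers or categorical values fits the description above.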

The squad_text_span_prediction.zip file is a preprocessed dataset in H2O Hydrogen Torch and was formatted to solve a text span prediction problem. The structure of the zip file is as follows:

squad_text_span_prediction.zip
│   └───squad_v1.csv

The following is a random row from the CSV file:

question	context	answer
To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?	Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.	Saint Bernadette Soubirous
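
A row like the one above can be sanity-checked before upload. The sketch below is a hypothetical helper, not part of H2O Hydrogen Torch: it confirms that the answer is a substring of the context and, if an `answer_start` value is present (an assumed column name, treated here as a 0-based character index), that it points at the answer.

```python
def check_row(row):
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = []
    context = row.get("context", "")
    answer = row.get("answer", "")
    if not answer:
        problems.append("answer is empty")
    elif answer not in context:
        problems.append("answer is not a substring of context")
    elif row.get("answer_start") not in (None, ""):
        # Only checked when the optional column is filled in.
        start = int(row["answer_start"])
        if context[start:start + len(answer)] != answer:
            problems.append("answer_start does not point at the answer")
    return problems
```

An empty list means the row passed these checks; they cover only the substring relationship between the columns, not every requirement of the format.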

Note

To learn how to access one of the preprocessed datasets in H2O Hydrogen Torch, see Demo (preprocessed) datasets.
