Version: v1.4.0

Dataset format: Text regression

The data for a text regression experiment can follow either of two formats.

1. A CSV file.

csv_name.csv

2. A zip file containing a CSV file.

folder_name.zip
│   └───csv_name.csv

Note

The zip file can contain multiple CSV files, which you can use as train, validation, and test dataframes:

A train CSV file needs to follow the format described above.
A validation CSV file needs to follow the same format as a train CSV file.
A test CSV file needs to follow the same format as a train CSV file, but does not require the label column(s).
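
As a minimal sketch of the zip layout described above, the following Python snippet uses only the standard library to write a train, a validation, and a test CSV file into one zip archive. The file names (train.csv, validation.csv, test.csv), the column names (rating, text), and the helpers rows_to_csv and build_dataset_zip are assumptions made for illustration. They are not names that H2O Hydrogen Torch requires.

```python
import csv
import io
import zipfile


def rows_to_csv(header, rows):
    """Serialize a header and data rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_dataset_zip(zip_path):
    """Write train, validation, and test CSV files into one zip archive.

    File and column names here are illustrative assumptions.
    """
    # Train and validation files share one format: a label column and a text column.
    train_csv = rows_to_csv(
        ["rating", "text"],
        [[0.2, "The European Union includes how many ?"]],
    )
    validation_csv = rows_to_csv(
        ["rating", "text"],
        [[1.0, "What is released when an ion is formed ?"]],
    )
    # The test file keeps the text column but omits the label column.
    test_csv = rows_to_csv(
        ["text"],
        [["The European Union includes how many ?"]],
    )

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("train.csv", train_csv)
        archive.writestr("validation.csv", validation_csv)
        archive.writestr("test.csv", test_csv)
    return zip_path
```

Calling build_dataset_zip("folder_name.zip") produces an archive that you can then upload through one of the dataset connectors.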

The available dataset connectors require the data for a text regression experiment to be in a zip or CSV file.

Note

To learn how to upload your zip or CSV file as your dataset in H2O Hydrogen Torch, see Dataset connectors.

The wellformed_query_text_regression.csv file is a preprocessed dataset in H2O Hydrogen Torch and was formatted to solve a text regression problem.

Two random rows from the CSV file are as follows:

rating	text
0.2	The European Union includes how many ?
1.0	What is released when an ion is formed ?

Note

The rating column is the label column, that is, the numeric value the model learns to predict.
To learn how to access one of the preprocessed datasets in H2O Hydrogen Torch, see Demo (preprocessed) datasets.
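
Before uploading your own CSV file, it can help to confirm that it has a text column and a numeric label column. The sketch below is one possible check, written with the Python standard library. The function name check_regression_csv and the default column names rating and text are assumptions for illustration, and the checks reflect only the format described on this page, not any validation that H2O Hydrogen Torch itself performs.

```python
import csv
import io


def check_regression_csv(csv_text, label_column="rating", text_column="text"):
    """Return a list of problems found; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []

    # Both columns must be present in the header row.
    problems = [
        f"missing column: {column}"
        for column in (label_column, text_column)
        if column not in fieldnames
    ]
    if problems:
        return problems

    # Data rows are counted from 1, not including the header.
    for row_number, row in enumerate(reader, start=1):
        try:
            float(row[label_column])
        except (TypeError, ValueError):
            problems.append(f"row {row_number}: label is not numeric")
        if not (row[text_column] or "").strip():
            problems.append(f"row {row_number}: text is empty")
    return problems


# The two sample rows shown above, written as comma-separated values.
sample = (
    "rating,text\n"
    "0.2,The European Union includes how many ?\n"
    "1.0,What is released when an ion is formed ?\n"
)
print(check_regression_csv(sample))  # prints []
```

For a test CSV file, which does not require the label column, a check of this kind would need to skip the label test.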
