Dataset format: Text regression
- Formats
- Example
The data for a text regression experiment can follow either Format 1 or Format 2.
- Format 1: A CSV file.
  csv_name.csv (1)(2)
- Format 2: A zip file containing a CSV file.
  folder_name.zip (1)
  └───csv_name.csv (2)
The zip file can contain multiple CSV files, which you can use as train, validation, and test dataframes:
- A train CSV file needs to follow the format described above
- A validation CSV file needs to follow the same format as a train CSV file
- A test CSV file needs to follow the same format as a train CSV file, but does not require label column(s)
- (1) The available dataset connectors require the data for a text regression experiment to be in a zip or CSV file.
  Note: To learn how to upload your zip or CSV file as your dataset in H2O Hydrogen Torch, see Dataset connectors.
- (2) A CSV file containing the following columns:
  - A text column containing the texts for the experiment
  - One or more label columns containing the numerical labels (targets)
    Note: H2O Hydrogen Torch can train models that predict multiple labels simultaneously. You can provide multiple columns with multiple unique labels and choose which labels to predict when starting a new experiment.
  - An optional fold column containing cross-validation fold indexes
    Note: The fold column can include integers (0, 1, 2, …, N-1 or 1, 2, 3, …, N) or categorical values.
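As a rough illustration of the layout described above, the following sketch builds a zip archive in memory with a train CSV (text, label, and fold columns) and a test CSV (text only, no label column). The file names `train.csv` and `test.csv`, the column names `text`, `rating`, and `fold`, the helper functions, and the placeholder rows are all assumptions made for this example; H2O Hydrogen Torch does not require these particular names.

```python
import csv
import io
import zipfile

# Illustrative rows as (text, rating). The last two rows are made-up placeholders.
ROWS = [
    ("The European Union includes how many ?", 0.2),
    ("What is released when an ion is formed ?", 1.0),
    ("Placeholder query three ?", 0.6),
    ("Placeholder query four ?", 0.4),
]


def rows_to_csv(rows, header):
    """Serialize rows to CSV text with the given header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_dataset_zip(rows, n_folds=2):
    """Return zip bytes holding a train CSV and a label-free test CSV."""
    # Assign integer fold indexes 0..n_folds-1 in round-robin order.
    train_rows = [
        (text, rating, index % n_folds)
        for index, (text, rating) in enumerate(rows)
    ]
    # The test CSV keeps the same text column but drops the label column.
    test_rows = [(text,) for text, _ in rows]

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.csv", rows_to_csv(train_rows, ["text", "rating", "fold"]))
        zf.writestr("test.csv", rows_to_csv(test_rows, ["text"]))
    return archive.getvalue()


zip_bytes = build_dataset_zip(ROWS)
```

Writing `zip_bytes` to a file such as `folder_name.zip` would give an archive in the shape of Format 2. A validation CSV, if needed, could be produced the same way as the train CSV.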
The wellformed_query_text_regression.csv file is a preprocessed dataset in H2O Hydrogen Torch and was formatted to solve a text regression problem.
The following are two random rows from the CSV file:
rating | text |
---|---|
0.2 | The European Union includes how many ? |
1.0 | What is released when an ion is formed ? |
- The rating column is the label column.
- To learn how to access one of the preprocessed datasets in H2O Hydrogen Torch, see Demo (preprocessed) datasets.
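Before uploading a file, it can help to confirm that it has the expected columns and that every label is numeric. The sketch below runs that check on the two rows shown above. It assumes the file is comma-separated with a `rating,text` header, and `check_text_regression_csv` is a hypothetical helper written for this example, not part of H2O Hydrogen Torch.

```python
import csv
import io

# The two example rows, assuming a comma-separated file with this header.
SAMPLE_CSV = """rating,text
0.2,The European Union includes how many ?
1.0,What is released when an ion is formed ?
"""


def check_text_regression_csv(csv_text, text_column, label_columns):
    """Return a list of problems; an empty list means the basic layout looks right."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Every required column must appear in the header.
    problems = [
        f"missing column: {name}"
        for name in [text_column, *label_columns]
        if name not in header
    ]
    if problems:
        return problems

    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if not (row[text_column] or "").strip():
            problems.append(f"line {line_number}: empty text")
        for name in label_columns:
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(f"line {line_number}: non-numeric label in {name!r}")
    return problems


problems = check_text_regression_csv(SAMPLE_CSV, "text", ["rating"])
```

For a test CSV, which does not require label columns, the same helper can be called with an empty `label_columns` list.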