Dataset format: Text metric learning
The data for a text metric learning experiment can be formatted following format 1 or 2.
- Format 1: A CSV file.
  csv_name.csv (1)(2)
- Format 2: A zip file containing a CSV file.
  folder_name.zip (1)
  └───csv_name.csv (2)
The zip file can contain multiple CSV files, which you can use as train, validation, and test dataframes:
- A train CSV file needs to follow the format described above
- A validation CSV file needs to follow the same format as a train CSV file
- A test CSV file needs to follow the same format as a train CSV file, but does not require a label column
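As a minimal sketch of packaging such a zip file, the following uses only the Python standard library. The file names (train.csv, test.csv), the column names (text, label), the sample rows, and the helper names rows_to_csv and build_dataset_zip are illustrative assumptions, not names required by H2O Hydrogen Torch.

```python
import csv
import io
import zipfile


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_dataset_zip(train_rows, test_texts):
    """Return the bytes of a zip holding train.csv and test.csv.

    train.csv has text and label columns; test.csv has only a text
    column, since a test CSV file does not require a label column.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("train.csv", rows_to_csv(["text", "label"], train_rows))
        zf.writestr("test.csv", rows_to_csv(["text"], [[t] for t in test_texts]))
    return archive.getvalue()


# Hypothetical rows: similar texts share the same class name.
train_rows = [
    ("how do i install a package ?", "install"),
    ("what is the command to install software ?", "install"),
    ("my wifi stopped working after an update", "network"),
]
zip_bytes = build_dataset_zip(train_rows, ["how can i remove a package ?"])
```

To save the archive to disk, write zip_bytes to a file opened in binary mode.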
- (1) The available dataset connectors require the data for a text metric learning experiment to be in a zip or CSV file.
  Note: To learn how to upload your zip or CSV file as your dataset in H2O Hydrogen Torch, see Dataset connectors.
- (2) A CSV file containing the following columns:
  - A text column containing the input texts
  - A label column containing the class names
    Note: Texts that are similar should have the same class name.
  - An optional fold column containing cross-validation fold indexes
    Note: The fold column can include integers (0, 1, 2, …, N-1 values or 1, 2, 3, …, N values) or categorical values.
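The column requirements above can be checked before uploading. The following is a rough sketch using the Python standard library. The default column names (text, label, fold) and the function name check_metric_learning_csv are assumptions for illustration, and the empty-value checks reflect a guess at what a sensible file looks like rather than documented H2O Hydrogen Torch behavior.

```python
import csv
import io


def check_metric_learning_csv(csv_text, text_col="text", label_col="label",
                              fold_col="fold"):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []

    # The text and label columns are required; the fold column is optional.
    problems = [
        f"missing required column: {name}"
        for name in (text_col, label_col)
        if name not in columns
    ]
    if problems:
        return problems

    # Data rows are counted from 1, not including the header row.
    for row_number, row in enumerate(reader, start=1):
        if not (row[text_col] or "").strip():
            problems.append(f"row {row_number}: empty text")
        if not (row[label_col] or "").strip():
            problems.append(f"row {row_number}: empty label")
        if fold_col in columns and not (row[fold_col] or "").strip():
            problems.append(f"row {row_number}: empty fold")
    return problems


sample = (
    "text,label,fold\n"
    "what is the easiest way to strip a desktop edition to a server edition ?,16,1\n"
)
problems = check_metric_learning_csv(sample)
```

Fold values are only checked for presence here, because the fold column can hold either integers or categorical values.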
The ubuntu_text_metric_learning.zip file is a preprocessed dataset in H2O Hydrogen Torch and was formatted to solve a text metric learning problem. The structure of the zip file is as follows:
ubuntu_text_metric_learning.zip
├───train.csv
└───test.csv
The following is a random row from the train.csv file:
text | label | fold |
---|---|---|
what is the easiest way to strip a desktop edition to a server edition ? | 16 | 1 |
To learn how to access one of the preprocessed datasets in H2O Hydrogen Torch, see Demo (preprocessed) datasets.