Experiment settings: Text classification
The settings for a text classification experiment are listed and described below.
General settings
Dataset
It defines the dataset for the experiment.
Problem type
Defines the problem type of the experiment, which also defines the settings H2O Hydrogen Torch displays for the experiment.
- The selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment
- The From experiment option allows you to use the settings from a previously run experiment
Import config from YAML
Defines the .yml file that defines the experiment settings.
- H2O Hydrogen Torch supports .yml file import and export functionality. You can download the config settings of finished experiments, make changes, and re-upload them when starting a new experiment in any instance of H2O Hydrogen Torch.
- To learn how to download the .yml file (configuration file) of a completed experiment, see Download an experiment's logs/config file.
Experiment name
It defines the name of the experiment.
Dataset settings
Train dataframe
Defines a .csv or .pq file containing a dataframe with training records that H2O Hydrogen Torch will use to train the model.
- The records will be combined into mini-batches when training the model.
- If a validation dataframe is provided, a fold column is not needed in the train dataframe.
Validation strategy
Specifies the validation strategy H2O Hydrogen Torch will use for the experiment.
To properly assess the performance of your trained models, it is common practice to evaluate them on separate holdout data that the models have not seen during training. H2O Hydrogen Torch lets you choose among different strategies for this task to fit your needs.
Options
K-fold cross validation
Splits the data using the provided optional fold column in the train data or performs an automatic 5-fold cross-validation.
Grouped k-fold cross validation
Lets you specify a group column based on which the data is split into folds.
Custom holdout validation
Specifies a separate holdout dataframe.
Automatic holdout validation
Lets you specify the size of a holdout validation sample that is generated automatically.
Validation dataframe
Defines a .csv or .pq file containing a dataframe with validation records that H2O Hydrogen Torch will use to evaluate the model during training.
- Setting a Validation dataframe requires the Validation strategy to be set to Custom holdout validation. In this case, H2O Hydrogen Torch will fully respect the choice of a separate validation dataframe and will not perform any internal cross-validation. In other words, the model is trained on the full provided train dataframe, and model performance is evaluated on the provided validation dataframe.
- The validation dataframe should have the same format as the train dataframe but does not require a fold column.
Folds
Defines the validation folds in case of cross-validation; a separate model is trained for each value selected. Each model will use the corresponding part of the data as a holdout sample to assess performance while the model is fitted to the rest of the records from the training dataframe. As a result, folds estimate how the model will perform in general when used to make predictions on data not used during model training.
- If a column with the name fold is present in the train dataframe, H2O Hydrogen Torch will use the fold column values for folding; otherwise, a simple 5-fold (K-fold) will be applied.
- H2O Hydrogen Torch allows running experiments on single folds for faster experimenting and multiple folds to gain more trust in the model's generalization and performance capabilities.
- The Folds setting will only be available if Custom holdout validation is not selected as the Validation strategy.
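The fold logic described above can be sketched as follows. This is a minimal illustration only: the function names and the shuffled round-robin assignment are assumptions made for the example and are not H2O Hydrogen Torch's internal implementation.

```python
import random


def assign_folds(n_records, n_folds=5, seed=0):
    """Return a list with a fold index (0..n_folds-1) for each record."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [0] * n_records
    # Hypothetical assignment: deal shuffled records into folds round-robin.
    for position, record_index in enumerate(indices):
        folds[record_index] = position % n_folds
    return folds


def split_by_fold(folds, holdout_fold):
    """Return (train_indices, holdout_indices) for the selected fold."""
    train = [i for i, f in enumerate(folds) if f != holdout_fold]
    holdout = [i for i, f in enumerate(folds) if f == holdout_fold]
    return train, holdout
```

Selecting a single fold corresponds to one call of `split_by_fold`; selecting several folds corresponds to one call (and one trained model) per selected fold value.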
Test dataframe
Defines a .csv or .pq file containing a dataframe with test records that H2O Hydrogen Torch will use to test the model.
The test dataframe should have the same format as the train dataframe but does not require a label column.
Unlabeled dataframe
Defines a separate .csv or .pq file containing a dataframe with unlabeled records that H2O Hydrogen Torch uses to generate pseudo labels. H2O Hydrogen Torch first trains the model with the provided labeled data (Train dataframe). Right after, the model predicts pseudo labels for the data in the provided unlabeled dataframe before doing another training run that combines the original labels and pseudo labels.
- Image regression | Image classification | Image object detection
- The unlabeled dataframe just needs to contain a single image column
- Text regression | Text classification
- The unlabeled dataframe just needs to contain a single text column
- Audio regression | Audio classification | Speech recognition
- The unlabeled dataframe just needs to contain a single audio column
- Image regression | Image classification | Image object detection | Audio regression | Audio classification | Speech recognition
- Assets (e.g., images or audio) need to be located in the Data folder (setting)
- The training time can significantly increase depending on the size of the unlabeled data
As labeling can be expensive, having additional unlabeled data is quite common. When you provide this unlabeled data, H2O Hydrogen Torch trains the model in a semi-supervised manner, potentially improving the model quality compared to training only on labeled data.
Label columns
Defines the name(s) of the dataframe column(s) that refer to the target value(s) H2O Hydrogen Torch will aim to predict.
- You can select more than one label column; therefore, the target value to predict can be single or multi-column.
- Image classification supports multi-class and multilabel classification.
Text column
Defines the dataset column(s) containing the input text H2O Hydrogen Torch will use during model training. H2O Hydrogen Torch will concatenate multiple text columns with a specific separator token.
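As a rough sketch of the concatenation step, the function below joins several text columns of one record into a single input string. The `[SEP]` separator and the function name are assumptions for illustration; the actual separator token depends on the tokenizer of the selected backbone.

```python
def join_text_columns(row, text_columns, separator=" [SEP] "):
    """Concatenate the selected text columns of one record into a single string."""
    # The separator is a hypothetical placeholder for the tokenizer's separator token.
    return separator.join(str(row[column]) for column in text_columns)
```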
Data sample
Defines the percentage of the data to use for the experiment. The default percentage is 100% (1).
Lowering the default value can significantly increase the training speed. Still, it might lead to substantially lower accuracy. Using 100% of the data for final models is highly recommended.
Tokenizer settings
Lowercase
Grid search hyperparameter
Determines whether to lowercase the text that H2O Hydrogen Torch will observe during the experiment. This setting is turned Off by default.
When turned On, the observed text will always be lowercased before training and prediction. Tuning this setting can potentially lead to a higher accuracy value for certain types of datasets.
Max length
Grid search hyperparameter
Defines the maximum length of the input sequence H2O Hydrogen Torch will use during model training. In other words, this setting specifies the maximum number of tokens an input text is transformed into for model training.
A higher token count will lead to higher memory usage that will slow down training while increasing the probability of obtaining a higher accuracy value.
Padding quantile
Defines the padding quantile H2O Hydrogen Torch uses to select the maximum token length per batch. H2O Hydrogen Torch will perform padding of shorter sequences up to the specified padding quantile instead of the selected Max length. H2O Hydrogen Torch will truncate longer sequences.
- Lowering the quantile can significantly decrease training runtime and reduce memory usage when sequence lengths are unevenly distributed, but it can hurt performance
- The setting depends on the batch size and should be adjusted accordingly
- No padding is done in inference, and the selected Max length will be guaranteed
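The per-batch length selection can be sketched as follows. The nearest-rank quantile, the function names, and the pad ID of 0 are assumptions for illustration, not the exact computation H2O Hydrogen Torch performs.

```python
import math


def batch_pad_length(token_lengths, padding_quantile, max_length):
    """Pick the padded length for one batch from the quantile of its token lengths."""
    ordered = sorted(token_lengths)
    # Nearest-rank quantile (an assumed convention): smallest length that covers
    # the requested share of sequences in the batch.
    rank = max(1, math.ceil(padding_quantile * len(ordered)))
    # Never exceed the selected Max length.
    return min(ordered[rank - 1], max_length)


def pad_or_truncate(tokens, length, pad_id=0):
    """Truncate longer sequences and pad shorter ones to the given length."""
    return tokens[:length] + [pad_id] * max(0, length - len(tokens))
```

For a batch with token lengths 5, 8, 12, and 40 and a Max length of 32, a quantile of 1 pads everything to 32, while a quantile of 0.75 pads to 12 and truncates the longest sequence.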
Architecture settings
Pretrained
Defines whether the neural network should start with pre-trained weights. When this setting is On, the training of the neural network will start with a model pre-trained on a generic task. When turned Off, the initial weights of the neural network to train will be random.
Backbone
Grid search hyperparameter
Defines the backbone neural network architecture to train the model.
- Image regression | Image classification | Image metric learning | Audio regression | Audio classification
- H2O Hydrogen Torch accepts backbone neural network architectures from the timm library (select or enter the architecture name).
- Image object detection
- H2O Hydrogen Torch provides several state-of-the-art backbone neural network architectures for model training. When you select Faster R-CNN or FCOS as the model type for the experiment, you can input any architecture name from the timm library. When you select EfficientDet as the model type for the experiment, you can input any architecture name from the efficientdet-pytorch library.
- Image semantic segmentation | Image instance segmentation
- H2O Hydrogen Torch accepts backbone neural network architectures from the segmentation-models-pytorch library (select or enter the architecture name).
- Text regression | Text classification | Text token classification | Text span prediction | Text sequence to sequence | Text metric learning
- H2O Hydrogen Torch accepts backbone neural network architectures from the Hugging Face library (select or enter the architecture name).
- Speech recognition
- HuggingFace Wav2Vec2 CTC models are supported.
- Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy.
- Speech recognition
- If possible, use backbones that were pretrained on data as close to your use case as possible (e.g., noisy audio, casual speech, etc.).
Gradient checkpointing
Determines whether H2O Hydrogen Torch will activate gradient checkpointing (GC) when training the model. GC reduces the video random access memory (VRAM) footprint at the cost of a longer runtime (an additional forward pass). Turning On GC will enable it during the training process.
Gradient checkpointing is an experimental setting that is not compatible with all backbones. If a backbone is not supported, the experiment will fail, and H2O Hydrogen Torch will report through the logs that the selected backbone is not compatible with gradient checkpointing. To learn about the backbone setting, see Backbone.
Activating GC comes at the cost of a longer training time; for that reason, try training without GC first and only activate when experiencing GPU out-of-memory (OOM) errors.
Intermediate dropout
Defines the custom dropout rate H2O Hydrogen Torch will use for intermediate layers in the transformer model.
Dropout
Grid search hyperparameter
- Audio classification | Audio regression | Image classification | Image metric learning | Image regression | Text classification | Text metric learning | Text regression | Text token classification
- Defines the dropout rate before the final fully connected layer that H2O Hydrogen Torch will apply during model training. This setting defines the dropout rate between the backbone and neck of the model H2O Hydrogen Torch will apply during model training. The dropout rate will help the model generalize better by randomly dropping a share of the neural network connections.
Pool
Grid search hyperparameter
Defines the global pooling method H2O Hydrogen Torch will use in the model architecture before the final fully connected layer. Instead of adding a fully connected layer on top of the feature maps, global pooling is applied to each feature map beforehand.
Certain backbones (e.g., ViT) do not require pooling. Accordingly, H2O Hydrogen Torch will not display this setting for them.
Options
- Average
- H2O Hydrogen Torch applies global average pooling.
- CatAverageMax
- H2O Hydrogen Torch concatenates global average and max poolings.
- GeM
- H2O Hydrogen Torch applies a Generalized Mean Pooling (GeM) introduced in the following paper: Fine-tuning CNN Image Retrieval with No Human Annotation.
- Max
- H2O Hydrogen Torch applies a global max pooling.
- MeanAverageMax
- H2O Hydrogen Torch calculates the mean between global average and max poolings.
- [CLS] token
- H2O Hydrogen Torch uses the output of the first [CLS] token.
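The pooling options above can be illustrated on a small list of feature vectors (one vector per position or token). This is a plain-Python sketch under stated assumptions: the function names, the GeM exponent `p=3.0`, and the clamping value `eps` are illustrative and are not taken from H2O Hydrogen Torch.

```python
def average_pool(features):
    """Global average over positions; features is a list of equal-length vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def max_pool(features):
    """Global max over positions."""
    return [max(column) for column in zip(*features)]


def cat_average_max(features):
    """Concatenate the average-pooled and max-pooled vectors."""
    return average_pool(features) + max_pool(features)


def mean_average_max(features):
    """Element-wise mean of the average-pooled and max-pooled vectors."""
    return [(a + m) / 2 for a, m in zip(average_pool(features), max_pool(features))]


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean: (mean(x^p))^(1/p), with values clamped to at least eps."""
    n = len(features)
    return [
        (sum(max(x, eps) ** p for x in column) / n) ** (1.0 / p)
        for column in zip(*features)
    ]


def cls_pool(features):
    """Use the vector at the first position (the [CLS] token)."""
    return list(features[0])
```

With `p=1`, GeM reduces to average pooling; larger values of `p` move the result toward max pooling.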
Training settings
Loss function
Grid search hyperparameter
Defines the loss function H2O Hydrogen Torch will use during model training. The loss function is a differentiable function measuring the prediction error. The model will use gradients of the loss function to update the model weights during training.
Options
- CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.
- BCE
- H2O Hydrogen Torch uses binary cross entropy loss.
- MAE
- H2O Hydrogen Torch utilizes the mean absolute error (L1 norm) as the loss function.
- MSE
- H2O Hydrogen Torch utilizes the mean squared error (squared L2 norm) as the loss function.
- RMSE
- H2O Hydrogen Torch utilizes the root mean squared error (L2 norm) as a loss function.
- BCEDice
- H2O Hydrogen Torch uses binary cross entropy loss and Dice loss with weights 2 and 1, respectively.
- BCELovasz
- H2O Hydrogen Torch uses binary cross entropy loss and Lovasz loss with equal weights.
- Dice
- H2O Hydrogen Torch uses Dice loss.
- Focal
- H2O Hydrogen Torch uses the Focal loss introduced in the following paper: Focal Loss for Dense Object Detection
- FocalDice
- H2O Hydrogen Torch uses Focal loss and Dice loss with weights 2 and 1, respectively.
- Jaccard
- H2O Hydrogen Torch uses Jaccard loss.
- ArcFace
- H2O Hydrogen Torch utilizes an Additive Angular Margin Loss for Deep Face Recognition (ArcFace).
- Speech recognition
- CTC Loss
- H2O Hydrogen Torch utilizes Connectionist Temporal Classification loss as a loss function.
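For orientation, the simpler loss functions in the list can be written out in a few lines. The sketch below uses the standard textbook definitions for a single record (cross entropy, binary cross entropy) and for a list of predictions (MAE, MSE, RMSE); the function names are illustrative and the code is not H2O Hydrogen Torch's implementation.

```python
import math


def cross_entropy(logits, target_index):
    """Multi-class cross entropy for one record: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target_index]


def binary_cross_entropy(probability, target):
    """Binary cross entropy for one record with target 0 or 1."""
    return -(target * math.log(probability) + (1 - target) * math.log(1 - probability))


def mae(predictions, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def mse(predictions, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def rmse(predictions, targets):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(predictions, targets))
```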
Optimizer
Grid search hyperparameter
Defines the algorithm or method (optimizer) to use for model training. The selected algorithm or method defines how the model should change the attributes of the neural network, such as weights and learning rate. Optimizers solve optimization problems and make more accurate updates to attributes to reduce learning losses.
Options
Adadelta
To learn about Adadelta, see ADADELTA: An Adaptive Learning Rate Method.
Adam
To learn about Adam, see Adam: A Method for Stochastic Optimization.
AdamW
To learn about AdamW, see Decoupled Weight Decay Regularization.
RMSprop
To learn about RMSprop, see Neural Networks for Machine Learning.
SGD
H2O Hydrogen Torch uses a stochastic gradient descent optimizer.
Learning rate
Grid search hyperparameter
Defines the learning rate H2O Hydrogen Torch will use when training the model, specifically when updating the neural network's weights. The learning rate is the speed at which the model updates its weights after processing each mini-batch of data.
- Learning rate is an important setting to tune as it balances under- and overfitting.
- The number of epochs highly impacts the optimal value of the learning rate.
Differential learning rate layers
Defines the learning rate to apply to certain layers of a model. H2O Hydrogen Torch applies the regular learning rate to layers without a specified learning rate.
Options
- Backbone
- H2O Hydrogen Torch applies a different learning rate to a body of the neural network architecture.
- Head
- H2O Hydrogen Torch applies a different learning rate to a head of the neural network architecture.
- Neck
- H2O Hydrogen Torch applies a different learning rate to a neck of the neural network architecture.
- Loss
- H2O Hydrogen Torch applies a different learning rate to an ArcFace block of the neural network architecture.
- Encoder
- H2O Hydrogen Torch applies a different learning rate to the encoder of the neural network architecture.
- Decoder
- H2O Hydrogen Torch applies a different learning rate to the decoder of the neural network architecture.
- Segmentation head
- H2O Hydrogen Torch applies a different learning rate to the head of the neural network architecture.
The options for an image object detection experiment are different based on the selected Model type (setting).
If you select EfficientDet as the experiment's Model type (setting), the following options are available:
Options
- Backbone
- H2O Hydrogen Torch applies a different learning rate to a body of the EfficientDet architecture.
- FPN
- H2O Hydrogen Torch applies a different learning rate to a Feature Pyramid Network (FPN) block of the EfficientDet architecture.
- class_net
- H2O Hydrogen Torch applies a different learning rate to a classification head of the EfficientDet architecture.
- box_net
- H2O Hydrogen Torch applies a different learning rate to a box regression head of the EfficientDet architecture.
If you select Faster R-CNN as the experiment's Model type (setting), the following options are available:
Options
- Body
- H2O Hydrogen Torch applies a different learning rate to a body of the Faster R-CNN architecture.
- FPN
- H2O Hydrogen Torch applies a different learning rate to a Feature Pyramid Network (FPN) block in the Faster R-CNN architecture.
- RPN
- H2O Hydrogen Torch applies a different learning rate to a Region Proposal block of the Faster R-CNN architecture.
- ROI heads
- H2O Hydrogen Torch applies a different learning rate to the Faster R-CNN architecture proposal heads.
If you select FCOS as the experiment's Model type (setting), the following options are available:
Options
- Body
- H2O Hydrogen Torch applies a different learning rate to a body of the FCOS architecture.
- FPN
- H2O Hydrogen Torch applies a different learning rate to a Feature Pyramid Network (FPN) block of the FCOS architecture.
- classification_head
- H2O Hydrogen Torch applies a different learning rate to the classification head of the FCOS architecture.
- regression_head
- H2O Hydrogen Torch applies a different learning rate to a box regression head of the FCOS architecture.
A common strategy is to apply a lower learning rate to the backbone of a model for better convergence and training stability.
Different layers are available for different problem types.
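One way to picture differential learning rates is as two parameter groups, each with its own learning rate. The sketch below groups parameters by a name prefix; the prefix convention (`backbone.`, `head.`), the function name, and the parameter names are assumptions for illustration. The returned list of dictionaries loosely mirrors the per-parameter-group format that PyTorch optimizers accept, with names standing in for the actual tensors.

```python
def build_param_groups(parameter_names, base_lr, differential_lr, differential_layers):
    """Split parameter names into a regular group and a differential-rate group."""
    regular, differential = [], []
    for name in parameter_names:
        # A parameter belongs to a selected layer if its name starts with "<layer>.".
        if any(name.startswith(prefix + ".") for prefix in differential_layers):
            differential.append(name)
        else:
            regular.append(name)
    return [
        {"params": regular, "lr": base_lr},
        {"params": differential, "lr": differential_lr},
    ]
```

For example, selecting `backbone` with a lower differential learning rate keeps the head at the regular learning rate while the backbone is updated more conservatively.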
Batch size
Grid search hyperparameter
Defines the number of training examples in a mini-batch. During each training iteration, the model uses one mini-batch to estimate the error gradient before updating the model weights. The Batch size value applies to a single GPU.
During model training, the training data is packed into mini-batches of a fixed size.
Automatically adjust batch size
If this setting is turned On, H2O Hydrogen Torch will check whether the specified Batch size fits into the GPU memory. If a GPU out-of-memory (OOM) error occurs, H2O Hydrogen Torch will automatically decrease the Batch size by a factor of 2 until it fits into the GPU memory or the Batch size equals 1.
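The halving behavior can be sketched as a short loop. Here `fits_in_memory` is a hypothetical callback standing in for an actual trial run on the GPU; it is not part of H2O Hydrogen Torch.

```python
def find_fitting_batch_size(batch_size, fits_in_memory):
    """Halve the batch size until fits_in_memory(batch_size) is True or it reaches 1."""
    while batch_size > 1 and not fits_in_memory(batch_size):
        batch_size = max(1, batch_size // 2)
    return batch_size
```

Starting from a Batch size of 64 on a device that can only hold 10 records per batch, the loop tries 64, 32, and 16 before settling on 8.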
Drop last batch
H2O Hydrogen Torch drops the last incomplete batch during model training when this setting is turned On.
H2O Hydrogen Torch groups the train data into mini-batches of equal size during the training process, but the last batch can have fewer records than the others. Not dropping the last batch can lead to a less robust gradient estimation while causing a more volatile training step.
Epochs
Grid search hyperparameter
Defines the number of epochs to train the model. In other words, it specifies the number of times the learning algorithm will go through the entire training dataset.
- The Epochs setting is an important setting to tune because it balances under- and overfitting.
- The learning rate highly impacts the optimal value of the epochs.
Schedule
Grid search hyperparameter
Defines the learning rate schedule H2O Hydrogen Torch will use during model training. Specifying a learning rate schedule will prevent the learning rate from staying the same. Instead, a learning rate schedule will cause the learning rate to change over iterations, typically decreasing the learning rate to achieve a better model performance and training convergence.
Options
Constant
H2O Hydrogen Torch applies a constant learning rate during the training process.
Cosine
H2O Hydrogen Torch applies a cosine learning rate that follows the values of the cosine function.
Linear
H2O Hydrogen Torch applies a linear learning rate that decreases the learning rate linearly.
Warmup epochs
Defines the number of epochs to warm up the learning rate where the learning rate should increase linearly from 0 to the desired learning rate.
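The interaction of the schedule and the warm-up phase can be sketched as a single function of the training step. This is an illustration under assumptions: the decay-to-zero shape of the Cosine and Linear schedules, the step-based (rather than epoch-based) granularity, and the function name are not confirmed details of H2O Hydrogen Torch.

```python
import math


def learning_rate_at(step, total_steps, warmup_steps, base_lr, schedule="Cosine"):
    """Return the learning rate for a 0-based step under the chosen schedule."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warm-up from 0 to the base learning rate.
        return base_lr * step / warmup_steps
    if schedule == "Constant":
        return base_lr
    # Share of the post-warm-up training that has been completed (0.0 to 1.0).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if schedule == "Linear":
        return base_lr * (1.0 - progress)
    if schedule == "Cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"Unknown schedule: {schedule}")
```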
Weight decay
Defines the weight decay that H2O Hydrogen Torch will use for the optimizer during model training.
Weight decay is a regularization technique that adds the L2 norm of all model weights to the loss function, which can improve model generalization.
Gradient clip
Defines the maximum norm of the gradients H2O Hydrogen Torch allows during model training. It defaults to 0 (no clipping). When a value greater than 0 is specified, H2O Hydrogen Torch will modify the gradients during model training. H2O Hydrogen Torch uses the specified value as an upper limit for the norm of the gradients, calculated using the Euclidean norm over all gradients per batch.
This setting can help model convergence when extreme gradient values cause high volatility of weight updates.
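Clipping by the Euclidean norm can be sketched as follows, with the gradients of one batch flattened into a single list. The function name is illustrative; the logic follows the standard norm-clipping technique rather than H2O Hydrogen Torch's source code.

```python
import math


def clip_gradients(gradients, max_norm):
    """Scale gradients so their overall Euclidean norm does not exceed max_norm."""
    if max_norm <= 0:
        return list(gradients)  # 0 means no clipping
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients)
    # Rescale every gradient by the same factor so the direction is preserved.
    scale = max_norm / total_norm
    return [g * scale for g in gradients]
```

Gradients `[3.0, 4.0]` have a norm of 5; with a Gradient clip of 1 they are scaled down to roughly `[0.6, 0.8]`, which keeps the direction of the update while limiting its size.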
Grad accumulation
Defines the number of gradient accumulations before H2O Hydrogen Torch updates the neural network weights during model training.
- Grad accumulation can be beneficial if only small batches are selected for training. With gradient accumulation, the loss and gradients are calculated after each batch, but H2O Hydrogen Torch waits for the selected number of accumulations before updating the model weights. You can control the batch size through the Batch size setting.
- Changing the default value of Grad accumulation might require adjusting the learning rate and batch size.
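The accumulation logic can be sketched with a toy model that has a single weight. The averaging of accumulated gradients, the plain SGD update, and the function name are assumptions for illustration.

```python
def train_with_accumulation(batch_gradients, accumulation_steps, learning_rate, weight=0.0):
    """Apply one weight update per `accumulation_steps` batches (single-weight toy model)."""
    accumulated = 0.0
    updates = 0
    for step, gradient in enumerate(batch_gradients, start=1):
        # Average the gradients so the update resembles one larger batch.
        accumulated += gradient / accumulation_steps
        if step % accumulation_steps == 0:
            weight -= learning_rate * accumulated
            accumulated = 0.0
            updates += 1
    return weight, updates
```

With this scheme, the effective batch size per update is the Batch size multiplied by the number of accumulations, which is why the learning rate may need to be adjusted alongside it.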
Save best checkpoint
Determines if H2O Hydrogen Torch should save the model weights of the epoch exhibiting the best validation metric. When turned On, H2O Hydrogen Torch saves the model weights for the epoch exhibiting the best validation metric. When turned Off, H2O Hydrogen Torch saves the model weights after the last epoch is executed.
- This setting should be turned On with care as it has the potential to lead to overfitting of the validation data.
- The default goal should be to attempt to tune models so that the last or very last epoch is the best epoch.
- If logging shows an evident decline in later epochs, it is usually better to adjust hyperparameters, such as reducing the number of epochs or increasing regularization, instead of turning this setting On.
Evaluation epochs
Defines the number of epochs H2O Hydrogen Torch will use before each validation loop for model training. In other words, it determines the frequency (in a number of epochs) to run the model evaluation on the validation data.
- Increasing the number of Evaluation epochs can speed up an experiment.
- The Evaluation epochs setting is available only if the Save best checkpoint setting is turned Off.
Calculate train metric
Determines whether the model metric should also be calculated for the training data at the end of the training. When On, the model metric will also be calculated for the training data. The resulting values will not indicate the true model performance because they are calculated on the same data records H2O Hydrogen Torch used for model training, but they can give insights into over/underfitting.
Train validation data
Defines whether the model should use the entire train and validation dataset during model training. When turned On, H2O Hydrogen Torch will use the whole train dataset and validation data to train the model.
- H2O Hydrogen Torch will also evaluate the model on the provided validation fold. Validation will always be only on the provided validation fold.
- H2O Hydrogen Torch will use both datasets for model training if you provide a train and validation dataset.
- To define a training dataset, use the Train dataframe setting. For more information, see Train dataframe.
- To define a validation dataset, use the Validation dataframe setting. For more information, see Validation dataframe.
- The Train validation data setting is only available if you turned the Save best checkpoint setting Off.
- See Save best checkpoint to learn more about the Save best checkpoint setting.
- Turning On the Train validation data setting should produce a model that you can expect to perform better because H2O Hydrogen Torch trained the model on more data. However, note that using the entire train dataset and out-of-fold validation dataset generally causes the model's accuracy to be overstated, as information from the validation data is incorporated into the model during the training process.
Note: If you have five folds and set fold 0 as validation, H2O Hydrogen Torch will usually train on folds 1-4 and report on fold 0. With Train validation data turned On, H2O Hydrogen Torch adds fold 0 to the training data but still reports accuracy on fold 0. As a result, the accuracy will be overstated for fold 0, but the model should be better for any unseen (test) data or production scenarios. For that reason, you usually want to consider this setting after running your experiments and deciding on models.
Run interpretations
Determines whether the experiment (model) generates validation interpretation insights at the end of the experiment. Validation interpretation insights are only available for image, text, and audio classification and regression experiments.
Build scoring pipelines
Determines whether the experiment (model) automatically generates an H2O MLOps pipeline and Python scoring pipeline at the end of the experiment. If turned Off, you can still create scoring pipelines on demand when the experiment is complete (e.g., when you click Download scoring or Download MLOps).
Prediction settings
Metric
Defines the metric to use to evaluate the model's performance.
Usually, the evaluation metric should reflect the quantitative way of assessing the model's value for the corresponding use case.
Probability threshold
- Image instance segmentation | Image semantic segmentation
- Defines the probability threshold; a predicted pixel will be treated as positive if its probability is larger than the probability threshold.
- Image object detection
- Defines the probability threshold that the model utilizes to identify predicted bounding boxes with confidence larger than the defined probability threshold. Predicted bounding boxes above the defined probability threshold are added to the validation and test .csv files in the downloaded model predictions .zip file.
- Audio classification | Image classification | Text classification
- Defines a threshold for threshold-dependent classification metrics (e.g., F1). For multi-class classification, argmax will be used.
Note: The defined threshold is used as a default threshold when displaying all other threshold-dependent metrics in the validation plots.
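The effect of the threshold on a threshold-dependent metric such as F1 can be sketched as follows. The strict "greater than" comparison and the function names are assumptions for illustration.

```python
def predict_binary(probability, threshold=0.5):
    """Label a record as positive if its probability exceeds the threshold (assumed strict)."""
    return 1 if probability > threshold else 0


def predict_multiclass(probabilities):
    """Multi-class prediction ignores the threshold and takes the argmax."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def f1_score(targets, predictions):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

The same predicted probabilities can therefore yield different F1 values depending on the chosen threshold, which is why the setting matters for binary and multilabel problems but not for multi-class argmax predictions.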
Environment settings
GPUs
Determines the list of GPUs H2O Hydrogen Torch can use for the experiment. GPUs are listed by name, referring to their system ID (starting from 1).
Number of GPUs per run
Defines the number of GPUs to use for a single run when training the model. A single run might represent a single fold or a single grid search run.
If 5 GPUs are available, it will be possible to run a 5-fold cross-validation in parallel using a single GPU per fold.
- The available GPUs will be the ones that can be enabled using the GPUs setting.
- If the number of GPUs is less than or equal to 1, this setting (Number of GPUs per run) will not be available.
Mixed precision training
Determines whether to use mixed-precision during model training. When turned Off, H2O Hydrogen Torch will not use mixed-precision for training.
Mixed-precision is a technique that helps decrease memory consumption and increases training speed.
Mixed precision inference
Determines whether to use mixed-precision during model inference.
Mixed-precision is a technique that helps decrease memory consumption and increases inference speed.
Sync batch normalization
Determines whether to synchronize batch normalization across GPUs in a distributed data-parallel (DDP) mode. In other words, when turned On, H2O Hydrogen Torch synchronizes the batch normalization layers of the model across GPUs during multi-GPU training. In a nutshell, H2O Hydrogen Torch with multiple GPUs splits the batch across GPUs, and therefore, when a normalization layer wants to normalize data, it has access only to the part of the batch stored on the device. As a result, training will work out of the box but will give better results if the data from all GPUs is collected to normalize the data of the entire batch.
When turned On, data scientists can expect the training speed to drop slightly while the model's accuracy improves. However, this rarely happens in practice and only occurs under specific problem types and defined batch sizes.
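The difference between per-device and synchronized statistics can be illustrated with plain numbers. In this sketch each inner list stands for the part of a batch stored on one GPU; the function names and the example data are assumptions for illustration.

```python
from statistics import fmean, pvariance


def batch_stats(values):
    """Mean and (population) variance of the values visible to one device."""
    return fmean(values), pvariance(values)


def synced_stats(per_device_values):
    """Mean and variance over the full batch, as if collected from all devices."""
    full_batch = [v for shard in per_device_values for v in shard]
    return batch_stats(full_batch)
```

When the shards differ strongly (for example, `[1.0, 2.0]` on one device and `[9.0, 10.0]` on another), each device alone sees a small variance, while the synchronized statistics reflect the spread of the entire batch.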
Number of workers
Defines the number of workers H2O Hydrogen Torch will use for the DataLoader. In other words, it defines the number of CPU processes to use when reading and loading data to GPUs during model training.
Seed
Defines the random seed value that H2O Hydrogen Torch will use during model training. It defaults to -1, which leads to different random numbers at every invocation. When you set a value other than -1, the random seed makes results reproducible: the same seed yields predictable and repeatable results every time.
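The role of the seed can be sketched with Python's standard random generator. Treating -1 as "draw a fresh seed" follows the description above; the function name and the way the fresh seed is drawn are assumptions for illustration.

```python
import random


def make_rng(seed=-1):
    """Return a random generator; -1 draws a fresh seed, any other value is used as-is."""
    if seed == -1:
        # Hypothetical handling of the default: pick an unpredictable seed.
        seed = random.SystemRandom().randrange(2**32)
    return random.Random(seed)
```

Two generators created with the same fixed seed produce the same sequence of numbers, which is what makes repeated runs comparable.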
Logging settings
Logger
Defines the logger type that H2O Hydrogen Torch will use for model training.
Options
None
H2O Hydrogen Torch does not use any logger.
Neptune
H2O Hydrogen Torch will use Neptune as a logger to track the experiment. To use Neptune, you must specify a Neptune API token and a Neptune project.
Number of texts
This setting defines the number of texts to show in the experiment Insights tab.