Version: v1.6.0

Experiment settings: Speech recognition

The settings for a speech recognition experiment are listed and described below.

General settings

Dataset

This setting defines the dataset for the experiment.

Problem category

This setting defines the general category of the problem type, for example, image.

note

The selected problem category (for example, image) determines the options in the Problem type setting.
The From experiment option lets you reuse the settings of another experiment.
- The From experiment option is unavailable when you select AutoDL as the experience level.

Experiment

This setting defines the experiment H2O Hydrogen Torch references to initialize the experiment settings. H2O Hydrogen Torch initializes the experiment settings with the values from the selected (built) experiment.

Setting dependency

This setting is available only if From experiment is selected in the Problem category setting.

Problem type

This setting defines the problem type of the experiment, which also defines the settings H2O Hydrogen Torch displays for the experiment.

Note

The selected problem category (in the Problem category setting) determines the available problem types.
The selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment.

Model type

This setting defines the model type for the experiment.

Note

The selected problem type, experience level, and model type determine the settings H2O Hydrogen Torch displays for the experiment.
Not all problem types support the specification of a model type; in that case, the selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment.

Details

Options

Details

Image object detection

Efficientdet
- EfficientDet models are among the most popular models to tackle image object detection. They are using EfficientNet models as a backbone and a weighted bi-directional feature pyramid network (BiFPN) as the feature network.
note
EfficientDet is the default model type for image object detection in H2O Hydrogen Torch. To learn more about EfficientDet, see EfficientDet: Scalable and efficient object detection.
Faster Rcnn
- Faster Region-based Convolutional Neural Networks (FasterRCNN) are an advancement of classical Region-based Convolutional Neural Network (RCNN) architectures. The core idea of RCNN is to apply selective search to extract regions of interest (ROIs) from an image, where each ROI might represent a bounding box of an object. Each ROI is fed through a neural network to produce output features used to classify the type of object. A FasterRCNN shares full-image convolutional features between its region proposal network and the detection network and thus enables nearly cost-free region proposals, significantly speeding up training and inference compared to classical RCNN or Fast RCNN networks.
note
The implementation of FasterRCNNs in H2O Hydrogen Torch enables the selection of a pre-trained vision backbone from an extensive selection. To learn more about FasterRCNN, see Faster R-CNN: Towards real-time object detection with Region Proposal Networks.
Fcos
- Both EfficientDet and FasterRCNN are so-called anchor-based object detection models. In contrast, FCOS is a fully convolutional one-stage object detector that solves object detection per pixel, similar to how semantic segmentation models operate. FCOS is anchor-box free and proposal free.
note
The implementation of FCOS in H2O Hydrogen Torch enables the selection of a pre-trained vision backbone from an extensive selection. To learn more about FCOS, see FCOS: Fully Convolutional One-Stage Object Detection.

Details

Speech recognition

Wav2vec2
- Wav2vec2 models in H2O Hydrogen Torch leverage a) the transformer encoder architecture and b) the connectionist temporal classification (CTC) loss to learn and perform speech recognition.
  - Raw audio waveforms are typically converted into spectrograms then featurized by a convolutional neural network (CNN) before being fed to the transformer model.
- Wav2vec2 models can be characterized as making predictions tightly coupled with input audio due to the use of the CTC loss and vocabularies at the character level.
note
To learn more about Wav2vec2, see wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.
Whisper
- Whisper models in H2O Hydrogen Torch use a) the transformer encoder-decoder architecture and b) a cross-entropy loss to learn and perform speech recognition.
  - Raw audio waveforms are typically converted into spectrograms then featurized by a convolutional neural network (CNN) before being fed to the transformer model.
- Whisper models are auto-regressive generative models. While pre-trained Whisper models perform very strongly due to the size of their pre-training corpus, their generative nature may cause them to hallucinate (that is, predict speech that was not spoken) or more commonly, not predict speech when it is spoken.
note
To learn more about the Whisper architecture, see Robust Speech Recognition via Large-Scale Weak Supervision.
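The tight coupling between Wav2vec2 predictions and the input audio comes from how CTC outputs are decoded: the model emits one character (or a blank) per audio frame, and the transcript is obtained by collapsing repeats and dropping blanks. The following is a minimal sketch of greedy CTC decoding, not H2O Hydrogen Torch's implementation; the function name, the vocabulary, and the blank ID are hypothetical.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blank tokens."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when it differs from the previous frame
        # and is not the blank token.
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)


# Hypothetical character-level vocabulary (ID 0 is the CTC blank).
vocab = {0: "<blank>", 1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, vocab))  # cat
```

A blank between two identical characters keeps both of them, which is how CTC represents double letters.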

Import config from YAML

This setting specifies the YAML (.yml) file that defines the experiment settings.

Note

H2O Hydrogen Torch supports a YML file import and export functionality. You can download the config settings of finished experiments, make changes, and re-upload them when starting a new experiment in any instance of H2O Hydrogen Torch.
- To learn how to download the YML file (configuration file) of a completed experiment, see Download an experiment's logs/config file.

Use previous experiment weights

This setting determines whether to initialize the model weights with the weights from the experiment specified in the Experiment setting.

note

The weights of a previous experiment can be reused only if that experiment has the same problem type and backbone.

tip

This setting might be useful in case you want to continue training from a built experiment.

Setting dependency

The Use previous experiment weights setting is available only if From experiment is selected in the Problem category setting.

Experiment name

This setting defines the name of the experiment.

Dataset settings

Train dataframe

This setting specifies the path to a file that contains a dataframe comprising training records utilized by H2O Hydrogen Torch for model training within the experiment. Here, the term 'file' denotes a specific file adhering to a dataset format tailored for the problem type addressed in the experiment. To learn more, see Dataset formats.

note

The records are combined into mini-batches when training the model.
If a validation dataframe is provided, a fold column is not needed in the train dataframe.
To import datasets for inference only, when defining the settings for an experiment, set the Train dataframe setting to None while setting the Test dataframe setting to the relevant dataframe (as a result, H2O Hydrogen Torch utilizes the relevant dataset for predictions and not for training).

Data folder

Defines the location of the folder containing assets (for example, images or audio clips) the model utilizes for training. H2O Hydrogen Torch loads assets from this folder during training.

Validation strategy

This setting specifies the validation strategy H2O Hydrogen Torch uses for the experiment.

tip

To properly assess the performance of your trained models, it is common practice to evaluate them on separate holdout data that the models have not seen during training.

Details

Options

K-fold cross validation
- This option splits the data using the provided optional fold column in the train data or performs an automatic 5-fold cross-validation in the absence of a fold column.
Grouped k-fold cross-validation
- This option allows you to specify a group column based on which the data is split into folds.
Custom holdout validation
- This option specifies a separate holdout dataframe.
Automatic holdout validation
- This option allows you to specify a holdout validation sample size that is automatically generated.
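To illustrate the K-fold option, the sketch below assigns each record to one of five folds and then separates train and validation records for a selected fold. It is a simplified illustration under assumed behavior (random, evenly sized folds); the function names are hypothetical and do not reflect how H2O Hydrogen Torch builds folds internally.

```python
import random


def assign_folds(n_records, n_folds=5, seed=0):
    """Return a fold index (0..n_folds-1) for every record."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [0] * n_records
    for position, record in enumerate(indices):
        folds[record] = position % n_folds
    return folds


def split_for_fold(folds, validation_fold):
    """Records in the selected fold are held out; the rest are used for training."""
    train = [i for i, fold in enumerate(folds) if fold != validation_fold]
    valid = [i for i, fold in enumerate(folds) if fold == validation_fold]
    return train, valid


folds = assign_folds(10)
train_idx, valid_idx = split_for_fold(folds, validation_fold=0)
print(len(train_idx), len(valid_idx))  # 8 2
```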

Validation dataframe

This setting defines a file containing a dataframe with validation records that H2O Hydrogen Torch uses to evaluate the model during training.

Note

Setting a Validation dataframe requires the Validation strategy to be set to Custom holdout validation. When you provide a validation dataframe, H2O Hydrogen Torch fully respects that choice and does not perform any internal cross-validation. In other words, the model is trained on the full provided train dataframe, and model performance is evaluated on the provided validation dataframe.
The validation dataframe should have the same format as the train dataframe but does not require a fold column.

Setting dependency

The Validation dataframe setting is only available when you select Custom holdout validation in the Validation strategy setting.

Selected folds

This setting defines the selected validation fold(s) in case of cross-validation; a separate model is trained for each value selected. Each model utilizes the corresponding part of the data as a holdout sample to assess performance while the model is fitted to the rest of the records from the training dataframe. As a result, folds estimate how the model performs in general when used to make predictions on data not used during model training.

Note

H2O Hydrogen Torch allows running experiments on a single selected fold for faster experimenting and multiple selected folds to gain more trust in the model's generalization and performance capabilities.

Setting dependency

This setting is available only when the Validation strategy setting is not set to Custom holdout validation or Automatic holdout validation.

Test dataframe

This setting defines a file containing a dataframe with test records that H2O Hydrogen Torch uses to test the model.

note

The test dataframe should have the same format as the train dataframe but does not require a label column.
To import datasets for inference only, when defining the setting for an experiment, set the Train dataframe setting to None while setting the Test dataframe setting to the relevant dataframe (as a result, H2O Hydrogen Torch utilizes the relevant dataset for predictions and not for training).

Data folder test

Defines the location of the folder containing assets (for example, images, texts, or audio clips) H2O Hydrogen Torch utilizes to test the model. H2O Hydrogen Torch loads the assets from this folder when testing the model. This setting is only available if a test dataframe is selected.

Setting dependency

The Data folder test setting is available only if you specify a test dataframe in the Test dataframe setting.

Label columns

This setting defines the name(s) of the dataframe column(s) that refer to the target value(s) an H2O Hydrogen Torch experiment can aim to predict.

Audio column

Defines the dataframe column storing the names of the audio files that H2O Hydrogen Torch loads from the Data folder and Data folder test when training and testing the model.

Data sample

This setting defines the percentage of the data to use for the experiment. The default percentage is 100%.

tip

Lowering the default value can significantly increase the training speed, but it might lead to substantially lower accuracy. Using 100% of the data for final models is highly recommended.

Data sample choice

This setting specifies the data H2O Hydrogen Torch samples according to the percentage set in the Data sample setting. H2O Hydrogen Torch does not sample the unselected data.

Setting dependency

The Data sample choice setting is only available if the value in the Data sample setting is less than 1.0 (that is, less than 100%).

Capitalization

Defines the letter case for the label transcriptions H2O Hydrogen Torch utilizes for the model.

Details

Options

Details

Speech recognition

lower
- H2O Hydrogen Torch converts the text to lowercase.
upper
- H2O Hydrogen Torch converts the text to uppercase.
leave as-is
- H2O Hydrogen Torch does not convert the text to either uppercase or lowercase (it is left as-is).

Audio settings

Sample rate

Defines the sample rate (Hz) to which H2O Hydrogen Torch resamples the audio files for training and inference (validation and prediction). This setting is useful when audio files in the dataset have mixed sample rates (22kHz, 32kHz, 44kHz, etc.).

Note

Resampling the audio files to a common sample rate can result in faster training.
The Sample Rate setting is only available if Manual is selected in the Audio Parameters setting.
The Auto option selects the most common sample rate from the training set.
Speech recognition
- 16000 Hz is a good default setting, and most contemporary speech architectures are pretrained at this sample rate.
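As a rough illustration of the Auto option, the sketch below picks the most common sample rate from a list of per-file rates and computes how many samples a clip has after resampling. Both helper names are hypothetical, and the tie-breaking and rounding behavior are assumptions rather than documented behavior.

```python
from collections import Counter


def most_common_sample_rate(sample_rates):
    """Pick the rate that occurs most often across the training files."""
    return Counter(sample_rates).most_common(1)[0][0]


def resampled_length(num_samples, source_rate, target_rate):
    """Number of samples a clip has after resampling to target_rate."""
    return round(num_samples * target_rate / source_rate)


rates = [16000, 44100, 16000, 22050, 16000]
target = most_common_sample_rate(rates)
print(target)  # 16000
# A one-second clip recorded at 44.1 kHz becomes 16000 samples at 16 kHz.
print(resampled_length(44100, 44100, target))  # 16000
```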

Architecture settings

Pretrained

Grid search hyperparameter

Defines whether the neural network should start with pre-trained weights. When this setting is On, the training of the neural network starts with a model pre-trained on a generic task. When turned Off, the initial weights of the neural network are random.

Backbone

Grid search hyperparameter

Defines the backbone neural network architecture to train the model.

Note

Image regression | Image classification | Image metric learning | Audio regression | Audio classification
- H2O Hydrogen Torch accepts backbone neural network architectures from the timm library (select or enter the architecture name)
Image object detection
- H2O Hydrogen Torch provides several backbone state-of-the-art neural network architectures for model training. When you select Faster RCnn or Fcos as the model type for the experiment, you can input any architecture name from the timm library. When you select Efficientdet as the model type for the experiment, you can input any architecture name from the efficientdet-pytorch library
Image semantic segmentation | Image instance segmentation
- H2O Hydrogen Torch accepts backbone neural network architectures from the segmentation-models-pytorch library (select or enter the architecture name).
3D image regression | 3D image classification
- H2O Hydrogen Torch accepts backbone (encoder) neural network architectures from a subset (resnet and efficientnet) of the timm library (select or enter the architecture name).
Text regression | Text classification | Text token classification | Text span prediction | Text sequence to sequence | Text metric learning
- H2O Hydrogen Torch accepts backbone neural network architectures from the Hugging Face library (select or enter the architecture name)
Speech recognition
- HuggingFace Wav2Vec2 CTC models are supported

tip

All problem types
- Usually, it is good to use simpler architectures for quicker experiments and larger models when aiming for the highest accuracy
Speech recognition
- If possible, leverage backbones pre-trained closely to your use case (for example, noisy audio, casual speech, etc.)

Language

Defines the two-letter language code specifying the language to transcribe.

Note

This setting is available when Whisper is selected as the model type for the experiment
You can find a list of supported languages for mainline Whisper pre-trained models here

tip

Selecting a different language from the language of the speech (audio) typically leads to the model performing machine translation.

Freeze feature encoder

Grid search hyperparameter

Defines whether H2O Hydrogen Torch fixes the weights of the convolutional neural network (CNN) feature encoder during training.

Note

This setting is available when Wav2vec2 is selected as the model type for the experiment.

tip

Freezing the feature encoder may be helpful if you're working with pretrained weights and are fine-tuning with a small dataset.

Freeze encoder

Grid search hyperparameter

Defines whether H2O Hydrogen Torch fixes the weights of the transformer encoder during training.

Note

This setting is available when Whisper is selected as the model type for the experiment.

tip

Freezing the encoder may be helpful if you're working with pretrained weights and are fine-tuning with a small dataset.

Gradient checkpointing

Determines whether H2O Hydrogen Torch activates gradient checkpointing (GC) when training the model. GC reduces the video random access memory (VRAM) footprint at the cost of a longer runtime (an additional forward pass). When this setting is turned On, GC is enabled during the training process.

caution

Gradient checkpointing is an experimental setting that is not compatible with all backbones. If a backbone is not supported, the experiment fails, and H2O Hydrogen Torch informs through the logs that the selected backbone is not compatible with gradient checkpointing. To learn about the backbone setting, see Backbone.

tip

Activating GC comes at the cost of a longer training time; for that reason, try training without GC first and only activate when experiencing GPU out-of-memory (OOM) errors.

Training settings

Loss function

Grid search hyperparameter

This setting defines the loss function H2O Hydrogen Torch utilizes during model training. The loss function is a differentiable function measuring the prediction error. The model utilizes gradients of the loss function to update the model weights during training.

Details

Options

Details

Image regression | 3D image regression | Text regression | Audio regression

MAE
- H2O Hydrogen Torch utilizes the mean absolute error (L1 norm) as the loss function.
MSE
- H2O Hydrogen Torch utilizes the mean squared error (squared L2 norm) as the loss function.
RMSE
- H2O Hydrogen Torch utilizes the root mean squared error (L2 norm) as the loss function.
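The three regression losses differ only in how the per-record errors are aggregated. The following plain-Python sketch shows the standard definitions; it is illustrative and not the code H2O Hydrogen Torch runs.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, 1.0, 1.0, 3.0]
print(mae(y_true, y_pred))  # 1.5
print(mse(y_true, y_pred))  # 3.0
print(rmse(y_true, y_pred))
```

The single large error (3.0) dominates the MSE and RMSE far more than the MAE, which is why MAE is the less outlier-sensitive choice.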

Details

Image classification | 3D image classification | Text classification | Audio classification

BCE
- H2O Hydrogen Torch uses binary cross entropy loss.
Classification
- This default classification loss automatically chooses between BCE (multi-label) and CrossEntropy (multi-class) for classification.
CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.
SigmoidFocal
- H2O Hydrogen Torch uses the sigmoid Focal loss (gamma=2.0) for classification introduced in the following paper: Focal Loss for Dense Object Detection
SoftmaxFocal
- H2O Hydrogen Torch uses the softmax Focal loss (gamma=2.0) for classification introduced in the following paper: Focal Loss for Dense Object Detection

Details

Image semantic segmentation | 3D image semantic segmentation | Image instance segmentation

BCE
- H2O Hydrogen Torch uses binary cross entropy loss.
BCEDice
- H2O Hydrogen Torch uses binary cross entropy loss and Dice loss with weights 2 and 1, respectively.
BCELovasz
- H2O Hydrogen Torch uses binary cross entropy loss and Lovasz loss with equal weights.
Dice
- H2O Hydrogen Torch uses Dice loss.
Focal
- H2O Hydrogen Torch uses the Focal loss for semantic segmentation introduced in the following paper: Focal Loss for Dense Object Detection
FocalDice
- H2O Hydrogen Torch uses Focal loss and Dice loss with weights 2 and 1, respectively.
Jaccard
- H2O Hydrogen Torch uses Jaccard loss.

Details

Image metric learning | Text metric learning

ArcFace
- H2O Hydrogen Torch utilizes an Additive Angular Margin Loss for Deep Face Recognition (ArcFace).
CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.

Details

Text token classification | Text span prediction | Text sequence to sequence

CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.

Details

Speech recognition

CTC Loss
- H2O Hydrogen Torch utilizes Connectionist Temporal Classification (CTC) loss as a loss function.

CTC loss reduction

Grid search hyperparameter

Defines how to aggregate individual sample connectionist temporal classification (CTC) losses.

Details

Options

Details

Speech recognition

Mean
- The individual sample CTC losses are first normalized by label (i.e., transcript) lengths, then averaged across the batch.
- Selecting this option may improve performance on audio datasets where label lengths vary heavily.
Sum
- The individual sample CTC losses are summed up.
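The difference between the two reductions can be sketched with a few lines of arithmetic. The function below is a hypothetical helper that takes already computed per-sample CTC losses and label lengths; it mirrors the descriptions above and is not H2O Hydrogen Torch's code.

```python
def reduce_ctc_losses(sample_losses, label_lengths, reduction="mean"):
    """Aggregate per-sample CTC losses into a single batch loss."""
    if reduction == "sum":
        return sum(sample_losses)
    if reduction == "mean":
        # Normalize each loss by its transcript length, then average over the batch.
        normalized = [
            loss / max(length, 1)
            for loss, length in zip(sample_losses, label_lengths)
        ]
        return sum(normalized) / len(normalized)
    raise ValueError(f"Unknown reduction: {reduction}")


losses = [10.0, 40.0]   # per-sample CTC losses
lengths = [5, 20]       # transcript lengths in characters
print(reduce_ctc_losses(losses, lengths, "mean"))  # 2.0
print(reduce_ctc_losses(losses, lengths, "sum"))   # 50.0
```

With Sum, the long transcript contributes four times as much as the short one; with Mean, both contribute equally after length normalization.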

Note

This setting is available when Wav2vec2 is selected as the model type for the experiment.

CTC zero infinity

Grid search hyperparameter

The calculated connectionist temporal classification (CTC) losses may include infinite elements (and gradients) - especially when inputs are too short to be aligned with the targets. If turned On, infinite losses and associated gradients are set to zero.

By default, this setting is turned Off as a diagnostic to flag dataset issues. In other words, when this setting is turned Off, training a model against a poorly processed dataset (that is, one with an extreme mismatch between audio and transcript) might raise errors that point to the dataset issue.
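A CTC loss becomes infinite when no valid alignment exists, which happens when the number of audio frames is smaller than the transcript length plus the number of adjacent repeated characters (each repeat needs a blank frame in between). The sketch below checks that condition and shows what zeroing infinite losses means; both helper names are hypothetical.

```python
import math


def ctc_alignable(num_frames, target):
    """Check whether a target can be aligned to the given number of frames."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return num_frames >= len(target) + repeats


def zero_infinite_losses(sample_losses):
    """Replace infinite per-sample losses with zero."""
    return [0.0 if math.isinf(loss) else loss for loss in sample_losses]


print(ctc_alignable(5, "hello"))  # False: "ll" needs a blank in between
print(ctc_alignable(6, "hello"))  # True
print(zero_infinite_losses([1.5, math.inf]))  # [1.5, 0.0]
```

Running a check like `ctc_alignable` over a dataset before training is one way to find the records that would otherwise trigger infinite losses.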

Note

This setting is available when Wav2vec2 is selected as the model type for the experiment.

Apply spec augment

Grid search hyperparameter

Defines whether to apply the SpecAugment data augmentation to the outputs of the feature encoder.

Note

This setting is available when Wav2vec2 is selected as the model type for the experiment.

LayerDrop

Grid search hyperparameter

Defines a layer's probability of being dropped (skipped) during training. The closer to 1, the more layers are dropped. To learn more, see Reducing Transformer Depth on Demand with Structured Dropout.

Note

This setting is available when Wav2vec2 is selected as the model type for the experiment.

Optimizer

Grid search hyperparameter

This setting defines the algorithm or method (optimizer) to use for model training. The selected algorithm or method defines how the model should change the attributes of the neural network, such as weights and learning rate. Optimizers solve optimization problems and make more accurate updates to attributes to reduce learning losses.

Details

Options

Adadelta
- To learn about Adadelta, see ADADELTA: An Adaptive Learning Rate Method.
Adam
- To learn about Adam, see Adam: A Method for Stochastic Optimization.
AdamW
- To learn about AdamW, see Decoupled Weight Decay Regularization.
RMSprop
- To learn about RMSprop, see Neural Networks for Machine Learning.
SGD
- H2O Hydrogen Torch uses a stochastic gradient descent optimizer.

Learning rate

Grid search hyperparameter

This setting defines the learning rate H2O Hydrogen Torch uses when training the model, specifically when updating the neural network's weights. The learning rate is the speed at which the model updates its weights after processing each mini-batch of data.

note

The learning rate is an important setting to tune as it balances under- and overfitting.
The number of epochs highly impacts the optimal value of the learning rate.

Batch size

Grid search hyperparameter

This setting defines the number of training examples in a mini-batch, which the model processes in one training iteration to estimate the error gradient before updating the model weights. This batch size applies per GPU.

note

During model training, the training data is packed into mini-batches of a fixed size.

Automatically adjust batch size

If this setting is turned On, H2O Hydrogen Torch checks whether the specified Batch size fits into the GPU memory. If a GPU out-of-memory (OOM) error occurs, H2O Hydrogen Torch automatically decreases the Batch size by a factor of 2 until it fits into the GPU memory or the Batch size equals 1.
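The halving procedure can be sketched as a simple loop. Here, `fits_in_memory` is a hypothetical callable standing in for an actual trial run on the GPU; the function name and structure are assumptions made for illustration.

```python
def fit_batch_size(batch_size, fits_in_memory):
    """Halve the batch size until it fits in memory or reaches 1."""
    while batch_size > 1 and not fits_in_memory(batch_size):
        batch_size = max(1, batch_size // 2)
    return batch_size


# Pretend that only batches of 10 or fewer examples fit on the GPU.
print(fit_batch_size(64, lambda size: size <= 10))  # 8
```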

Drop last batch

This setting drops the last incomplete batch during model training when turned On.

note

H2O Hydrogen Torch groups the train data into mini-batches of equal size during the training process, but the last batch can have fewer records than the others. Not dropping the last batch can lead to a less robust gradient estimation while causing a more volatile training step.
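As a small worked example of the effect of this setting, the sketch below lists the mini-batch sizes produced for a given number of records. The function name is hypothetical.

```python
def batch_sizes(n_records, batch_size, drop_last):
    """Return the size of every mini-batch in one epoch."""
    full_batches, remainder = divmod(n_records, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_last:
        # The incomplete last batch is kept only when drop_last is off.
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4, drop_last=False))  # [4, 4, 2]
print(batch_sizes(10, 4, drop_last=True))   # [4, 4]
```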

Epochs

Grid search hyperparameter

This setting defines the number of epochs to train the model. In other words, it specifies the number of times the learning algorithm goes through the entire training dataset.

note

The Epochs setting is an important setting to tune because it balances under- and overfitting.
The learning rate highly impacts the optimal value of the epochs.
For the following problem types, H2O Hydrogen Torch lets you set the number of epochs to zero; in that case, H2O Hydrogen Torch does not train the model, and the pretrained model (experiment) can be deployed as-is:
- Speech recognition
- Text sequence to sequence
- Text span prediction

Schedule

Grid search hyperparameter

This setting defines the learning rate schedule H2O Hydrogen Torch utilizes during model training. Specifying a learning rate schedule prevents the learning rate from staying the same. Instead, a learning rate schedule causes the learning rate to change over iterations, typically decreasing the learning rate to achieve a better model performance and training convergence.

Details

Options

Constant
- H2O Hydrogen Torch applies a constant learning rate during the training process.
Cosine
- H2O Hydrogen Torch applies a cosine learning rate that follows the values of the cosine function.
Linear
- H2O Hydrogen Torch decreases the learning rate linearly during the training process.
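The three schedule options can be compared with a short sketch. It assumes that the Cosine and Linear schedules decay the learning rate to zero by the last training step and that Cosine follows a half cosine wave; the exact end values and shapes used by H2O Hydrogen Torch may differ, and the function name is hypothetical.

```python
import math


def scheduled_lr(base_lr, step, total_steps, schedule="Cosine"):
    """Learning rate at a given step for the Constant, Cosine, or Linear schedule."""
    progress = min(step / max(total_steps, 1), 1.0)
    if schedule == "Constant":
        return base_lr
    if schedule == "Linear":
        # Assumed: straight-line decay from base_lr to 0.
        return base_lr * (1.0 - progress)
    if schedule == "Cosine":
        # Assumed: half cosine wave from base_lr down to 0.
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"Unknown schedule: {schedule}")


for name in ("Constant", "Cosine", "Linear"):
    print(name, [round(scheduled_lr(1e-3, s, 100, name), 6) for s in (0, 50, 100)])
```

Cosine stays close to the base learning rate for longer at the start and flattens out near the end, while Linear decreases by the same amount every step.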

Warmup epochs

Grid search hyperparameter

This setting determines the number of warmup epochs during which the learning rate gradually increases from 0 to the specified value. The learning rate increases linearly during the warmup period, allowing the model to adapt to the learning process gradually.

Note

You can set the value of this setting as a ratio of an epoch. For instance, setting it to 0.1 means warmup is performed for only 10% of the first full epoch.
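A fractional warmup can be sketched by converting epochs to training steps. The function below is a hypothetical helper that covers only the warmup phase (after warmup it returns the base learning rate and ignores any schedule); rounding the warmup length to whole steps is an assumption.

```python
def warmup_lr(base_lr, step, steps_per_epoch, warmup_epochs):
    """Linearly increase the learning rate from 0 to base_lr during warmup."""
    warmup_steps = round(warmup_epochs * steps_per_epoch)
    if warmup_steps == 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


# With 1000 steps per epoch, 0.1 warmup epochs correspond to the first 100 steps.
for step in (0, 50, 100, 500):
    print(step, warmup_lr(1e-3, step, steps_per_epoch=1000, warmup_epochs=0.1))
```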

Weight decay

Grid search hyperparameter

This setting defines the weight decay that H2O Hydrogen Torch uses for the optimizer during model training.

note

Weight decay is a regularization technique that adds an L2 norm of all model weights to the loss function while increasing the probability of improving the model generalization.

Gradient clip

Grid search hyperparameter

This setting defines the maximum norm of the gradients H2O Hydrogen Torch allows during model training. The default is 0 (no clipping). When a value greater than 0 is specified, H2O Hydrogen Torch uses that value as an upper limit for the norm of the gradients, calculated as the Euclidean norm over all gradients per batch, and modifies the gradients so that their norm does not exceed this limit.

note

This setting can help model convergence when extreme gradient values cause high volatility of weight updates.
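Clipping by norm is commonly done by rescaling all gradients with a single factor so that their combined Euclidean norm equals the limit. The following is a minimal sketch of that common technique on a flat list of gradient values; it is an assumption that H2O Hydrogen Torch clips in exactly this way, and the function name is hypothetical.

```python
import math


def clip_gradients(gradients, max_norm):
    """Rescale gradients so that their Euclidean norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    # A limit of 0 means no clipping, matching the default described above.
    if max_norm <= 0 or total_norm <= max_norm:
        return list(gradients)
    scale = max_norm / total_norm
    return [g * scale for g in gradients]


# The norm of [3, 4] is 5, so clipping to 1 scales every value by 1/5.
print(clip_gradients([3.0, 4.0], max_norm=1.0))
print(clip_gradients([3.0, 4.0], max_norm=0))  # [3.0, 4.0]
```

Because every value is scaled by the same factor, the direction of the update is preserved and only its size is limited.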

Grad accumulation

Grid search hyperparameter

This setting defines the number of gradient accumulations before H2O Hydrogen Torch updates the neural network weights during model training.

note

Grad accumulation can be beneficial if only small batches can be selected for training. With gradient accumulation, the loss and gradients are calculated after each batch, but H2O Hydrogen Torch waits for the selected number of accumulations before updating the model weights. You can control the batch size through the Batch size setting.
Changing the default value of Grad Accumulation might require adjusting the learning rate and batch size.
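The relationship between batch size, gradient accumulation, and weight updates can be sketched as follows. The effective batch size formula assumes that gradients are also combined across GPUs, which is typical for data-parallel training but is an assumption here; both function names are hypothetical.

```python
def effective_batch_size(batch_size, grad_accumulation, n_gpus=1):
    """Number of examples that contribute to a single weight update."""
    return batch_size * grad_accumulation * n_gpus


def count_weight_updates(n_batches, grad_accumulation):
    """Count weight updates when updating only every grad_accumulation batches."""
    updates = 0
    for batch_index in range(1, n_batches + 1):
        # Gradients for this batch would be computed and accumulated here.
        if batch_index % grad_accumulation == 0:
            updates += 1  # apply the accumulated gradients, then reset them
    return updates


print(effective_batch_size(batch_size=4, grad_accumulation=8))  # 32
print(count_weight_updates(n_batches=100, grad_accumulation=4))  # 25
```

Since fewer and larger updates are made, the learning rate that worked without accumulation is often no longer the best choice.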

Save best checkpoint

This setting determines if H2O Hydrogen Torch should save the model weights of the epoch exhibiting the best validation metric. When turned On, H2O Hydrogen Torch saves the model weights for the epoch exhibiting the best validation metric. When turned Off, H2O Hydrogen Torch saves the model weights after the last epoch is executed.

note

This setting should be turned On with care as it has the potential to lead to overfitting of the validation data.
The default goal should be to tune models so that the last epoch, or one of the last epochs, is the best epoch.
Suppose an evident decline for later epochs is observed in logging. In that case, it is usually better to adjust hyperparameters, such as reducing the number of epochs or increasing regularization, instead of turning this setting On.
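The effect of the setting on which weights are kept can be sketched with a list of per-epoch validation scores. The helper below is hypothetical and assumes one validation score per epoch, with lower scores being better unless stated otherwise.

```python
def checkpoint_epoch(validation_scores, save_best, higher_is_better=False):
    """Return the index of the epoch whose weights are saved."""
    if not save_best:
        # Off: keep the weights from the last epoch.
        return len(validation_scores) - 1
    pick = max if higher_is_better else min
    return pick(range(len(validation_scores)), key=validation_scores.__getitem__)


scores = [0.40, 0.31, 0.28, 0.30]  # for example, an error metric per epoch
print(checkpoint_epoch(scores, save_best=True))   # 2
print(checkpoint_epoch(scores, save_best=False))  # 3
```

In this example the last epoch is slightly worse than the best one, which is the situation where reducing the number of epochs is usually preferable to turning the setting On.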

Evaluation epochs

This setting defines the number of epochs H2O Hydrogen Torch uses before each validation loop for model training. In other words, it determines the frequency (in a number of epochs) to run the model evaluation on the validation data.

note

Increasing the number of Evaluation Epochs can speed up an experiment.

Setting dependency

The Evaluation epochs setting is available only if the following setting is turned Off: Save Best Checkpoint.

Evaluate before training

Determines whether to perform a validation run before training. This setting is potentially helpful for assessing the performance of zero-shot pretrained backbones and checking the modeling pipeline.

note

The following supported problem types support externally pretrained zero-shot models (while problem types that do not contain this support fit a new head on top of a backbone):

Text span prediction
Text sequence to sequence
Speech recognition

Calculate train metric

This setting determines whether the model metric should also be calculated for the training data at the end of the training. When On, the model metric is calculated for the training data. The resulting values do not indicate the true model performance because they are calculated on the same data records H2O Hydrogen Torch used for model training, but they can give insights into over- and underfitting.

Train validation data

This setting defines whether the model should use the entire train and validation dataset during model training. When turned On, H2O Hydrogen Torch uses the whole train dataset and validation data to train the model.

note

H2O Hydrogen Torch also evaluates the model on the provided validation fold. Validation is always only on the provided validation fold.
H2O Hydrogen Torch uses both datasets for model training if you provide a train and validation dataset.
- To define a training dataset, use the Train dataframe setting. For more information, see Train dataframe.
- To define a validation dataset, use the Validation dataframe setting. For more information, see Validation dataframe.
Turning On the Train validation data setting should produce a model that you can expect to perform better because H2O Hydrogen Torch trained the model on more data. Note, however, that using the entire train dataset and out-of-fold validation dataset generally causes the model's accuracy to be overstated, as information from the validation data is incorporated into the model during the training process.
note
If you have five folds and set fold 0 as validation, H2O Hydrogen Torch usually trains on folds 1-4 and reports accuracy on fold 0. With Train validation data turned On, H2O Hydrogen Torch adds fold 0 to the training data but still reports accuracy on fold 0. As a result, the reported accuracy is overstated for fold 0, but the model should perform better on any unseen (test) data or in production scenarios. For that reason, you usually want to consider this setting after running your experiments and deciding on models.

note

This setting is only available if you turned the Save best checkpoint setting Off.

Build scoring pipelines

Determines whether the experiment (model) automatically generates an H2O MLOps pipeline and Python scoring pipeline at the end of the experiment. If turned Off, you can still create scoring pipelines on demand when the experiment is complete (for example, when you click Download scoring or Download MLOps).

Prediction settings

Metric

This setting defines the metric to evaluate the model's performance.

Batch size inference

This setting defines the batch size of examples to utilize for inference.

note

Selecting 0 sets the Batch size inference to the same value used for the Batch size setting.
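As a minimal sketch of the fallback described in the note, assuming a hypothetical helper named `resolve_inference_batch_size` (not an H2O Hydrogen Torch function):

```python
def resolve_inference_batch_size(batch_size, batch_size_inference=0):
    """Return the batch size to use at inference time (hypothetical helper)."""
    if batch_size_inference == 0:
        # 0 means "reuse the value of the Batch size setting".
        return batch_size
    return batch_size_inference


print(resolve_inference_batch_size(batch_size=8))
print(resolve_inference_batch_size(batch_size=8, batch_size_inference=32))
```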

Normalize text

Determines whether to normalize the label and prediction transcripts when scoring the experiment. This setting does not change the text a model utilizes for training. Before converting the text to lowercase, H2O Hydrogen Torch removes leading and trailing whitespace.
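The two normalization steps named above (trimming whitespace, then lowercasing) can be sketched as follows. The function name `normalize_transcript` is hypothetical, and the actual normalization in H2O Hydrogen Torch may include steps not described here.

```python
def normalize_transcript(text: str) -> str:
    """Trim surrounding whitespace, then lowercase (illustrative sketch only)."""
    return text.strip().lower()


label = "  Hello World "
prediction = "hello world"

# Without normalization the two transcripts differ; with it they match.
print(label == prediction)
print(normalize_transcript(label) == normalize_transcript(prediction))
```

Because normalization applies only at scoring time, it affects the reported metric but not what the model learns.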

Duration in visualizations

Defines the maximum audio duration (in seconds) H2O Hydrogen Torch utilizes for audio rendered in the visualizations page.

Note

Setting the duration too high for long audio datasets may lead to the visualization page failing.

Suppress default tokens

Determines whether to suppress (not generate) certain tokens in the text generation process (speech to text).

note

For pretrained Whisper models, these default tokens typically include non-speech predictions (for example, "[RADIO]", "[Laughter]", etc.) that are an artifact of the noisy pre-training data.
Suppressing default tokens is not a simple deletion of text; it can alter the text generation process (speech to text). For example, a Whisper model that suppressed its default tokens (certain tokens such as non-words and punctuation, for example, "[RADIO]") produced altered predictions.
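To see why suppression changes generation rather than deleting text afterwards, consider a toy greedy decoding step in which suppressed tokens have their scores set to negative infinity before the next token is chosen. The vocabulary, the scores, and the function `greedy_pick` are made up for illustration and do not reflect H2O Hydrogen Torch internals.

```python
def greedy_pick(logits, suppressed_ids=()):
    """Pick the highest-scoring token id, ignoring suppressed ids (toy example)."""
    masked = [
        float("-inf") if token_id in suppressed_ids else score
        for token_id, score in enumerate(logits)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


vocab = ["hello", "[Laughter]", "world"]  # hypothetical vocabulary
logits = [1.0, 2.5, 0.5]                  # hypothetical scores for one step

print(vocab[greedy_pick(logits)])                      # [Laughter]
print(vocab[greedy_pick(logits, suppressed_ids={1})])  # hello
```

Because each generated token conditions the following steps, choosing a different token at one position can change the rest of the transcript.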

Chunk time

Specifies the audio length (seconds) H2O Hydrogen Torch accepts for the experiment. H2O Hydrogen Torch splits audios longer than the specified audio length into chunks where the chunk length is based on the defined audio length.

note

The text predictions are stitched back together in the final prediction.
Most models should be able to infer 60-second audio samples (that is, the default Chunk time) within 16GB VRAM.
Text predictions may vary as this setting varies because the underlying transformer architecture sees different amounts of context in a single chunk.
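The chunk-and-stitch idea can be sketched as below. This assumes fixed-size, non-overlapping chunks whose text predictions are joined with spaces; the real splitting and stitching logic in H2O Hydrogen Torch may differ. All names, including the stand-in `transcribe_chunk` callable, are hypothetical.

```python
def split_into_chunks(samples, sample_rate, chunk_time):
    """Split a sequence of audio samples into consecutive chunks of chunk_time seconds."""
    chunk_len = int(sample_rate * chunk_time)
    if chunk_len <= 0:
        raise ValueError("chunk_time must yield at least one sample per chunk")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_long_audio(samples, sample_rate, chunk_time, transcribe_chunk):
    """Transcribe each chunk separately and stitch the texts back together."""
    chunks = split_into_chunks(samples, sample_rate, chunk_time)
    return " ".join(transcribe_chunk(chunk) for chunk in chunks)


sample_rate = 100                     # artificially low rate to keep the demo small
audio = [0.0] * (150 * sample_rate)   # 150 seconds of silence
chunks = split_into_chunks(audio, sample_rate, chunk_time=60)
print([len(chunk) / sample_rate for chunk in chunks])  # [60.0, 60.0, 30.0]

# A stand-in "model" that only reports how many samples it received.
print(transcribe_long_audio(audio, sample_rate, 60, lambda c: f"<{len(c)} samples>"))
```

Each chunk is transcribed without knowledge of its neighbours, which is one reason predictions can change when the chunk length changes.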

Environment settings

GPUs

This setting determines the list of GPUs H2O Hydrogen Torch can use for the experiment. GPUs are listed by name, referring to their system ID (starting from 1). If no GPUs are selected, H2O Hydrogen Torch utilizes the CPU for model training.

Number of seeds per run

This setting defines the number of seeds to use for a single run. If more than one seed is selected, each experiment runs multiple times.

note

Deep learning models can exhibit some randomness across individual runs. Running an experiment multiple times with different seeds can give insight into the stability of the results.
When randomness is high, comparing the average results across seeds (for example, in a grid search scenario) lets you better judge the performance of a model with certain hyperparameter settings.
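A comparison across seeds might look like the following sketch. The metric values are invented purely for illustration (they are not results from any experiment), and `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores.

    stdev needs at least two values, so run at least two seeds.
    """
    return mean(scores), stdev(scores)


# Made-up per-seed metric values for two hypothetical hyperparameter settings.
per_seed_scores = {
    "learning_rate=0.001": [0.21, 0.25, 0.23],
    "learning_rate=0.0003": [0.22, 0.22, 0.23],
}

for setting, scores in per_seed_scores.items():
    avg, spread = summarize(scores)
    print(f"{setting}: mean={avg:.3f} stdev={spread:.3f}")
```

A setting with a slightly worse mean but a much smaller spread can be the more dependable choice.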

Mixed precision training

Determines whether to use mixed-precision during model training. When turned Off, H2O Hydrogen Torch does not use mixed-precision for training.

Note

Mixed-precision is a technique that decreases memory consumption and increases training speed.

Mixed precision inference

Determines whether to use mixed-precision during model inference.

note

Mixed-precision is a technique that decreases memory consumption and increases inference speed.

Number of workers

This setting defines the number of workers H2O Hydrogen Torch uses for the DataLoader. In other words, it defines the number of CPU processes to use when reading and loading data to GPUs during model training.

Seed

This setting defines the random seed value that H2O Hydrogen Torch uses during model training. The default, -1, stands for an arbitrary seed: leaving it unchanged leads to different random numbers at every invocation. Setting any other value fixes the random seed, which makes results reproducible and helps you obtain predictable, repeatable results on every run.
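The convention can be illustrated with Python's standard `random` module. This sketch shows only the general idea of a sentinel seed; `make_rng` is a hypothetical helper, and H2O Hydrogen Torch seeds its own libraries internally.

```python
import random


def make_rng(seed: int = -1) -> random.Random:
    """Return a random generator; -1 means 'do not fix the seed' (sketch)."""
    if seed == -1:
        # No fixed seed: the generator is initialised from system entropy.
        return random.Random()
    return random.Random(seed)


# The same fixed seed yields the same sequence on every invocation.
print(make_rng(42).random() == make_rng(42).random())  # True

# With -1, two invocations almost certainly yield different numbers.
print(make_rng().random(), make_rng().random())
```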

Logging settings

Log grad norm

This setting determines whether to log the total grad norm before and after clipping.

note

This setting adds a small overhead during the experiment runtime but can help determine if the gradients are exploding or unstable.

tip

Turn this setting on if you suspect unstable gradients; as a result, you may then choose a value for the gradient clip to prevent exploding gradients.
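What "total grad norm before and after clipping" means can be sketched in plain Python, assuming the common convention of an L2 norm over all gradient values and clipping by rescaling. The function names and gradient values are illustrative, not H2O Hydrogen Torch code.

```python
import math


def total_grad_norm(grads):
    """L2 norm over all gradient values (assumed convention)."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_total_norm(grads, max_norm):
    """Rescale gradients so their total norm does not exceed max_norm."""
    norm = total_grad_norm(grads)
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


grads = [3.0, 4.0]                      # made-up gradient values
before = total_grad_norm(grads)
clipped = clip_by_total_norm(grads, max_norm=1.0)
after = total_grad_norm(clipped)
print(f"before={before:.2f} after={after:.2f}")  # before=5.00 after=1.00
```

If the logged norm before clipping is frequently far above the typical level, a gradient clip value near that typical level is a reasonable starting point.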

Number of audios

This setting defines the number of audios to show in the experiment Insights tab.

AutoDL settings

Time budget

This setting specifies the number of experiments that H2O Hydrogen Torch will generate, each with different values for certain hyperparameters referred to as grid search hyperparameters.

Details

Options

1
- This option selects several values for certain grid search hyperparameters. Up to 10 child experiments are generated when you run (start) the parent experiment.
  - Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50"; Learning Rate options: 0.001, 0.0003; Epochs options: 5, 10. Image size is fixed at 224x224 pixels.
  - 3D image regression: Backbone options: "resnet18d", "tf_efficientnet_b0.ns_jft_in1k"; Learning Rate options: 0.001, 0.0003; Epochs options: 5, 10. Image size is fixed at 128x128x32 pixels.
  - Image classification:
  - 3D image classification:
  - Image object detection:
  - Image semantic segmentation:
  - 3D image semantic segmentation:
  - Image instance segmentation:
  - Image metric learning:
  - Text regression:
  - Text classification:
  - Text token classification:
  - Text span prediction:
  - Text sequence-to-sequence:
  - Text metric learning:
  - Image and text classification:
  - Audio regression:
  - Audio classification:
  - Speech recognition:
  - Multi-modal causal language modeling:
2
- This option selects several values for certain grid search hyperparameters. Up to 50 child experiments are generated when you run (start) the parent experiment.
  - Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium". Image size is fixed at 224x224 pixels.
  - 3D image regression: Backbone options: "resnet18d", "tf_efficientnet_b1.ns_jft_in1k"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium". Image size is fixed at 128x128x128 pixels.
  - Image classification:
  - 3D image classification:
  - Image object detection:
  - Image semantic segmentation:
  - 3D image semantic segmentation:
  - Image instance segmentation:
  - Image metric learning:
  - Text regression:
  - Text classification:
  - Text token classification:
  - Text span prediction:
  - Text sequence-to-sequence:
  - Text metric learning:
  - Image and text classification:
  - Audio regression:
  - Audio classification:
  - Speech recognition:
  - Multi-modal causal language modeling:
3
- This option selects several values for certain grid search hyperparameters. When you run (start) the parent experiment, up to 100 child experiments are generated.
  - Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50", "eca_nfnet_l0"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium"; Mix Augmentations options: "Disabled", "Mixup". Image size is fixed at 384x384 pixels.
  - 3D image regression: Backbone options: "resnet34d", "tf_efficientnet_b3.ns_jft_in1k"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 10, 20; Augmentation Strategy options: "Soft", "Medium"; Mix Augmentations options: "Disabled", "Mixup". Image size is fixed at 256x256x128 pixels.
  - Image classification:
  - 3D image classification:
  - Image object detection:
  - Image semantic segmentation:
  - 3D image semantic segmentation:
  - Image instance segmentation:
  - Image metric learning:
  - Text regression:
  - Text classification:
  - Text token classification:
  - Text span prediction:
  - Text sequence-to-sequence:
  - Text metric learning:
  - Image and text classification:
  - Audio regression:
  - Audio classification:
  - Speech recognition:
  - Multi-modal causal language modeling:
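To get a feel for how the option lists relate to the child-experiment limits, the sketch below expands the Image regression values listed for option 1 into every combination. Whether H2O Hydrogen Torch actually runs the full Cartesian product is an assumption made here for illustration; the documentation only states the upper bound ("up to 10"). The helper `expand_grid` is hypothetical.

```python
from itertools import product

# Image regression values listed for Time budget option 1.
grid = {
    "backbone": ["tf_efficientnetv2_b3", "resnet50"],
    "learning_rate": [0.001, 0.0003],
    "epochs": [5, 10],
}


def expand_grid(grid):
    """Return one settings dict per combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


children = expand_grid(grid)
print(len(children))  # 8
print(children[0])
```

The same arithmetic gives 2 x 3 x 2 x 2 = 24 combinations for the option 2 values and 3 x 3 x 2 x 2 x 2 = 72 for option 3, each within the stated limits of 50 and 100.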

note

This setting is only available if you select AutoDL as the experience level.

