Experiment settings: Graph node regression
The settings for a graph node regression experiment are listed and described below.
General settings
Dataset
This setting defines the dataset for the experiment.
Problem category
This setting defines the general problem category, for example, image.
- The selected problem category (for example, image) determines the options in the Problem type setting.
- The From experiment option enables you to reuse the settings of another experiment.
- The From experiment option is unavailable when you select AutoDL as the experience level.
 
Experiment
This setting defines the experiment H2O Hydrogen Torch references to initialize the experiment settings. H2O Hydrogen Torch initializes the experiment settings with the values from the selected (built) experiment.
This setting is available only if From experiment is selected in the Problem category setting.
Problem type
This setting defines the problem type of the experiment, which also defines the settings H2O Hydrogen Torch displays for the experiment.
- The selected problem category (in the Problem category setting) determines the available problem types.
- The selected problem type and experience level determine the settings H2O Hydrogen Torch displays for the experiment.
Import config from YAML
This setting defines the YML file that defines the experiment settings.
- H2O Hydrogen Torch supports a YML file import and export functionality. You can download the config settings of finished experiments, make changes, and re-upload them when starting a new experiment in any instance of H2O Hydrogen Torch.
- To learn how to download the YML file (configuration file) of a completed experiment, see Download an experiment's logs/config file.
 
Experiment name
This setting defines the name of the experiment.
Use previous experiment weights
This setting determines whether to initialize the model weights with the weights from the experiment specified in the Experiment setting.
A model's weights can only be reused from an experiment (model) of the same problem type and backbone.
This setting might be useful in case you want to continue training from a built experiment.
The Use previous experiment weights setting is available only if From experiment is selected in the Problem category setting.
Dataset settings
Train dataframe
This setting specifies the path to a file that contains a dataframe comprising training records utilized by H2O Hydrogen Torch for model training within the experiment. Here, the term 'file' denotes a specific file adhering to a dataset format tailored for the problem type addressed in the experiment. To learn more, see Dataset formats.
- The records are combined into mini-batches when training the model.
- If a validation dataframe is provided, a fold column is not needed in the train dataframe.
- To import datasets for inference only, when defining the settings for an experiment, set the Train dataframe setting to None while setting the Test dataframe setting to the relevant dataframe (as a result, H2O Hydrogen Torch utilizes the relevant dataset for predictions and not for training).
Validation strategy
This setting specifies the validation strategy H2O Hydrogen Torch uses for the experiment.
To properly assess the performance of your trained models, it is common practice to evaluate them on separate holdout data that the models have not seen during training.
Options
- K-fold cross-validation
- This option splits the data using the provided optional fold column in the train data or performs an automatic 5-fold cross-validation in the absence of a fold column.
 
- Grouped k-fold cross-validation
- This option allows you to specify a group column based on which the data is split into folds.
 
- Custom holdout validation
- This option specifies a separate holdout dataframe.
 
- Automatic holdout validation
- This option automatically generates a holdout validation sample of the size you specify.
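As a minimal sketch of the K-fold cross-validation option, the Python below uses a hypothetical helper, `assign_folds`, that reuses a provided fold column and otherwise shuffles the rows and deals them round-robin into five folds. H2O Hydrogen Torch's internal fold assignment is not documented here, so this only illustrates how one selected fold becomes the holdout sample while the remaining folds are used for training.

```python
import random


def assign_folds(n_rows, n_folds=5, fold_column=None, seed=42):
    """Return a fold index per row (hypothetical helper, not an H2O API)."""
    if fold_column is not None:
        # A provided fold column is used as-is.
        if len(fold_column) != n_rows:
            raise ValueError("fold column length must match the number of rows")
        return list(fold_column)
    # No fold column: shuffle row indices and deal them round-robin into folds.
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % n_folds
    return folds


def split(folds, validation_fold):
    """Return (train_rows, holdout_rows) for one selected validation fold."""
    train = [i for i, f in enumerate(folds) if f != validation_fold]
    holdout = [i for i, f in enumerate(folds) if f == validation_fold]
    return train, holdout


folds = assign_folds(n_rows=12)
train_rows, holdout_rows = split(folds, validation_fold=0)
print(len(train_rows), len(holdout_rows))  # 9 3
```

Training one model per selected fold and averaging the holdout scores is what gives cross-validation its estimate of performance on unseen data.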
 
Validation dataframe
This setting defines a file containing a dataframe with validation records that H2O Hydrogen Torch uses to evaluate the model during training.
- Setting a Validation dataframe requires setting the Validation strategy to Custom holdout validation. When a validation dataframe is provided, H2O Hydrogen Torch fully respects the choice of a separate validation dataframe and does not perform any internal cross-validation. In other words, the model is trained on the full provided train dataframe, and model performance is evaluated on the provided validation dataframe.
- The validation dataframe should have the same format as the train dataframe but does not require a fold column.
The Validation dataframe setting is only available when you select Custom holdout validation in the Validation strategy setting.
Use validation as test data
When this setting is enabled, H2O Hydrogen Torch uses the validation data as the test data. This allows you to use the validation folds as a test dataset and experiment with different inference settings for comparison purposes.
Group fold column
This setting defines an optional dataset column to run Group-K-fold on. In the case of Group-K-fold, unique elements of the group fold column are always within the same fold (the same group does not appear in two different folds).
- The Group fold column must contain at least five unique values; if it does, H2O Hydrogen Torch uses group 5-fold cross-validation.
- The Group fold column setting can be helpful if you want to emulate model performance on unseen data groups, such as new customer data. In this case, you do not want to train your model on samples of the same customers you are also evaluating.
The Group fold column setting is only available if you select Grouped k-fold cross-validation in the Validation strategy setting.
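The grouping constraint can be sketched as follows. The hypothetical `group_k_fold` helper sorts the unique group values and deals whole groups into folds, so no group is split across folds. This simple round-robin scheme is an assumption for illustration and does not balance fold sizes the way a production implementation might.

```python
def group_k_fold(groups, n_folds=5):
    """Assign folds so all rows of a group share one fold (hypothetical helper).

    Unique groups are sorted and dealt round-robin into folds, so at least
    n_folds unique groups are required.
    """
    unique_groups = sorted(set(groups))
    if len(unique_groups) < n_folds:
        raise ValueError(
            f"need at least {n_folds} unique groups, got {len(unique_groups)}"
        )
    group_to_fold = {g: i % n_folds for i, g in enumerate(unique_groups)}
    return [group_to_fold[g] for g in groups]


customers = ["a", "a", "b", "c", "c", "d", "e", "f", "f"]
print(group_k_fold(customers))  # [0, 0, 1, 2, 2, 3, 4, 0, 0]
```

Both rows of customer "c" land in fold 2, so a model validated on fold 2 has never seen that customer during training.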
Selected folds
This setting defines the selected validation fold(s) in case of cross-validation; a separate model is trained for each value selected. Each model utilizes the corresponding part of the data as a holdout sample to assess performance while the model is fitted to the rest of the records from the training dataframe. As a result, folds estimate how the model performs in general when used to make predictions on data not used during model training.
H2O Hydrogen Torch allows running experiments on a single selected fold for faster experimenting and multiple selected folds to gain more trust in the model's generalization and performance capabilities.
This setting is available only when the Validation strategy setting is not set to Custom holdout validation or Automatic holdout validation.
Test dataframe
This setting defines a file containing a dataframe with test records that H2O Hydrogen Torch uses to test the model.
- The test dataframe should have the same format as the train dataframe but does not require a label column.
- To import datasets for inference only, when defining the setting for an experiment, set the Train dataframe setting to None while setting the Test dataframe setting to the relevant dataframe (as a result, H2O Hydrogen Torch utilizes the relevant dataset for predictions and not for training).
Label columns
This setting defines the name(s) of the dataframe column(s) that refer to the target value(s) an H2O Hydrogen Torch experiment can aim to predict.
Node ID column
This setting defines the column name of the input node IDs in the graph.
Data sample
This setting defines the percentage of the data to use for the experiment. The default percentage is 100%.
Lowering the default value can significantly increase training speed, but it might substantially reduce accuracy. Using 100% of the data for final models is highly recommended.
Data sample choice
This setting specifies the data H2O Hydrogen Torch samples according to the percentage set in the Data sample setting. H2O Hydrogen Torch does not sample the unselected data.
The Data sample choice setting is only available if the value in the Data sample setting is less than 1.0 (that is, below 100%).
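The following sketch shows one way to read the Data sample and Data sample choice settings together. It assumes the choice is a set of dataframe names (here the hypothetical labels "Train" and "Validation") and that sampling draws rows without replacement; it is not H2O Hydrogen Torch's code, and `sample_dataframes` is a hypothetical helper.

```python
import random


def sample_dataframes(dataframes, fraction, choice, seed=0):
    """Down-sample only the dataframes named in `choice` (hypothetical helper).

    `dataframes` maps a name such as "Train" to a list of records; `fraction`
    corresponds to the Data sample setting expressed as a value in (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    sampled = {}
    for name, records in dataframes.items():
        if name in choice and fraction < 1.0:
            k = max(1, round(len(records) * fraction))
            sampled[name] = rng.sample(records, k)  # without replacement
        else:
            sampled[name] = list(records)  # unselected data is left untouched
    return sampled


data = {"Train": list(range(100)), "Validation": list(range(20))}
subset = sample_dataframes(data, fraction=0.1, choice={"Train"})
print(len(subset["Train"]), len(subset["Validation"]))  # 10 20
```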
Architecture settings
Hidden dims
Grid search hyperparameter
This setting defines the number of hidden dimensions, that is, the width of the neural network.
In the context of a graph node regression model, the hidden dimension setting refers to the size (number of units) of the hidden layers in the neural network used to predict the target variable. The number of hidden layers (the depth) is set separately in the Num hidden layers setting.
Num hops
Grid search hyperparameter
This setting defines the number of hops the experiment considers when updating the node features.
The number of hops setting in a graph node regression model refers to the maximum number of connections or steps that can be taken from one node to another during the prediction process. In a graph, nodes represent entities or data points, and edges represent the relationships or connections between these entities.
When making predictions using a graph node regression model, the algorithm typically considers the neighboring nodes of a target node to gather information and make an accurate prediction. The number of hops setting determines how many steps away from the target node the algorithm will explore to gather this information (that is, to update the node's features later).
Start with a small number of hops because the increase in complexity is quadratic.
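To make the notion of hops concrete, the following sketch collects every node reachable from a target node within `num_hops` edges using breadth-first search over an adjacency list. It only illustrates which neighbors fall inside the model's receptive field; it is not how H2O Hydrogen Torch aggregates node features, and the toy graph is made up.

```python
from collections import deque


def k_hop_neighborhood(adjacency, target, num_hops):
    """Return the nodes within `num_hops` edges of `target` (excluding target).

    `adjacency` maps each node ID to a list of neighbor node IDs.
    """
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == num_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {n for n in distance if n != target}


graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
for hops in (1, 2, 3):
    print(hops, sorted(k_hop_neighborhood(graph, 0, hops)))
# 1 [1, 2]
# 2 [1, 2, 3, 4]
# 3 [1, 2, 3, 4, 5]
```

On dense graphs each extra hop can pull in many more nodes than this chain-like example does, which is why a small starting value keeps experiments cheap.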
Num hidden layers
Grid search hyperparameter
This setting defines the number of hidden layers, that is, the depth of the neural network.
- Increasing the number of hidden layers increases the representation capability of the neural network due to the extra nonlinearity. However, when the training data is small, a model with more hidden layers is more prone to overfitting.
- When the training set is sufficiently large, gradually increase the number of hidden layers with each new experiment and observe the validation results to make the best use of the extensive training data.
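The difference between width (Hidden dims) and depth (Num hidden layers) can be seen by counting the parameters of a plain fully connected stack. This is a simplified stand-in that assumes every hidden layer has the same width; it is not the graph architecture H2O Hydrogen Torch actually builds, and `layer_shapes` and `count_parameters` are hypothetical helpers.

```python
def layer_shapes(in_dim, hidden_dims, num_hidden_layers, out_dim):
    """(fan_in, fan_out) of each linear layer in a plain fully connected stack.

    hidden_dims sets the width of every hidden layer; num_hidden_layers sets the depth.
    """
    sizes = [in_dim] + [hidden_dims] * num_hidden_layers + [out_dim]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    # Each linear layer has a fan_in x fan_out weight matrix plus fan_out biases.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


for width, depth in [(64, 2), (128, 2), (64, 3)]:
    shapes = layer_shapes(in_dim=16, hidden_dims=width, num_hidden_layers=depth, out_dim=1)
    print(width, depth, count_parameters(shapes))
# 64 2 5313
# 128 2 18817
# 64 3 9473
```

In this toy setup, doubling the width multiplies the parameter count by about 3.5 because the hidden-to-hidden matrix grows with the square of the width, while each extra hidden layer at width 64 adds a fixed 4,160 parameters.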
Dropout
Grid search hyperparameter
This setting defines the dropout rate that H2O Hydrogen Torch applies during model training between the backbone and the neck of the model, before the final fully connected layer. The dropout rate helps the model generalize better by randomly dropping a share of the neural network connections.
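A common way to implement dropout is inverted dropout, sketched below in plain Python: during training each activation is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate), so no rescaling is needed at inference. This is a generic illustration of the technique, not H2O Hydrogen Torch's code; the `dropout` function is hypothetical.

```python
import random


def dropout(values, rate, training=True, seed=None):
    """Inverted dropout over a list of activations (hypothetical helper).

    During training, each value is zeroed with probability `rate` and survivors are
    scaled by 1 / (1 - rate) so the expected value is unchanged. At inference the
    input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [v * scale if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 10_000
dropped = dropout(activations, rate=0.3, seed=0)
print(sum(v == 0.0 for v in dropped) / len(dropped))  # close to 0.3
print(sum(dropped) / len(dropped))                    # close to 1.0
```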
Training settings
Loss function
Grid search hyperparameter
This setting defines the loss function H2O Hydrogen Torch utilizes during model training. The loss function is a differentiable function measuring the prediction error. The model utilizes gradients of the loss function to update the model weights during training.
Options
Image regression | 3D image regression | Text regression | Audio regression
- MAE
- H2O Hydrogen Torch utilizes the mean absolute error (L1 norm) as the loss function.
 
- MSE
- H2O Hydrogen Torch utilizes the mean squared error (squared L2 norm) as the loss function.
 
- RMSE
- H2O Hydrogen Torch utilizes the root mean squared error (the square root of the MSE) as the loss function.
 
Image classification | 3D image classification | Text classification | Audio classification
- BCE
- H2O Hydrogen Torch uses binary cross entropy loss.
 
- Classification
- This default classification loss automatically chooses between BCE (multi-label) and CrossEntropy (multi-class) for classification.
 
- CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.
 
- SigmoidFocal
- H2O Hydrogen Torch uses the sigmoid Focal loss (gamma=2.0) for classification introduced in the following paper: Focal Loss for Dense Object Detection
 
- SoftmaxFocal
- H2O Hydrogen Torch uses the softmax Focal loss (gamma=2.0) for classification introduced in the following paper: Focal Loss for Dense Object Detection
 
Image semantic segmentation | 3D image semantic segmentation | Image instance segmentation
- BCE
- H2O Hydrogen Torch uses binary cross entropy loss.
 
- BCEDice
- H2O Hydrogen Torch uses binary cross entropy loss and Dice loss with weights 2 and 1, respectively.
 
- BCELovasz
- H2O Hydrogen Torch uses binary cross entropy loss and Lovasz loss with equal weights.
 
- Dice
- H2O Hydrogen Torch uses Dice loss.
 
- Focal
- H2O Hydrogen Torch uses the Focal loss for semantic segmentation introduced in the following paper: Focal Loss for Dense Object Detection
 
- FocalDice
- H2O Hydrogen Torch uses Focal loss and Dice loss with weights 2 and 1, respectively.
 
- Jaccard
- H2O Hydrogen Torch uses Jaccard loss.
 
Image metric learning | Text metric learning
- ArcFace
- H2O Hydrogen Torch utilizes the ArcFace loss introduced in the following paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition
 
- CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.
 
Text token classification | Text span prediction | Text sequence to sequence
- CrossEntropy
- H2O Hydrogen Torch utilizes multi-class cross entropy loss as a loss function.
 
Speech recognition
- CTC Loss
- H2O Hydrogen Torch utilizes Connectionist Temporal Classification loss as a loss function.
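For the regression loss options (MAE, MSE, and RMSE), a minimal sketch of how each value is computed from a list of predictions and targets is shown below. During training, a framework would compute these on tensors and differentiate them; the plain-Python functions here are illustrative only.

```python
import math


def mae(predictions, targets):
    """Mean absolute error (L1)."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def mse(predictions, targets):
    """Mean squared error (squared L2)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def rmse(predictions, targets):
    """Root mean squared error: the square root of MSE, in the target's units."""
    return math.sqrt(mse(predictions, targets))


predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
print(mae(predictions, targets), mse(predictions, targets), rmse(predictions, targets))
```

Because the errors are squared, MSE and RMSE penalize large errors more heavily than MAE, which makes MAE less sensitive to outliers in the labels.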
 
Optimizer
Grid search hyperparameter
This setting defines the algorithm (optimizer) to use for model training. The selected optimizer defines how the model updates the neural network's weights during training, including how the learning rate is applied to each update, with the goal of reducing the training loss.
Options
- Adadelta
- To learn about Adadelta, see ADADELTA: An Adaptive Learning Rate Method.
 
- Adam
- To learn about Adam, see Adam: A Method for Stochastic Optimization.
 
- AdamW
- To learn about AdamW, see Decoupled Weight Decay Regularization.
 
- RMSprop
- To learn about RMSprop, see Neural Networks for Machine Learning.
 
- SGD
- H2O Hydrogen Torch uses a stochastic gradient descent optimizer.
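The update rules behind two of these options can be compared on a toy one-parameter problem, minimizing (w - 3)^2. The sketch below implements plain SGD and Adam with bias correction as described in the Adam paper, using its default beta and epsilon values and a larger learning rate suited to the toy problem. In practice these optimizers come from a deep learning framework such as PyTorch's `torch.optim`; the helper names here are hypothetical.

```python
import math


def sgd_step(w, grad, lr):
    """Plain stochastic gradient descent update."""
    return w - lr * grad


class Adam:
    """Single-parameter Adam with bias-corrected moment estimates."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def grad_fn(w):
    # Gradient of the toy loss (w - 3) ** 2.
    return 2.0 * (w - 3.0)


w_sgd, w_adam, adam = 0.0, 0.0, Adam(lr=0.1)
for _ in range(500):
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd), lr=0.1)
    w_adam = adam.step(w_adam, grad_fn(w_adam))
print(round(w_sgd, 4), round(w_adam, 4))
```

Adam divides each step by a running estimate of the gradient magnitude, so its step size is far less sensitive to the raw scale of the gradients than SGD's.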
 
Learning rate
Grid search hyperparameter
This setting defines the learning rate H2O Hydrogen Torch uses when training the model, specifically when updating the neural network's weights. The learning rate is the speed at which the model updates its weights after processing each mini-batch of data.
- The learning rate is an important setting to tune as it balances under- and overfitting.
- The number of epochs highly impacts the optimal value of the learning rate.
Neighbor sampling quantile
Grid search hyperparameter
This setting determines the upper bound of the number of neighbors selected for updating the node features. The value range is from 0 to 1. Internally, H2O Hydrogen Torch computes the upper bound from this quantile number.
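The documentation states only that the upper bound is computed from the quantile. One plausible reading, used here purely as an assumption, is that the bound is that quantile of the node-degree distribution, after which each node keeps at most that many randomly sampled neighbors. `degree_quantile_cap` and `sample_neighbors` are hypothetical helpers, not H2O Hydrogen Torch functions.

```python
import math
import random


def degree_quantile_cap(adjacency, quantile):
    """Hypothetical rule: the neighbor cap is the given quantile of node degrees.

    Uses the nearest-rank quantile and never returns less than 1.
    """
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    degrees = sorted(len(neighbors) for neighbors in adjacency.values())
    rank = max(1, math.ceil(quantile * len(degrees)))
    return max(1, degrees[rank - 1])


def sample_neighbors(adjacency, node, cap, rng):
    """Keep all neighbors if there are at most `cap`, else sample `cap` without replacement."""
    neighbors = adjacency[node]
    if len(neighbors) <= cap:
        return list(neighbors)
    return rng.sample(neighbors, cap)


# Toy star graph: node 0 is a hub with 10 neighbors; every other node has 1.
graph = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
cap = degree_quantile_cap(graph, quantile=0.5)
print(cap, len(sample_neighbors(graph, 0, cap, random.Random(0))))  # 1 1
print(degree_quantile_cap(graph, quantile=1.0))  # 10
```

Under this assumption, a low quantile caps hub nodes aggressively to save memory and time, while a quantile of 1.0 keeps every neighbor of even the highest-degree node.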
Batch size
Grid search hyperparameter
This setting defines the number of training examples in a mini-batch, which H2O Hydrogen Torch uses during one training iteration to estimate the error gradient before updating the model weights. In other words, this setting defines the batch size used per GPU.
During model training, the training data is packed into mini-batches of a fixed size.
Drop last batch
This setting drops the last incomplete batch during model training when turned On.
H2O Hydrogen Torch groups the train data into mini-batches of equal size during the training process, but the last batch can have fewer records than the others. Not dropping this smaller last batch can lead to a less robust gradient estimate and a more volatile training step.
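The interaction between Batch size and Drop last batch is easy to see with a small sketch. The hypothetical `make_batches` helper slices records into consecutive fixed-size mini-batches and, when `drop_last` is true, discards a final batch that is smaller than the rest.

```python
def make_batches(records, batch_size, drop_last=False):
    """Split records into consecutive mini-batches of `batch_size` (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the smaller final batch
    return batches


records = list(range(10))
print(make_batches(records, 4))                  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(make_batches(records, 4, drop_last=True))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 10 records and a batch size of 4, the last batch holds only 2 records, so its gradient estimate is noisier than the others.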
Epochs
Grid search hyperparameter
This setting defines the number of epochs to train the model. In other words, it specifies the number of times the learning algorithm goes through the entire training dataset.
- The Epochs setting is an important setting to tune because it balances under- and overfitting.
- The learning rate highly impacts the optimal value of the epochs.
- For the following supported problem types, H2O Hydrogen Torch enables you to deploy a pretrained model trained for zero epochs (H2O Hydrogen Torch does not train the model, and the pretrained model (experiment) can be deployed as-is):
- Speech recognition
- Text sequence to sequence
- Text span prediction
 
Schedule
Grid search hyperparameter
This setting defines the learning rate schedule H2O Hydrogen Torch utilizes during model training. Specifying a learning rate schedule prevents the learning rate from staying the same. Instead, a learning rate schedule causes the learning rate to change over iterations, typically decreasing the learning rate to achieve a better model performance and training convergence.
Options
- Constant
- H2O Hydrogen Torch applies a constant learning rate during the training process.
 
- Cosine
- H2O Hydrogen Torch applies a cosine learning rate schedule that decreases the learning rate following the shape of the cosine function.
 
- Linear
- H2O Hydrogen Torch applies a linear learning rate schedule that decreases the learning rate linearly.
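The three schedule shapes can be written as functions of training progress. This sketch assumes the learning rate decays from its base value toward 0 over the full run with no warmup phase; H2O Hydrogen Torch's exact formulas (for example, any warmup or minimum learning rate) are not stated here, and `learning_rate` is a hypothetical helper.

```python
import math


def learning_rate(step, total_steps, base_lr, schedule="Cosine"):
    """Learning rate at a 0-based `step` out of `total_steps` (hypothetical helper).

    Assumes decay from base_lr toward 0 over the whole run, with no warmup.
    """
    progress = min(step / max(1, total_steps), 1.0)
    if schedule == "Constant":
        return base_lr
    if schedule == "Linear":
        return base_lr * (1.0 - progress)
    if schedule == "Cosine":
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule!r}")


for name in ("Constant", "Linear", "Cosine"):
    print(name, [round(learning_rate(s, 100, 1e-3, name), 6) for s in (0, 25, 50, 75, 100)])
```

Compared with the linear schedule, the cosine schedule stays near the base rate longer at the start and flattens out near zero at the end.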
 
Weight decay
Grid search hyperparameter
This setting defines the weight decay that H2O Hydrogen Torch uses for the optimizer during model training.
Weight decay is a regularization technique that adds an L2 penalty on all model weights to the loss function, which can improve model generalization.
Gradient clip
Grid search hyperparameter
This setting defines the maximum norm of the gradients during model training. It defaults to 0, which disables clipping. When a value greater than 0 is specified, H2O Hydrogen Torch uses it as an upper limit for the norm of the gradients, calculated as the Euclidean norm over all gradients per batch, and modifies the gradients whenever that limit is exceeded.
This setting can help model convergence when extreme gradient values cause high volatility of weight updates.
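Norm-based clipping can be sketched over a flat list of gradient values: compute the Euclidean norm of all gradients together and, if it exceeds the limit, scale every gradient by the same factor. PyTorch offers this as `torch.nn.utils.clip_grad_norm_`; the plain-Python `clip_grad_norm` below is a hypothetical stand-in that treats a limit of 0 as no clipping, mirroring the default.

```python
import math


def clip_grad_norm(gradients, max_norm):
    """Rescale gradients so their combined Euclidean norm is at most max_norm.

    Returns (clipped_gradients, norm_before_clipping). A max_norm of 0 or less
    disables clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if max_norm <= 0 or total_norm <= max_norm:
        return list(gradients), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in gradients], total_norm


clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
```

Scaling all gradients by one shared factor keeps the update direction unchanged and only shortens its length. Framework implementations may differ in details such as small numerical safeguards.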
Grad accumulation
Grid search hyperparameter
This setting defines the number of gradient accumulations before H2O Hydrogen Torch updates the neural network weights during model training.
- Grad accumulation can be beneficial if only small batches are selected for training. With gradient accumulation, the loss and gradients are calculated after each batch, but the model weights are updated only after the selected number of accumulations. You can control the batch size through the Batch size setting.
- Changing the default value of Grad Accumulation might require adjusting the learning rate and batch size.
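The sketch below shows why accumulation approximates a larger batch. It fits a single weight on a toy linear-regression loss and averages the gradients of `accumulation_steps` mini-batches before each update; with equal-sized mini-batches, two accumulated batches of 2 give the same update as one batch of 4. The helpers are hypothetical, and the handling of leftover batches at the end of a pass is a design assumption.

```python
def batch_gradient(w, batch):
    """Gradient of mean((w * x - y) ** 2) over one mini-batch, with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_one_pass(data, w, lr, batch_size, accumulation_steps):
    """Run one pass over `data`, updating w only every `accumulation_steps` batches."""
    accumulated, count = 0.0, 0
    for start in range(0, len(data), batch_size):
        accumulated += batch_gradient(w, data[start:start + batch_size])
        count += 1
        if count == accumulation_steps:
            w -= lr * accumulated / count  # average of the accumulated gradients
            accumulated, count = 0.0, 0
    if count:
        # Assumption: leftover batches at the end of the pass still trigger an update.
        w -= lr * accumulated / count
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accumulated = train_one_pass(data, w=0.0, lr=0.01, batch_size=2, accumulation_steps=2)
w_single = train_one_pass(data, w=0.0, lr=0.01, batch_size=4, accumulation_steps=1)
print(w_accumulated, w_single)
```

Because the effective batch grows with the number of accumulations, the learning rate that worked without accumulation may need retuning.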
Save best checkpoint
This setting determines if H2O Hydrogen Torch should save the model weights of the epoch exhibiting the best validation metric. When turned On, H2O Hydrogen Torch saves the model weights for the epoch exhibiting the best validation metric. When turned Off, H2O Hydrogen Torch saves the model weights after the last epoch is executed.
- This setting should be turned On with care as it has the potential to lead to overfitting of the validation data.
- Ideally, tune models so that the last epoch is the best epoch, or very close to it.
- If the logs show an evident decline in later epochs, it is usually better to adjust hyperparameters, such as reducing the number of epochs or increasing regularization, instead of turning this setting On.
Evaluation epochs
This setting defines how many training epochs H2O Hydrogen Torch runs between validation loops. In other words, it determines how often (in number of epochs) the model is evaluated on the validation data.
- Increasing the number of Evaluation Epochs can speed up an experiment.
The Evaluation epochs setting is available only if the following setting is turned Off: Save Best Checkpoint.
Calculate train metric
This setting determines whether the model metric should also be calculated for the training data at the end of training. When turned On, the model metric is calculated for the training data. The resulting values do not indicate the true model performance because they are computed on the same records H2O Hydrogen Torch used for model training, but they can give insights into over- or underfitting.
Train validation data
This setting defines whether the model should use the entire train and validation dataset during model training. When turned On, H2O Hydrogen Torch uses the whole train dataset and validation data to train the model.
- H2O Hydrogen Torch also evaluates the model on the provided validation fold. Validation is always only on the provided validation fold.
- H2O Hydrogen Torch uses both datasets for model training if you provide a train and validation dataset.
- To define a training dataset, use the Train dataframe setting. For more information, see Train dataframe.
- To define a validation dataset, use the Validation dataframe setting. For more information, see Validation dataframe.
 
- Turning On the Train validation data setting should produce a model that you can expect to perform better because H2O Hydrogen Torch trains the model on more data. Note, however, that using the entire train dataset and out-of-fold validation dataset generally causes the model's accuracy to be overstated, as information from the validation data is incorporated into the model during training.
Note: If you have five folds and set fold 0 as validation, H2O Hydrogen Torch usually trains on folds 1-4 and reports on fold 0. With Train validation data turned On, fold 0 is added to the training data, but H2O Hydrogen Torch still reports its accuracy on fold 0. As a result, the reported accuracy for fold 0 is overstated, but the model should perform better on any unseen (test) data and in production scenarios. For that reason, you usually want to consider this setting only after running your experiments and deciding on models.
This setting is only available if you turned the Save best checkpoint setting Off.
Prediction settings
Metric
This setting defines the metric to evaluate the model's performance.
Neighbor sampling quantile inference
Grid search hyperparameter
This setting determines the upper bound of the number of neighbors selected for updating the node features during inference.
- Try several values for this parameter because it directly affects inference performance.
- The neighbor sampling quantile for inference can differ from the one used for training. As a starting point, consider setting it to the same value as the Neighbor sampling quantile used during training. A larger value generally improves prediction performance at the cost of more memory usage and slower inference, while a smaller value may severely degrade performance.
Batch size inference
This setting defines the batch size of examples to utilize for inference.
Selecting 0 sets the Batch size inference to the same value used for the Batch size setting.
Environment settings
GPUs
This setting determines the list of GPUs H2O Hydrogen Torch can use for the experiment. GPUs are listed by name, referring to their system ID (starting from 1). If no GPUs are selected, H2O Hydrogen Torch utilizes the CPU for model training.
Number of seeds per run
This setting defines the number of seeds to use for a single run. If more than one seed is selected, each experiment runs multiple times.
- Deep learning models can exhibit some randomness across individual runs. Running an experiment multiple times with multiple seeds can give insights into the stability of results.
- In case of high randomness, comparing the average results across seeds, for example in a grid search scenario, gives a better judgment of a model's performance with certain hyperparameter settings.
Number of GPUs per run
This setting defines the number of GPUs to use for a single run when training the model. A single run might represent a single fold, a single seed run or a single grid search run.
If 5 GPUs are available, it is possible to run a 5-fold cross-validation in parallel using a single GPU per fold.
- The available GPUs are the ones that can be enabled using the GPUs setting.
- If the number of GPUs is less than or equal to 1, this setting (Number of GPUs per run) is not available.
Number of workers
This setting defines the number of workers H2O Hydrogen Torch uses for the DataLoader. In other words, it defines the number of CPU processes to use when reading and loading data to GPUs during model training.
Seed
This setting defines the random seed value that H2O Hydrogen Torch uses during model training. It defaults to -1, which does not fix the seed. When set to any other value, the random seed makes results reproducible: defining a seed helps obtain predictable and repeatable results every time. Keeping the default seed value (-1) leads to different random numbers at every invocation.
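The seed semantics can be illustrated with Python's `random` module. The hypothetical `resolve_seed` helper treats -1 as a request for a fresh random seed and uses any other value as-is, so runs with the same fixed seed produce identical random draws. A real training run would typically also seed every other source of randomness it uses, such as NumPy and PyTorch (for example, with `torch.manual_seed`).

```python
import random


def resolve_seed(seed=-1):
    """Hypothetical helper: -1 picks a fresh random seed; other values are used as-is."""
    if seed == -1:
        return random.SystemRandom().randrange(2**32)
    return seed


def simulated_run(seed):
    # Stand-in for a training run: draw a few random numbers from a seeded generator.
    rng = random.Random(resolve_seed(seed))
    return [rng.random() for _ in range(3)]


print(simulated_run(42) == simulated_run(42))  # True: a fixed seed is reproducible
```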
Logging settings
Logger
This setting defines the logger type that H2O Hydrogen Torch uses for model training.
Options
- None
- This option does not use any logger.
 
- Neptune
- This option utilizes Neptune as a logger to track the experiment. To use Neptune, you must define the following settings: Neptune API token and Neptune project.
 
Neptune API token
This setting defines the Neptune API token to validate all subsequent Neptune API calls.
This setting is available if you select Neptune in the Logger setting.
Neptune project
This setting defines the Neptune project.
This setting is available if you select Neptune in the Logger setting.
Log grad norm
This setting determines whether to log the total grad norm before and after clipping.
This setting adds a small overhead during the experiment runtime but can help determine if the gradients are exploding or unstable.
Turn this setting On if you suspect unstable gradients; the logged norms can then help you choose a Gradient clip value that prevents exploding gradients.
AutoDL settings
Time budget
This setting specifies the number of experiments that H2O Hydrogen Torch will generate, each with different values for certain hyperparameters referred to as grid search hyperparameters.
Options
- 1
- This option selects several values for certain grid search hyperparameters. Up to 10 child experiments are generated when you run (start) the parent experiment.
- Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50"; Learning Rate options: 0.001, 0.0003; Epochs options: 5, 10. Image size is fixed at 224x224 pixels.
- 3D image regression: Backbone options: "resnet18d", "tf_efficientnet_b0_ns"; Learning Rate options: 0.001, 0.0003; Epochs options: 5, 10. Image size is fixed at 128x128x32 pixels.
- 2
- This option selects several values for certain grid search hyperparameters. Up to 50 child experiments are generated when you run (start) the parent experiment.
- Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium". Image size is fixed at 224x224 pixels.
- 3D image regression: Backbone options: "resnet18d", "tf_efficientnet_b1_ns"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium". Image size is fixed at 128x128x128 pixels.
- 3
- This option selects several values for certain grid search hyperparameters. When you run (start) the parent experiment, up to 100 child experiments are generated.
- Image regression: Backbone options: "tf_efficientnetv2_b3", "resnet50", "eca_nfnet_l0"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 5, 10; Augmentation Strategy options: "Soft", "Medium"; Mix Augmentations options: "Disabled", "Mixup". Image size is fixed at 384x384 pixels.
- 3D image regression: Backbone options: "resnet34d", "tf_efficientnet_b3_ns"; Learning Rate options: 0.001, 0.0003, 0.0001; Epochs options: 10, 20; Augmentation Strategy options: "Soft", "Medium"; Mix Augmentations options: "Disabled", "Mixup". Image size is fixed at 256x256x128 pixels.
This setting is only available if you select AutoDL as the experience level.
- Send feedback about H2O Hydrogen Torch to cloud-feedback@h2o.ai