Matrix jobs
Matrix jobs enable parallel execution of a job with different parameter combinations. Define variables with multiple values, and the system creates a separate job instance for every combination in the Cartesian product of those values.
Defining matrices
Basic matrix
A matrix with a single variable creates multiple parallel job instances:
jobs:
  train-model:
    name: Train with Different Algorithms
    matrix:
      algorithm: [xgboost, lightgbm, random_forest, neural_net]
    runner: gpu-large
    timeout: "2h"
    steps:
      - name: Train model
        env:
          ALGORITHM: ${{ .matrix.algorithm }}
        run: |
          python train.py --algorithm $ALGORITHM --data ./data/train.csv
This creates 4 job instances, one for each algorithm.
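Following the bracketed naming pattern shown in the multi-variable example below, these instances would likely be identified as:
- train-model[algorithm:xgboost]
- train-model[algorithm:lightgbm]
- train-model[algorithm:random_forest]
- train-model[algorithm:neural_net]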
Multi-variable matrix
Multiple variables create a Cartesian product of all combinations:
jobs:
  hyperparameter-search:
    name: Hyperparameter Tuning
    matrix:
      learning_rate: ["0.001", "0.01", "0.1"]
      batch_size: ["32", "64", "128"]
    runner: gpu-large
    timeout: "3h"
    steps:
      - name: Train with hyperparameters
        env:
          LR: ${{ .matrix.learning_rate }}
          BATCH: ${{ .matrix.batch_size }}
        run: |
          echo "Training with learning_rate=$LR and batch_size=$BATCH"
          python train.py --lr $LR --batch-size $BATCH --epochs 100
      - name: Upload model
        upload:
          path: models/model.pkl
          destination: drive://models/lr-${{ .matrix.learning_rate }}-batch-${{ .matrix.batch_size }}/
This creates 9 job instances (3 learning rates × 3 batch sizes):
- hyperparameter-search[batch_size:32,learning_rate:0.001]
- hyperparameter-search[batch_size:32,learning_rate:0.01]
- hyperparameter-search[batch_size:32,learning_rate:0.1]
- hyperparameter-search[batch_size:64,learning_rate:0.001]
- And so on...
Matrix with workflow calls
Matrices work with reusable workflows, passing matrix variables as inputs:
jobs:
  evaluate-models:
    name: Evaluate on Multiple Datasets
    matrix:
      model: [xgboost, lightgbm, random_forest]
      dataset: [train, validation, test]
    workflow:
      name: model-evaluation
      inputs:
        model_type: ${{ .matrix.model }}
        dataset_name: ${{ .matrix.dataset }}
        metrics: "accuracy,f1,auc"
This creates 9 job instances (3 models × 3 datasets), each calling the model-evaluation workflow with different parameters.
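For reference, here is a minimal sketch of what the called model-evaluation workflow might declare, assuming it follows the same top-level id/name/inputs/jobs layout as the complete example later on this page. The evaluate.py script, the runner choice, and the input descriptions are illustrative and not defined anywhere on this page:
id: model-evaluation
name: Model Evaluation
inputs:
  model_type:
    type: string
    required: true
    description: Model family to evaluate
  dataset_name:
    type: string
    required: true
    description: Dataset split to evaluate on
  metrics:
    type: string
    required: true
    description: Comma-separated list of metrics to compute
jobs:
  evaluate:
    name: Evaluate Model
    runner: gpu-large
    steps:
      - name: Run evaluation
        env:
          MODEL: ${{ .inputs.model_type }}
          DATASET: ${{ .inputs.dataset_name }}
          METRICS: ${{ .inputs.metrics }}
        run: python evaluate.py --model $MODEL --dataset $DATASET --metrics $METRICS
Each matrix instance calls the workflow independently, so every run receives a single model/dataset pair as its inputs.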
Matrix expressions
Accessing matrix variables
Access matrix variables using the ${{ .matrix.variable_name }} expression syntax, consistent with other workflow expressions.
Format: ${{ .matrix.<variable_name> }}
Example:
jobs:
  process:
    matrix:
      region: [us-east, eu-west, ap-south]
      data_type: [transactions, events]
    steps:
      - name: Download data
        download:
          source: drive://data/${{ .matrix.region }}/${{ .matrix.data_type }}/
          path: ./data/
      - name: Process data
        env:
          REGION: ${{ .matrix.region }}
          TYPE: ${{ .matrix.data_type }}
        run: python process.py --region $REGION --type $TYPE
Job expansion and execution
Parallel execution
All matrix job instances run in parallel. There is no automatic sequencing or max-parallel limit; every combination starts as soon as the job's dependencies are satisfied.
Dependency handling
When a job depends on a matrix job, it waits for all matrix instances to complete successfully.
Example:
jobs:
  train:
    matrix:
      model: [xgboost, lightgbm, random_forest]
    steps:
      - run: python train.py --model ${{ .matrix.model }}
  evaluate:
    depends_on: [train]  # Waits for all 3 training instances
    steps:
      - run: python evaluate_all.py --models-dir ./models/
Complete example
id: hyperparameter-optimization
name: Train Models with Different Hyperparameters
inputs:
  dataset_bucket:
    type: string
    required: true
    description: Training dataset bucket UUID
env:
  SCRIPTS_REPO: "https://github.com/h2oai/ml-training"
jobs:
  train-models:
    name: Train with Hyperparameters
    matrix:
      algorithm: [xgboost, lightgbm, random_forest]
      max_depth: ["5", "10", "15"]
    runner: gpu-large
    timeout: "3h"
    steps:
      - name: Download training data
        download:
          source: drive://${{ .inputs.dataset_bucket }}/train.csv
          path: ./data/train.csv
      - name: Clone training scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts
      - name: Train model
        env:
          ALGORITHM: ${{ .matrix.algorithm }}
          MAX_DEPTH: ${{ .matrix.max_depth }}
        run: |
          echo "Training $ALGORITHM with max_depth=$MAX_DEPTH"
          python scripts/train.py \
            --algorithm $ALGORITHM \
            --max-depth $MAX_DEPTH \
            --data ./data/train.csv \
            --output ./models/
      - name: Upload trained model
        upload:
          path: ./models/
          destination: drive://models/${{ .matrix.algorithm }}-depth${{ .matrix.max_depth }}/
      - name: Upload metrics
        upload:
          path: metrics.json
          destination: drive://metrics/${{ .matrix.algorithm }}-depth${{ .matrix.max_depth }}.json
This creates 9 parallel training runs (3 algorithms × 3 max_depth values).
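If you need a single follow-up job that compares all nine runs, the dependency pattern from the previous section applies unchanged. The following addition to the jobs section is a sketch only: compare_models.py is a hypothetical script, and the job assumes the per-run metrics were uploaded to drive://metrics/ as in the steps above.
jobs:
  # ... train-models job as defined above ...
  compare-models:
    name: Compare Hyperparameter Runs
    depends_on: [train-models]  # runs once, after all 9 matrix instances complete successfully
    steps:
      - name: Download all metrics
        download:
          source: drive://metrics/
          path: ./metrics/
      - name: Clone training scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts
      - name: Compare runs
        run: python scripts/compare_models.py --metrics-dir ./metrics/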