Concepts
This page explains the core concepts behind H2O Orchestrator workflows.
Workflows
A workflow is a YAML-defined automation pipeline that orchestrates the execution of jobs. Workflows allow you to:
- Define multi-step data processing and ML pipelines
- Run jobs in parallel or sequentially based on dependencies
- Schedule recurring executions with cron expressions
- Pass parameters and secrets securely
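A workflow combining these pieces might look like the following sketch. Only the concepts named on this page (cron scheduling, jobs, steps) are taken from the source; the exact key names (`name`, `trigger.schedule`, `jobs`, `steps`) are assumptions and should be checked against the workflow schema reference.

```yaml
# Illustrative sketch only - key names are assumed, not confirmed schema.
name: daily-training
trigger:
  schedule: "0 2 * * *"   # cron expression: run at 02:00 every day
jobs:
  train:
    steps:
      - run: python train.py
```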
Jobs
A job is an independent unit of work within a workflow. Jobs can:
- Run in parallel with other jobs (default behavior)
- Run sequentially by declaring dependencies with depends_on
- Execute on specific compute environments using runners
- Contain multiple steps or call other workflows
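The parallel-by-default behavior and depends_on ordering can be sketched as follows. The depends_on field is from this page; the surrounding `jobs`/`steps` layout and job names are illustrative assumptions.

```yaml
# Sketch: `preprocess` and `lint` have no dependencies, so they run in
# parallel. `train` declares depends_on, so it waits for `preprocess`.
jobs:
  preprocess:
    steps:
      - run: python preprocess.py
  lint:
    steps:
      - run: flake8 .
  train:
    depends_on: [preprocess]
    steps:
      - run: python train.py
```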
Steps
A step is a single action within a job. Steps execute sequentially in the order defined. Each step performs one of three actions:
| Action | Description |
|---|---|
| run | Execute shell commands |
| upload | Transfer files to H2O Drive |
| download | Transfer files from H2O Drive |
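A job body mixing the three step actions might look like this sketch. The run/upload/download action names come from the table above; the argument shape shown for upload and download (a source/target pair) is an assumption.

```yaml
# Sketch: steps execute top to bottom - fetch input, process it,
# then publish the result to H2O Drive.
steps:
  - download:
      source: drive://datasets/input.csv   # path convention is assumed
      target: ./input.csv
  - run: python score.py input.csv output.csv
  - upload:
      source: ./output.csv
      target: drive://results/output.csv
```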
Expressions
Expressions enable dynamic values in workflow files using the ${{ }} syntax. You can reference:
- Inputs: ${{ .inputs.model_type }}
- Secrets: ${{ .secrets.api_key }}
- Environment variables: ${{ .env.DATA_PATH }}
- Matrix variables: ${{ .matrix.algorithm }}
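The expression forms above could appear in a step like the following sketch. The ${{ }} syntax and reference names are from this page; the surrounding `inputs`, `env`, and `jobs` keys are assumptions.

```yaml
# Sketch: expressions resolve at run time from inputs, env, and secrets.
inputs:
  model_type: xgboost
jobs:
  train:
    env:
      DATA_PATH: /data/train.csv
    steps:
      - run: >
          python train.py
          --model ${{ .inputs.model_type }}
          --data ${{ .env.DATA_PATH }}
          --token ${{ .secrets.api_key }}
```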
Matrix jobs
Matrix jobs run a single job definition across multiple parameter combinations. The system creates parallel job instances for each combination using the Cartesian product of matrix variables.
```yaml
matrix:
  algorithm: [xgboost, lightgbm]
  dataset: [train, validation]
# Creates 4 parallel jobs (2 × 2)
```
Concurrency control
Concurrency control prevents multiple workflow instances from running simultaneously. Use concurrency groups to:
- Protect shared resources
- Prevent conflicting deployments
- Queue or cancel overlapping executions
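A concurrency group might be declared as in this sketch. Only the concurrency-group concept is from this page; the `group` and `cancel_in_progress` key names are assumptions borrowed for illustration.

```yaml
# Sketch: workflow instances sharing this group key never run
# simultaneously; overlapping runs queue instead of cancelling.
concurrency:
  group: production-deploy
  cancel_in_progress: false
```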
Reusable workflows
Reusable workflows allow you to call one workflow from another, enabling modular design. Mark a workflow as callable with trigger.callable: true, then reference it from other workflows using the workflow field in a job.
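The callable/caller relationship can be sketched as two files. The trigger.callable: true flag and the workflow field are from this page; the file names, paths, and remaining keys are assumptions.

```yaml
# train.yaml (callee) - marked as callable
trigger:
  callable: true
jobs:
  train:
    steps:
      - run: python train.py
```

```yaml
# main.yaml (caller) - a job references the callee via the workflow field;
# the path convention shown is an assumption.
jobs:
  run-training:
    workflow: train.yaml
```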
Storage
H2O Orchestrator integrates with H2O Drive for file storage. Use upload and download steps to:
- Share artifacts between jobs
- Persist results for later analysis
- Load datasets and models from cloud storage
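Sharing an artifact between jobs via H2O Drive might look like this sketch. The upload/download steps and depends_on are from this page; the drive:// path convention and argument names are assumptions.

```yaml
# Sketch: `train` publishes a model to H2O Drive; `evaluate` waits for
# it via depends_on, then pulls the artifact back down.
jobs:
  train:
    steps:
      - run: python train.py --out model.bin
      - upload:
          source: ./model.bin
          target: drive://models/model.bin
  evaluate:
    depends_on: [train]
    steps:
      - download:
          source: drive://models/model.bin
          target: ./model.bin
      - run: python evaluate.py model.bin
```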