Concepts

This page explains the core concepts behind H2O Orchestrator workflows.

Workflows

A workflow is a YAML-defined automation pipeline that orchestrates the execution of jobs. Workflows allow you to:

  • Define multi-step data processing and ML pipelines
  • Run jobs in parallel or sequentially based on dependencies
  • Schedule recurring executions with cron expressions
  • Pass parameters and secrets securely

Jobs

A job is an independent unit of work within a workflow. Jobs can:

  • Run in parallel with other jobs (default behavior)
  • Run sequentially by declaring dependencies with depends_on
  • Execute on specific compute environments using runners
  • Contain multiple steps or call other workflows
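The effect of depends_on on execution order can be illustrated with a small dependency graph. The following is a minimal Python sketch, not H2O Orchestrator's actual scheduler: the job names are hypothetical, and the grouping into "waves" is only an assumption about how independent jobs could run side by side while dependent jobs wait.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key is a job name, each value is the set of
# jobs it would list under depends_on. Names are illustrative only.
depends_on = {
    "prepare_data": set(),
    "train_xgboost": {"prepare_data"},
    "train_lightgbm": {"prepare_data"},
    "evaluate": {"train_xgboost", "train_lightgbm"},
}

def execution_waves(graph):
    """Group jobs into waves; every job in a wave has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock dependents
    return waves

waves = execution_waves(depends_on)
for number, jobs in enumerate(waves, start=1):
    print(f"wave {number}: {jobs}")
```

In this sketch the two training jobs land in the same wave because neither depends on the other, which mirrors the default parallel behavior described above.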

Steps

A step is a single action within a job. Steps execute sequentially in the order defined. Each step performs one of three actions:

Action    Description
run       Execute shell commands
upload    Transfer files to H2O Drive
download  Transfer files from H2O Drive

Expressions

Expressions enable dynamic values in workflow files using the ${{ }} syntax. You can reference:

  • Inputs: ${{ .inputs.model_type }}
  • Secrets: ${{ .secrets.api_key }}
  • Environment variables: ${{ .env.DATA_PATH }}
  • Matrix variables: ${{ .matrix.algorithm }}
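To make the substitution idea concrete, here is a rough Python sketch of resolving ${{ }} references against a lookup table. It is an illustration under stated assumptions, not the engine H2O Orchestrator uses: the resolve function, the context dictionary, and the sample values are all hypothetical, and the sketch handles only simple two-part references such as .inputs.model_type.

```python
import re

# Matches ${{ .namespace.key }} with optional whitespace inside the braces.
EXPRESSION = re.compile(r"\$\{\{\s*\.(\w+)\.(\w+)\s*\}\}")

def resolve(text, context):
    """Replace each expression in text with its value from context."""
    def lookup(match):
        namespace, key = match.group(1), match.group(2)
        # Raises KeyError if the reference is unknown.
        return str(context[namespace][key])
    return EXPRESSION.sub(lookup, text)

# Hypothetical values; the names mirror the references listed above.
context = {
    "inputs": {"model_type": "xgboost"},
    "env": {"DATA_PATH": "/data/train.csv"},
    "matrix": {"algorithm": "lightgbm"},
}

command = "python train.py --model ${{ .inputs.model_type }} --data ${{ .env.DATA_PATH }}"
resolved = resolve(command, context)
print(resolved)
```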

Matrix jobs

Matrix jobs run a single job definition across multiple parameter combinations. The system takes the Cartesian product of the matrix variables and creates one parallel job instance for each resulting combination.

matrix:
  algorithm: [xgboost, lightgbm]
  dataset: [train, validation]
# Creates 4 parallel jobs (2 × 2)
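The Cartesian product behind the matrix above can be spelled out in a few lines of Python. This is a sketch of the arithmetic only; expand_matrix is a hypothetical helper, not part of H2O Orchestrator, and the order in which the real system enumerates combinations is not stated here.

```python
from itertools import product

# Same variables as the matrix example above.
matrix = {
    "algorithm": ["xgboost", "lightgbm"],
    "dataset": ["train", "validation"],
}

def expand_matrix(matrix):
    """Return one dict of variable values per matrix combination."""
    names = list(matrix)
    return [dict(zip(names, values)) for values in product(*matrix.values())]

combinations = expand_matrix(matrix)
for combo in combinations:
    print(combo)
```

Each resulting dictionary corresponds to one job instance, and its entries are the values that ${{ .matrix.algorithm }} and ${{ .matrix.dataset }} would take in that instance.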

Concurrency control

Concurrency control prevents multiple workflow instances from running simultaneously. Use concurrency groups to:

  • Protect shared resources
  • Prevent conflicting deployments
  • Queue or cancel overlapping executions

Reusable workflows

Reusable workflows allow you to call one workflow from another, enabling modular design. Mark a workflow as callable with trigger.callable: true, then reference it from other workflows using the workflow field in a job.

Storage

H2O Orchestrator integrates with H2O Drive for file storage. Use upload and download steps to:

  • Share artifacts between jobs
  • Persist results for later analysis
  • Load datasets and models from cloud storage
