Concepts
This page introduces the core concepts behind H2O Workflows. Each concept links to its detailed syntax reference page. Understanding these concepts helps you author, structure, and run workflows on H2O AI Cloud.
Workflows
A workflow is a YAML-defined automation pipeline that orchestrates the execution of jobs. It contains metadata (id, name), configuration (environment variables, concurrency rules), and one or more jobs. Workflows enable multi-step pipelines, parallel and sequential execution, scheduled runs via cron, and secure parameter handling.
See Workflow Syntax: Workflow for the full reference.
Jobs
A job is an independent unit of work within a workflow. Jobs run in parallel by default, or sequentially when you declare dependencies with depends_on. Each job runs on a designated compute environment (runner) and contains one or more steps.
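The scheduling rule above (parallel by default, sequential where depends_on is declared) can be pictured as a dependency graph resolved in waves. The following Python sketch is only an illustration of that idea, not the H2O Workflows scheduler: the job names are hypothetical, and the function `execution_waves` is an assumption introduced for this example.

```python
from graphlib import TopologicalSorter


def execution_waves(depends_on):
    """Group jobs into waves: every job in a wave has all of its
    dependencies satisfied, so the jobs in one wave could run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock dependents
    return waves


# Hypothetical jobs: each key is a job, each value is its depends_on list.
jobs = {
    "prepare": [],
    "train": ["prepare"],
    "validate": ["prepare"],
    "report": ["train", "validate"],
}

waves = execution_waves(jobs)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In this sketch, train and validate land in the same wave because neither depends on the other, while report waits for both.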
See Workflow Syntax: Jobs for the full reference.
Steps
A step is a single action within a job. Steps execute sequentially in the order you define them. Each step performs one of three operations: executing a shell command (run), uploading files to H2O Drive (upload), or downloading files from H2O Drive (download).
See Workflow Syntax: Steps for the full reference.
Expressions
Expressions let you inject dynamic values into your workflow using Go template syntax (${{ }}). You can reference workflow inputs, secrets, environment variables, and matrix variables. Use expressions to parameterize your workflows and pass data between steps and jobs.
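To make the substitution idea concrete, here is a minimal Python sketch that replaces ${{ }} placeholders with values from a context. It is a plain regular-expression stand-in, not the template engine H2O Workflows uses, and the dotted reference names (`inputs.dataset`, `env.REGION`) are hypothetical; consult the syntax reference for the actual forms.

```python
import re

# Matches ${{ some.dotted.name }} with optional surrounding whitespace.
PLACEHOLDER = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def render(text, context):
    """Replace each placeholder with the value found by walking the
    dotted name through nested dictionaries in `context`."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError for unknown names
        return str(value)

    return PLACEHOLDER.sub(lookup, text)


# Hypothetical context; real reference names may differ.
context = {
    "inputs": {"dataset": "sales.csv"},
    "env": {"REGION": "us-east"},
}

command = "python train.py --data ${{ inputs.dataset }} --region ${{ env.REGION }}"
rendered = render(command, context)
print(rendered)
```

Failing loudly on an unknown name, as this sketch does, is one possible design choice; it surfaces typos in a reference early instead of silently producing an empty string.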
See Workflow Syntax: Expressions for the full reference.
Matrix jobs
A matrix strategy runs the same job across multiple parameter combinations. It takes the Cartesian product of the defined variables and creates one parallel job instance per combination. Use matrix jobs when you need to process data across multiple environments, parameters, or datasets.
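The Cartesian product can be worked through in a few lines of Python. This sketch only shows how the number of job instances grows; the variable names (`dataset`, `model`) and the helper `expand_matrix` are hypothetical and are not part of the H2O Workflows syntax.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of the matrix variables
    (the Cartesian product of all value lists)."""
    keys = list(matrix)
    value_lists = [matrix[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical matrix: 2 datasets x 3 models = 6 job instances.
matrix = {
    "dataset": ["train", "test"],
    "model": ["gbm", "glm", "xgboost"],
}

instances = expand_matrix(matrix)
for instance in instances:
    print(instance)
print(len(instances), "job instances")
```

Because the instance count is the product of the list lengths, adding a third variable with four values would raise the total here from 6 to 24.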
See Workflow Syntax: Matrix Jobs for the full reference.
Concurrency
Concurrency control manages simultaneous workflow instances to protect shared resources and prevent conflicting executions. You assign a concurrency group identifier and optionally enable cancel-in-progress to abort older runs when a new one starts.
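The effect of a concurrency group can be modeled with a small Python sketch. It is a toy model under stated assumptions, not the platform's implementation: the class `ConcurrencyGroups` and the run and group names are hypothetical, and the behavior when cancel-in-progress is disabled (the new run waits in a queue) is an assumption, since this page does not specify it.

```python
class ConcurrencyGroups:
    """Toy model: at most one active run per concurrency group."""

    def __init__(self):
        self.active = {}      # group name -> id of the run currently active
        self.cancelled = []   # ids of runs aborted by a newer run
        self.queued = []      # ids of runs waiting (assumed behavior)

    def start(self, run_id, group, cancel_in_progress=False):
        current = self.active.get(group)
        if current is None:
            self.active[group] = run_id
            return "started"
        if cancel_in_progress:
            # Abort the older run and let the new one take over the group.
            self.cancelled.append(current)
            self.active[group] = run_id
            return "started"
        # Assumption: without cancel-in-progress the new run waits.
        self.queued.append(run_id)
        return "queued"


groups = ConcurrencyGroups()
first = groups.start("run-1", "deploy-prod")
second = groups.start("run-2", "deploy-prod", cancel_in_progress=True)
third = groups.start("run-3", "deploy-prod")
print(first, second, third)
print("active:", groups.active, "cancelled:", groups.cancelled, "queued:", groups.queued)
```

Runs in different groups never affect each other in this model, which is why the group identifier is the unit that protects a shared resource.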
See Workflow Syntax: Concurrency for the full reference.
Reusable workflows
Reusable workflows let you call one workflow from another. Mark the called workflow as callable with trigger.callable: true, then reference it from a job using the workflow field. Use reusable workflows to share common logic across multiple pipelines without duplication.
See Workflow Syntax: Reusable Workflows for the full reference.
Storage
Storage operations let you upload files to and download files from H2O Drive within your workflow steps. Use storage to share artifacts between jobs, persist results beyond a job's lifetime, and load datasets or models needed for processing.
See Workflow Syntax: Storage for the full reference.