
Workflow syntax overview

H2O Orchestrator workflows define automated pipelines using YAML syntax. You can create workflows that orchestrate data processing, model training, deployment, and other ML operations.

Key concepts

A workflow consists of three main components, sketched in the skeleton after this list:

  • Workflow: The top-level container that defines the pipeline name, inputs, and jobs.
  • Jobs: Independent units of work that run in parallel or sequentially based on dependencies.
  • Steps: Sequential actions within a job, such as shell commands or file transfers.
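As a minimal sketch of how these components nest (field names follow the Quick start example below; the job and step shown here are placeholders):

id: my-workflow            # Workflow: top-level container with id, name, and jobs
name: My Workflow

jobs:                      # Jobs: independent units of work
  build:
    name: Build
    steps:                 # Steps: sequential actions within a job
      - name: Say hello
        run: echo "Hello from H2O Orchestrator"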

Quick start

The following example shows a basic data pipeline workflow:

id: data-pipeline
name: Data Pipeline

jobs:
  process:
    name: Process Data
    steps:
      - name: Clone repository
        run: git clone https://github.com/org/data-processing.git

      - name: Download dataset
        download:
          source: drive://bucket/datasets/raw-data.csv
          path: ./data.csv

      - name: Process data
        run: python data-processing/process.py --input data.csv --output results.json

      - name: Upload results
        upload:
          path: results.json
          destination: drive://bucket/processed/results.json
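This workflow defines a single job, process, whose steps run in order: clone the processing code, download the raw dataset from H2O Drive, run the processing script, and upload the results back to Drive.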

Documentation guide

Core concepts

  • Workflow structure: Top-level workflow configuration and fields
  • Jobs: Job configuration, runners, and dependencies
  • Steps: Step types, including shell commands, uploads, and downloads

Features

  • Inputs: Define typed parameters for workflows
  • Triggers: Schedule workflows with cron expressions
  • Expressions: Use dynamic values with ${{ }} syntax (see the sketch after this list)
  • Concurrency: Prevent simultaneous workflow executions
  • Reusable workflows: Call workflows from other workflows
  • Matrix jobs: Run jobs across multiple configurations
  • Storage: Upload and download files with H2O Drive
  • Failure handling: Control behavior when jobs or steps fail
  • Timeouts: Set maximum execution time for jobs and steps
  • Environment variables: Configure key-value pairs for runtime
  • Secrets: Access sensitive data from H2O Secure Store
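As a rough illustration of how the Inputs and Expressions features fit into a workflow, the sketch below adds a typed parameter and references it with the ${{ }} syntax. The inputs block and its type and default keys are assumptions for illustration only, not confirmed schema; see the Inputs and Expressions pages for the exact field names.

id: parameterized-pipeline
name: Parameterized Pipeline

inputs:                    # hypothetical inputs block; the exact schema may differ
  dataset:
    type: string
    default: drive://bucket/datasets/raw-data.csv

jobs:
  process:
    name: Process Data
    steps:
      - name: Download dataset
        download:
          source: ${{ inputs.dataset }}   # ${{ }} expression referencing the input
          path: ./data.csv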

Reference

  • Schema reference: Complete field definitions and types
