Your first workflow

This walkthrough is meant to be read alongside the YAML editor. You already have a hello-world workflow open on the left — each section below introduces one concept and gives you a complete workflow you can paste-replace into the editor to see it in action.

Copy-paste safe

Every code block in this walkthrough is a complete workflow. Select all in the editor, paste the snippet, and run. No indentation gymnastics.

The starter workflow

id: hello-world
name: Hello World

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello from H2O Workflows!"

Five keys carry the whole model.

id — a unique identifier for this workflow within the workspace. Other workflows reference it by this id.
name — a human-readable label. Optional; shown in the UI and logs.
jobs — a map of jobs, keyed by id. Each job is an independent unit of work and, by default, runs in parallel with sibling jobs.
steps — an ordered list of actions inside a job. Steps run sequentially; each one starts a fresh shell.
run — the shell command for this step. A step must specify exactly one of run, upload, or download.

That's the entire mental model. Everything else is configuration on top of these.

1. Change the greeting

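Find the echo line in the editor, edit it, and run the workflow. The output appears in the run logs panel. The version below uses shell command substitution, $(whoami), to greet the user account the step runs as.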

id: hello-world
name: Hello World

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello, $(whoami)!"

2. Add a second step

Steps in a job execute top-to-bottom. State (working directory, env vars, processes) is not carried between steps — every step starts a fresh shell.

id: hello-world
name: Hello World

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello from H2O Workflows!"

      - name: Show the date
        run: date -u

If a step fails, the rest of the job stops by default. Set continue_on_error: true on a step to keep going. See Failure handling.
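To see the difference in the logs, here is a minimal sketch in which the first step fails on purpose. The step names and the exit 1 command are illustrative, and it assumes a non-zero exit code is what marks a step as failed; only the continue_on_error key comes from the text above.

```yaml
id: hello-world
name: Hello World

jobs:
  greet:
    name: Say hello
    steps:
      - name: Fail on purpose
        # Assumption: a non-zero exit code marks the step as failed.
        run: exit 1
        # Without this line, the job would stop at this step.
        continue_on_error: true

      - name: Print greeting
        run: echo "Hello from H2O Workflows!"
```

Remove the continue_on_error line and run it again: the second step should no longer run.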

3. Add a second job

Jobs run in parallel unless you declare a dependency. depends_on makes the second job wait until the first succeeds — remove it and they run concurrently.

id: hello-world
name: Hello World

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello!"

  farewell:
    name: Say goodbye
    depends_on: [greet]
    steps:
      - name: Print farewell
        run: echo "Goodbye!"

See Jobs for runners, timeouts, and dependency graphs.
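The same key can express a fan-in. The sketch below assumes that, because depends_on takes a list, it accepts more than one job id; the job ids greet-alice and greet-bob are made up for illustration.

```yaml
id: hello-world
name: Hello World

jobs:
  greet-alice:
    name: Greet Alice
    steps:
      - name: Print greeting
        run: echo "Hello, Alice!"

  greet-bob:
    name: Greet Bob
    steps:
      - name: Print greeting
        run: echo "Hello, Bob!"

  farewell:
    name: Say goodbye
    # Assumption: listing two ids waits for both jobs to succeed.
    # The two greeting jobs do not depend on each other, so they run in parallel.
    depends_on: [greet-alice, greet-bob]
    steps:
      - name: Print farewell
        run: echo "Goodbye, both!"
```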

4. Take an input

Inputs are typed parameters supplied at trigger time. Reference them in YAML with ${{ .inputs.<name> }} (Go template syntax). Supported types are string, bool, and int.

id: hello-world
name: Hello World

inputs:
  who:
    type: string
    required: true
    description: "Who to greet"

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello, ${{ .inputs.who }}!"

See Inputs and Expressions.
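A second input of a different type works the same way. This sketch adds a hypothetical int input named times and assumes the template is expanded to plain text before the shell runs, so the number can be used directly in a shell test.

```yaml
id: hello-world
name: Hello World

inputs:
  who:
    type: string
    required: true
    description: "Who to greet"
  times:
    type: int
    required: true
    description: "How many times to greet"

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting repeatedly
        run: |
          # Plain POSIX loop; the input value is substituted before the shell sees it.
          i=1
          while [ "$i" -le ${{ .inputs.times }} ]; do
            echo "Hello, ${{ .inputs.who }}! ($i)"
            i=$((i + 1))
          done
```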

5. Move files in and out with Drive

upload and download steps move files between the runner and H2O Drive. Drive paths use the form drive://<bucket>/<path>.

Use this to share artifacts between jobs (the runner filesystem does not persist across jobs) and to keep results after the run completes.

id: hello-world
name: Hello World

jobs:
  fetch:
    name: Fetch and store
    steps:
      - name: Fetch a public dataset
        run: curl -sSL -o data.csv https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris.csv

      - name: Upload to Drive
        upload:
          path: data.csv
          destination: drive://default/datasets/iris.csv

  process:
    name: Download and process
    depends_on: [fetch]
    steps:
      - name: Download from Drive
        download:
          source: drive://default/datasets/iris.csv
          path: ./data.csv

      - name: Count rows
        run: wc -l data.csv

The fetch job pulls a public dataset with plain curl and uploads it to Drive. The process job downloads it again and works on it. The two jobs are split on purpose: process cannot see the files fetch wrote, so Drive is the handoff. default is your workspace's default Drive bucket, so this runs as-is; point it at another bucket or path you have access to as needed. See Storage.
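Drive is also where results go if you want them to outlive the run. As a sketch, the single job below writes the row count to a file and uploads it; the file name summary.txt and the destination path under results/ are hypothetical.

```yaml
id: hello-world
name: Hello World

jobs:
  summarize:
    name: Fetch, summarize, store
    steps:
      - name: Fetch a public dataset
        run: curl -sSL -o data.csv https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris.csv

      - name: Count rows into a file
        # Files written by one step are visible to later steps in the same job.
        run: wc -l data.csv > summary.txt

      - name: Upload the summary
        upload:
          path: summary.txt
          # Hypothetical destination; any path in a bucket you can write to works.
          destination: drive://default/results/iris-summary.txt
```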

6. Run it on a schedule

Add a trigger.schedule with one or more cron expressions. You can also pass inputs per schedule entry.

id: hello-world
name: Hello World

trigger:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello from H2O Workflows!"

See Triggers.
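Because schedule is a list, a second entry is just another item. The sketch below adds a weekday evening run using standard five-field cron syntax; the times are arbitrary examples.

```yaml
id: hello-world
name: Hello World

trigger:
  schedule:
    - cron: "0 6 * * *"      # every day at 06:00 UTC
    - cron: "0 18 * * 1-5"   # Monday to Friday at 18:00 UTC

jobs:
  greet:
    name: Say hello
    steps:
      - name: Print greeting
        run: echo "Hello from H2O Workflows!"
```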

7. Use a secret

Secrets come from H2O Secure Store and are masked in logs. Declare them at the workflow level and reference them in steps with ${{ .secrets.<name> }}.

id: hello-world
name: Hello World

secrets:
  api_token:
    resource: workspaces/default/secrets/api-token

jobs:
  call-api:
    name: Call an API
    steps:
      - name: Fetch data
        run: |
          curl -sS \
            -H "Authorization: Bearer ${{ .secrets.api_token }}" \
            https://api.example.com/data

See Secrets.
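Secrets and inputs can appear in the same step. The following sketch combines the secret above with a hypothetical string input named endpoint, and assumes both templates are expanded before the command runs.

```yaml
id: hello-world
name: Hello World

inputs:
  endpoint:
    type: string
    required: true
    description: "API path to fetch, for example data"

secrets:
  api_token:
    resource: workspaces/default/secrets/api-token

jobs:
  call-api:
    name: Call an API
    steps:
      - name: Fetch data
        run: |
          # The token is masked in logs; the endpoint is an ordinary input.
          curl -sS \
            -H "Authorization: Bearer ${{ .secrets.api_token }}" \
            "https://api.example.com/${{ .inputs.endpoint }}"
```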

8. Talk to an H2O product

The runner injects a platform access token into every step as H2O_CLOUD_CLIENT_PLATFORM_TOKEN. You don't declare it — it's already there, scoped to the workspace running the workflow. Pass it to the login() helper of any H2O Python client.

id: hello-world
name: Hello World

jobs:
  list-engines:
    name: List DAI engines
    steps:
      - name: Install the AI Engine Manager client
        run: sudo uv pip install --system h2o-engine-manager

      - name: List my engines
        run: |
          python3 -c "
          import os, h2o_engine_manager
          clients = h2o_engine_manager.login(
              platform_token=os.environ['H2O_CLOUD_CLIENT_PLATFORM_TOKEN'],
          )
          print(clients.dai_engine_client.list_engines())
          "

Two things worth noticing.

sudo uv pip install --system is the install pattern on this runner: pip is not pre-installed, the system Python is owned by root, and uv is the supported installer.
The install in step 1 carries into step 2. Steps in a job share the runner's filesystem, so --system packages survive even though every step starts a fresh shell. Shell state does not: a cd, an export, or a change to $PATH is gone when the step ends.

See Runner environment for the full list of pre-installed tools and per-product client install commands (MLOps, H2O-3, Feature Store, Driverless AI).
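The second point can be checked with a small experiment. This is a sketch, assuming each step starts in the same working directory (as the Drive example in section 5 suggests); the variable GREETING and the file note.txt are made up for illustration.

```yaml
id: hello-world
name: Hello World

jobs:
  isolation:
    name: Show what carries over
    steps:
      - name: Export a variable and write a file
        run: |
          export GREETING="hello"
          echo "written in step 1" > note.txt

      - name: Check what survived
        run: |
          # New shell: the exported variable should be unset here.
          echo "GREETING is ${GREETING:-unset}"
          # Shared filesystem: the file from the previous step should still exist.
          cat note.txt
```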

Where to go next

Once you're past hello-world, the topics below cover everything else.

Core syntax

Workflow structure — every top-level field.
Jobs — runners, depends_on, timeouts.
Steps — run, upload, download, working_dir, env.
Runner environment — auto-injected token, pre-installed tools, installing Python clients.
Expressions — the ${{ }} template syntax.

Parameterization

Inputs — typed parameters.
Environment variables — static config.
Secrets — sensitive values.

Scaling out

Matrix jobs — fan out a job over parameter combinations.
Reusable workflows — call one workflow from another.
Concurrency — prevent overlapping runs.

Operating workflows

Triggers — cron schedules.
Storage — H2O Drive paths and glob patterns.
Timeouts — bound runtime at step, job, and workflow levels.
Failure handling — continue_on_error and cancel_on_failure.

End-to-end examples

Simple pipeline — three-job pipeline with inputs and triggers.
Matrix data processing — fan-out processing.
Reusable workflow — workflow-call patterns.

Reference

Schema reference — complete CUE schema.
API reference — Python SDK and REST API.
