Reusable workflow

This guide walks you through a reusable workflow that other workflows can call as a step. You will learn how to make a workflow callable, pass secret inputs, reference workspace secrets, and handle errors gracefully with continue_on_error.

Step 1: Understand the workflow

Three jobs run in sequence: download-and-validate fetches and validates input data, transform applies transformations using an external API (with a secret-authenticated service call that tolerates failures), and finalize publishes the output. The workflow is callable: it cannot run on its own and is only invoked by other workflows via workflow_call steps.

Step 2: Walk through the YAML

Callable trigger

Setting callable: true makes this workflow a reusable building block:

trigger:
  callable: true

Other workflows invoke it using a workflow_call step and pass inputs at call time. This workflow has no cron schedule — it only runs when called. See Reusable Workflows and Triggers.
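
As a sketch, a calling workflow might invoke this one from a workflow_call step and supply the inputs at call time. The job name, input values, and field layout below are hypothetical; consult Reusable Workflows for the exact call-site schema:

```yaml
# Hypothetical caller workflow -- names and values are illustrative only
jobs:
  nightly-pipeline:
    steps:
      - name: Run reusable data pipeline
        workflow_call:
          workflow: reusable-workflow
          with:
            source_bucket: raw-data
            source_path: daily/
            destination_bucket: processed-data
            api_key: ${{ .secrets.external_api_key }}
```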

Secret inputs

The api_key input uses secret: true to tell H2O Workflows to mask this value in logs and the UI:

api_key:
  type: string
  required: true
  secret: true
  description: "API key for external service"

The secret: true flag does not change how the value is passed; it only controls visibility. See Inputs.

Workspace secrets

The secrets block fetches a secret from H2O Secure Store and makes it available inside job steps:

secrets:
  - name: workspaces/019a55f6-2c62-746f-a49b-4e42f470f26c/secrets/service-token
    as: service_token

Once declared, the secret is accessible via ${{ .secrets.service_token }} in any job step. See Secrets.

Using secrets in steps

The transform job exposes both the secret input and the workspace secret as environment variables:

env:
  API_KEY: ${{ .inputs.api_key }}
  SERVICE_TOKEN: ${{ .secrets.service_token }}

Injecting secrets as environment variables keeps them out of the YAML source and out of command-line arguments. See Expressions.
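
Inside a step script, the injected values are read like ordinary environment variables. A minimal sketch in Python (the variable names match the env block above; the helper function itself is hypothetical):

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read secrets injected by the workflow runner via the job's env block.

    Indexing os.environ directly raises KeyError if a variable is missing,
    which fails fast for required credentials instead of proceeding with
    an empty value.
    """
    return os.environ["API_KEY"], os.environ["SERVICE_TOKEN"]
```

Avoid printing these values from your scripts; masking applies to workflow logs and the UI, not to arbitrary output your own code produces elsewhere.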

Error handling with continue_on_error

The external service call is marked continue_on_error: true so that a failure does not stop the job:

- name: Call external service
  run: python scripts/call_service.py --data data/output/ --token "$SERVICE_TOKEN"
  continue_on_error: true

If this step fails, the job continues to the next step instead of failing immediately. Use this pattern for non-critical operations where partial failure is acceptable. See Failure Handling.
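
A step is considered failed when its command exits with a non-zero status; with continue_on_error: true the job records that failure and moves on to the next step. A hypothetical sketch of what scripts/call_service.py might look like (the function and its arguments are illustrative, not the actual script):

```python
def call_service(data_dir: str, token: str) -> int:
    """Sketch of a non-critical service call.

    Returns the process exit code: 0 on success, non-zero on failure.
    With continue_on_error: true, a non-zero exit marks this step as
    failed but lets the remaining steps (e.g. the upload) still run.
    """
    if not token:
        # Non-zero exit -> the step fails, but the job continues.
        return 1
    # ... send the contents of data_dir to the external service here ...
    return 0
```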

Step 3: Deploy with the Python SDK

import h2o_workflows
from h2o_workflows.workflow.workflow import Workflow

clients = h2o_workflows.login()

with open("examples/reusable-workflow.yaml") as f:
    source = f.read()

workflow = clients.workflow.create_workflow(
    parent="workspaces/my-workspace",
    workflow=Workflow(source_contents=source),
)
print(f"Created: {workflow.name}")

# Activate so other workflows can call it
clients.workflow.activate_workflow(name=workflow.name)
print("Reusable workflow activated -- ready to be called")

Since this workflow uses trigger.callable: true, it will not run on its own. Other workflows invoke it using a workflow_call step. For the full client API, see the Python SDK Reference.

Complete YAML

id: reusable-workflow
name: Reusable Workflow

trigger:
  callable: true

inputs:
  source_bucket:
    type: string
    required: true
    description: "Source Drive Workspace bucket"

  source_path:
    type: string
    required: true
    description: "Path within source bucket"

  destination_bucket:
    type: string
    required: true
    description: "Destination Drive Workspace bucket"

  api_key:
    type: string
    required: true
    secret: true
    description: "API key for external service"

secrets:
  - name: workspaces/019a55f6-2c62-746f-a49b-4e42f470f26c/secrets/service-token
    as: service_token

env:
  SCRIPTS_REPO: "https://github.com/h2oai/project-scripts.git"

jobs:
  download-and-validate:
    name: Download and validate input
    runner: cpu-medium
    timeout: "20m"

    steps:
      - name: Download source data
        download:
          source: drive://${{ .inputs.source_bucket }}/${{ .inputs.source_path }}
          path: ./data/input/

      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts

      - name: Install dependencies
        run: pip install -r scripts/requirements.txt

      - name: Validate input
        run: python scripts/validate.py --input data/input/

      - name: Upload validated data
        upload:
          path: data/input/
          destination: drive://${{ .inputs.destination_bucket }}/validated/

  transform:
    name: Transform data
    depends_on: [download-and-validate]
    runner: cpu-large
    timeout: "30m"

    env:
      API_KEY: ${{ .inputs.api_key }}
      SERVICE_TOKEN: ${{ .secrets.service_token }}

    steps:
      - name: Download validated data
        download:
          source: drive://${{ .inputs.destination_bucket }}/validated/
          path: ./data/input/

      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts

      - name: Install dependencies
        run: pip install -r scripts/requirements.txt

      - name: Run transformation
        run: python scripts/transform.py --input data/input/ --output data/output/ --api-key "$API_KEY"

      - name: Call external service
        run: python scripts/call_service.py --data data/output/ --token "$SERVICE_TOKEN"
        continue_on_error: true

      - name: Upload transformed data
        upload:
          path: data/output/
          destination: drive://${{ .inputs.destination_bucket }}/transformed/

  finalize:
    name: Finalize and publish
    depends_on: [transform]
    runner: cpu-small
    timeout: "10m"

    steps:
      - name: Download transformed data
        download:
          source: drive://${{ .inputs.destination_bucket }}/transformed/
          path: ./data/final/

      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts

      - name: Install dependencies
        run: pip install -r scripts/requirements.txt

      - name: Finalize output
        run: python scripts/finalize.py --input data/final/ --output data/published/

      - name: Upload final output
        upload:
          path: data/published/
          destination: drive://${{ .inputs.destination_bucket }}/published/
