Reusable workflow
This guide walks you through a reusable workflow that other workflows can call as a step. You will learn how to make a workflow callable, pass secret inputs, reference workspace secrets, and handle errors gracefully with continue_on_error.
Prerequisites
- H2O AI Cloud access (see Access H2O Workflows).
- Python SDK installed: `pip install h2o-workflows`.
- A workspace secret named `service-token` configured in H2O Secure Store.
- An API key for the external service.
Step 1: Understand the workflow
Three jobs run in sequence: download-and-validate fetches and validates input data, transform applies transformations using an external API (including a secret-authenticated service call that tolerates failures), and finalize publishes the output. Because the workflow is callable, it cannot run on its own; other workflows invoke it via workflow_call steps.
Step 2: Walk through the YAML
Callable trigger
Setting callable: true makes this workflow a reusable building block:
trigger:
  callable: true
Other workflows invoke it with a workflow_call step and pass inputs at call time. This workflow has no cron schedule; it runs only when called. See Reusable Workflows and Triggers.
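To make the calling side concrete, here is a hedged sketch of what a caller might look like. The `workflow_call` field layout, the caller's `id`, bucket names, and the caller-side secret `external_api_key` are all illustrative assumptions, not confirmed against the H2O Workflows schema; check the Workflow Syntax Reference for the exact shape.

```yaml
# Hypothetical caller workflow -- field names and values are illustrative.
id: nightly-pipeline
trigger:
  schedule: "0 2 * * *"
jobs:
  run-reusable:
    steps:
      - name: Invoke the reusable workflow
        workflow_call: reusable-workflow
        inputs:
          source_bucket: raw-data
          source_path: exports/latest/
          destination_bucket: processed-data
          api_key: ${{ .secrets.external_api_key }}
```

Note that the caller supplies api_key from its own secret store rather than as a literal, so the plain value never appears in YAML.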
Secret inputs
The api_key input uses secret: true to tell H2O Workflows to mask this value in logs and the UI:
api_key:
  type: string
  required: true
  secret: true
  description: "API key for external service"
The secret: true flag does not change how the value is passed; it only controls whether the value is visible. See Inputs.
Workspace secrets
The secrets block fetches a secret from H2O Secure Store and makes it available inside job steps:
secrets:
  - name: workspaces/019a55f6-2c62-746f-a49b-4e42f470f26c/secrets/service-token
    as: service_token
Once declared, the secret is accessible via ${{ .secrets.service_token }} in any job step. See Secrets.
Using secrets in steps
The transform job exposes both the secret input and the workspace secret as environment variables:
env:
  API_KEY: ${{ .inputs.api_key }}
  SERVICE_TOKEN: ${{ .secrets.service_token }}
Injecting secrets as environment variables keeps them out of the YAML source and out of command-line arguments. See Expressions.
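Inside a step, a script reads the injected values from the process environment. As a minimal sketch (the `require_env` helper is ours, not part of any H2O API), a script can fail fast when a secret was not injected:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


# A step script would read the injected secrets from the environment rather
# than accepting them as command-line flags, so they never show up in
# process listings:
#   api_key = require_env("API_KEY")
#   service_token = require_env("SERVICE_TOKEN")
```

Reading from the environment also means the same script works unchanged whether the value came from a secret input or a workspace secret.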
Error handling with continue_on_error
The external service call is marked continue_on_error: true so that a failure does not stop the job:
- name: Call external service
  run: python scripts/call_service.py --data data/output/ --token "$SERVICE_TOKEN"
  continue_on_error: true
If this step fails, the job continues to the next step instead of failing immediately. Use this pattern for non-critical operations where partial failure is acceptable. See Failure Handling.
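The contract that makes continue_on_error useful is simple: the script exits non-zero on failure, the step is marked failed, and the job carries on. The real contents of scripts/call_service.py are not shown in this guide; the following is a hedged sketch of a script that follows that contract, with a placeholder URL:

```python
import sys
import urllib.error
import urllib.request


def call_service(url: str, token: str) -> bool:
    """POST to the external service; return True only on a 2xx response."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Network or HTTP errors are reported as failure, not raised.
        return False


if __name__ == "__main__":
    ok = call_service("https://example.invalid/notify", "dummy-token")
    # Exit non-zero on failure; continue_on_error lets the job proceed anyway.
    sys.exit(0 if ok else 1)
```

Keeping the failure signal in the exit code (rather than swallowing errors inside the script) preserves the step's failed status in the UI while still letting the upload step run.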
Step 3: Deploy with the Python SDK
import h2o_workflows
from h2o_workflows.workflow.workflow import Workflow

clients = h2o_workflows.login()

with open("examples/reusable-workflow.yaml") as f:
    source = f.read()

workflow = clients.workflow.create_workflow(
    parent="workspaces/my-workspace",
    workflow=Workflow(source_contents=source),
)
print(f"Created: {workflow.name}")

# Activate so other workflows can call it
clients.workflow.activate_workflow(name=workflow.name)
print("Reusable workflow activated -- ready to be called")
Since this workflow uses trigger.callable: true, it will not run on its own. Other workflows invoke it using a workflow_call step. For the full client API, see the Python SDK Reference.
Complete YAML
id: reusable-workflow
name: Reusable Workflow
trigger:
  callable: true
inputs:
  source_bucket:
    type: string
    required: true
    description: "Source Drive Workspace bucket"
  source_path:
    type: string
    required: true
    description: "Path within source bucket"
  destination_bucket:
    type: string
    required: true
    description: "Destination Drive Workspace bucket"
  api_key:
    type: string
    required: true
    secret: true
    description: "API key for external service"
secrets:
  - name: workspaces/019a55f6-2c62-746f-a49b-4e42f470f26c/secrets/service-token
    as: service_token
env:
  SCRIPTS_REPO: "https://github.com/h2oai/project-scripts.git"
jobs:
  download-and-validate:
    name: Download and validate input
    runner: cpu-medium
    timeout: "20m"
    steps:
      - name: Download source data
        download:
          source: drive://${{ .inputs.source_bucket }}/${{ .inputs.source_path }}
          path: ./data/input/
      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts
      - name: Install dependencies
        run: pip install -r scripts/requirements.txt
      - name: Validate input
        run: python scripts/validate.py --input data/input/
      - name: Upload validated data
        upload:
          path: data/input/
          destination: drive://${{ .inputs.destination_bucket }}/validated/
  transform:
    name: Transform data
    depends_on: [download-and-validate]
    runner: cpu-large
    timeout: "30m"
    env:
      API_KEY: ${{ .inputs.api_key }}
      SERVICE_TOKEN: ${{ .secrets.service_token }}
    steps:
      - name: Download validated data
        download:
          source: drive://${{ .inputs.destination_bucket }}/validated/
          path: ./data/input/
      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts
      - name: Install dependencies
        run: pip install -r scripts/requirements.txt
      - name: Run transformation
        run: python scripts/transform.py --input data/input/ --output data/output/ --api-key "$API_KEY"
      - name: Call external service
        run: python scripts/call_service.py --data data/output/ --token "$SERVICE_TOKEN"
        continue_on_error: true
      - name: Upload transformed data
        upload:
          path: data/output/
          destination: drive://${{ .inputs.destination_bucket }}/transformed/
  finalize:
    name: Finalize and publish
    depends_on: [transform]
    runner: cpu-small
    timeout: "10m"
    steps:
      - name: Download transformed data
        download:
          source: drive://${{ .inputs.destination_bucket }}/transformed/
          path: ./data/final/
      - name: Clone scripts
        run: git clone --depth 1 $SCRIPTS_REPO scripts
      - name: Install dependencies
        run: pip install -r scripts/requirements.txt
      - name: Finalize output
        run: python scripts/finalize.py --input data/final/ --output data/published/
      - name: Upload final output
        upload:
          path: data/published/
          destination: drive://${{ .inputs.destination_bucket }}/published/
Next steps
- Simple Pipeline — Build a scheduled data pipeline.
- Workflow Syntax Reference — Full YAML syntax documentation.