# Quick start
This page shows you the minimal YAML to define a workflow and how to deploy it.
## Your first workflow
The following workflow clones a repository, downloads a dataset from H2O Drive, runs a processing script, and uploads the results:
```yaml
id: data-pipeline
name: Data Pipeline
jobs:
  process:
    name: Process Data
    steps:
      - name: Clone repository
        run: git clone https://github.com/org/data-processing.git
      - name: Download dataset
        download:
          source: drive://your-bucket-uuid/datasets/raw-data.csv
          path: ./data.csv
      - name: Process data
        run: python data-processing/process.py --input data.csv --output results.json
      - name: Upload results
        upload:
          path: results.json
          destination: drive://your-bucket-uuid/processed/results.json
```
Replace `your-bucket-uuid` with the UUID of your H2O Drive bucket.
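The workflow above runs a `process.py` script that takes `--input` and `--output` flags, but the script itself is not part of this guide. As a hypothetical sketch (the real script's logic is an assumption, not defined by H2O Workflows), its core could be as simple as reading the CSV and writing a JSON summary:

```python
import csv
import json

def process(input_path: str, output_path: str) -> dict:
    """Read a CSV, write a small JSON summary of it, and return the summary.

    Hypothetical stand-in for the process.py referenced in the workflow;
    any real pipeline would put its actual transformation logic here.
    """
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))
    summary = {
        "row_count": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
    }
    with open(output_path, "w") as f:
        json.dump(summary, f)
    return summary
```

In the real script this function would be wired to the `--input`/`--output` flags, for example with `argparse`, so the `run:` command in the workflow can invoke it.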
## Deploy with the Python SDK
Install the SDK and deploy your workflow:
```shell
pip install h2o-workflows
```
```python
import h2o_workflows
from h2o_workflows.workflow.workflow import Workflow

clients = h2o_workflows.login()

with open("data-pipeline.yaml") as f:
    source = f.read()

workflow = clients.workflow.create_workflow(
    parent="workspaces/my-workspace",
    workflow=Workflow(source_contents=source),
)
print(f"Created: {workflow.name}")
```
The API validates your YAML automatically when you create or update a workflow. Validation errors are returned in the response with details about what failed.
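Because the API is the authority on validation, a local pre-check is optional, but it can catch obvious omissions before a round trip. The sketch below is an assumption based only on the example workflow above (the required fields `id`, `name`, and `jobs` are inferred from it, not from a documented schema):

```python
# Hypothetical client-side sanity check run before calling create_workflow().
# The field names below are assumptions inferred from the example workflow;
# the API's own validation remains the source of truth.
REQUIRED_TOP_LEVEL = ("id", "name", "jobs")

def check_workflow(spec: dict) -> list[str]:
    """Return a list of error messages; an empty list means the basic shape looks OK."""
    errors = [
        f"missing required field: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in spec
    ]
    if not isinstance(spec.get("jobs"), dict) or not spec.get("jobs"):
        errors.append("jobs must be a non-empty mapping of job IDs to job definitions")
    return errors
```

You would parse the YAML file into a dict (for example with PyYAML's `yaml.safe_load`) and run `check_workflow` on it before uploading.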
## Next steps
- Key Terms — learn the core vocabulary.
- Simple pipeline guide — a three-job pipeline with inputs, triggers, and dependencies.
- Workflow Syntax — full reference for all workflow fields.
- API Reference — Python SDK and REST API documentation.