Quick start

This page shows you the minimal YAML to define a workflow and how to deploy it.

Your first workflow

The following workflow clones a repository, downloads a dataset from H2O Drive, runs a processing script, and uploads the results:

id: data-pipeline
name: Data Pipeline

jobs:
  process:
    name: Process Data
    steps:
      - name: Clone repository
        run: git clone https://github.com/org/data-processing.git

      - name: Download dataset
        download:
          source: drive://your-bucket-uuid/datasets/raw-data.csv
          path: ./data.csv

      - name: Process data
        run: python data-processing/process.py --input data.csv --output results.json

      - name: Upload results
        upload:
          path: results.json
          destination: drive://your-bucket-uuid/processed/results.json

Replace your-bucket-uuid with the UUID of your H2O Drive bucket.
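Before deploying, you can run a quick structural check on the YAML locally with PyYAML. This is only a sketch that mirrors the keys used in the example above; it is not the API's actual validation, which runs server-side when you create or update the workflow.

```python
# Minimal local sanity check of a workflow definition.
# Assumes the structure shown in this guide (id, name, jobs -> steps);
# the Workflows API performs the authoritative validation.
import yaml  # pip install pyyaml

source = """
id: data-pipeline
name: Data Pipeline
jobs:
  process:
    name: Process Data
    steps:
      - name: Clone repository
        run: git clone https://github.com/org/data-processing.git
"""

spec = yaml.safe_load(source)

# Top-level keys used in this guide must be present.
for key in ("id", "name", "jobs"):
    assert key in spec, f"missing top-level key: {key}"

# Every job needs a non-empty list of steps.
for job_id, job in spec["jobs"].items():
    assert isinstance(job.get("steps"), list) and job["steps"], (
        f"job {job_id} has no steps"
    )

print("basic structure OK")
```

A check like this catches missing sections before you make an API call, but it will not catch everything the server validates (unknown fields, invalid Drive URIs, and so on).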

Deploy with the Python SDK

Install the SDK and deploy your workflow:

pip install h2o-workflows

Then create the workflow from your YAML file:

import h2o_workflows
from h2o_workflows.workflow.workflow import Workflow

clients = h2o_workflows.login()

with open("data-pipeline.yaml") as f:
    source = f.read()

workflow = clients.workflow.create_workflow(
    parent="workspaces/my-workspace",
    workflow=Workflow(source_contents=source),
)
print(f"Created: {workflow.name}")

The API validates your YAML automatically when you create or update a workflow. Validation errors are returned in the response with details about what failed.
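Plain YAML syntax errors can be caught even earlier, before any API call, by parsing the file locally. The sketch below uses PyYAML to surface a malformed snippet; the error format returned by the Workflows API for semantic validation failures may differ.

```python
import yaml  # pip install pyyaml

# A deliberately malformed snippet: the second line is indented
# under a scalar value, which is invalid YAML.
bad_source = """
id: data-pipeline
  name: Data Pipeline
"""

try:
    yaml.safe_load(bad_source)
except yaml.YAMLError as e:
    # yaml.YAMLError is the base class for all PyYAML parse errors.
    print(f"YAML error: {e}")
```

Fixing syntax issues locally keeps the API's validation feedback focused on workflow semantics rather than basic formatting.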
