# Quick start
This page shows you the minimal YAML to define a workflow and how to deploy it.
## Your first workflow
The following workflow clones a repository, downloads a dataset from H2O Drive, runs a processing script, and uploads the results:
```yaml
id: data-pipeline
name: Data Pipeline
jobs:
  process:
    name: Process Data
    steps:
      - name: Clone repository
        run: git clone https://github.com/org/data-processing.git
      - name: Download dataset
        download:
          source: drive://your-bucket-uuid/datasets/raw-data.csv
          path: ./data.csv
      - name: Process data
        run: python data-processing/process.py --input data.csv --output results.json
      - name: Upload results
        upload:
          path: results.json
          destination: drive://your-bucket-uuid/processed/results.json
```
Replace `your-bucket-uuid` with the UUID of your H2O Drive bucket.
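The workflow above runs a `process.py` script that takes `--input` and `--output` flags, but the script itself is not part of this guide. As a hypothetical sketch (the real script's logic is an assumption, not defined by H2O Workflows), its core could be as simple as reading the CSV and writing a JSON summary:

```python
import csv
import json

def process(input_path: str, output_path: str) -> dict:
    """Read a CSV, write a small JSON summary of it, and return the summary.

    Hypothetical stand-in for the process.py referenced in the workflow;
    any real pipeline would put its actual transformation logic here.
    """
    with open(input_path, newline="") as f:
        rows = list(csv.DictReader(f))
    summary = {
        "row_count": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
    }
    with open(output_path, "w") as f:
        json.dump(summary, f)
    return summary
```

In the real script this function would be wired to the `--input`/`--output` flags, for example with `argparse`, so the `run:` command in the workflow can invoke it.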
## Deploy with the Python SDK
Install the SDK and deploy your workflow:
```shell
pip install h2o-workflows
```
```python
import h2o_workflows
from h2o_workflows.workflow.workflow import Workflow

clients = h2o_workflows.login()

with open("data-pipeline.yaml") as f:
    source = f.read()

workflow = clients.workflow.create_workflow(
    parent="workspaces/my-workspace",
    workflow=Workflow(source_contents=source),
)
print(f"Created: {workflow.name}")
```
The API validates your YAML automatically when you create or update a workflow. Validation errors are returned in the response with details about what failed.
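Because the API is the authority on validation, a local pre-check is optional, but it can catch obvious omissions before a round trip. The sketch below is an assumption based only on the example workflow above (the required fields `id`, `name`, and `jobs` are inferred from it, not from a documented schema):

```python
# Hypothetical client-side sanity check run before calling create_workflow().
# The field names below are assumptions inferred from the example workflow;
# the API's own validation remains the source of truth.
REQUIRED_TOP_LEVEL = ("id", "name", "jobs")

def check_workflow(spec: dict) -> list[str]:
    """Return a list of error messages; an empty list means the basic shape looks OK."""
    errors = [
        f"missing required field: {key}"
        for key in REQUIRED_TOP_LEVEL
        if key not in spec
    ]
    if not isinstance(spec.get("jobs"), dict) or not spec.get("jobs"):
        errors.append("jobs must be a non-empty mapping of job IDs to job definitions")
    return errors
```

You would parse the YAML file into a dict (for example with PyYAML's `yaml.safe_load`) and run `check_workflow` on it before uploading.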
## Next steps
- Key Terms — learn the core vocabulary.
- Simple pipeline guide — a three-job pipeline with inputs, triggers, and dependencies.
- Workflow Syntax — full reference for all workflow fields.
- API Reference — Python SDK and REST API documentation.