Steps

Steps are individual actions that run within a job. They execute sequentially in the order you define them.

Note

Each step must specify exactly one action: run, upload, or download.
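As a rough illustration of this rule, the fragment below defines three steps, each with a single action. The step names, script name, and bucket name are placeholders chosen for this sketch, not values from a real project.

```yaml
steps:
  # One action per step: this step only downloads.
  - name: Fetch input
    download:
      source: drive://my-bucket/datasets/train.csv
      path: ./data/train.csv

  # This step only runs a shell command.
  - name: Train
    run: python train.py

  # This step only uploads. Combining, for example, run and upload
  # in the same step would conflict with the rule in the note above.
  - name: Publish model
    upload:
      path: models/
      destination: drive://my-bucket/models/
```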

Fields

name (optional)

Display name of the step. Use this field to describe what action the step performs.

Type: string

Example:

steps:
  - name: Load training data
    run: python load_data.py

working_dir (optional)

Working directory for this step. When set, it overrides the job-level working_dir.

Type: string

Behavior:

  • Sets the current working directory for this step
  • Affects where shell commands in run execute
  • Affects how relative paths in upload.path and download.path resolve
  • Absolute paths are not affected by working_dir

Inheritance:

  • If the job has working_dir, all steps inherit it
  • A step can override the inherited value by setting its own working_dir
  • Step override takes precedence

Example:

steps:
  - name: Train baseline model
    working_dir: ./experiments/baseline
    run: python train.py

  - name: Train advanced model
    working_dir: ./experiments/advanced
    run: python train.py

With upload and download:

steps:
  - name: Upload model from experiment
    working_dir: ./experiments
    upload:
      path: models/  # Relative to ./experiments
      destination: drive://bucket/models/

Default: Inherits the job's working_dir. If neither the step nor the job sets one, the home directory is used.
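The inheritance and absolute-path rules above can be sketched in one job. The jobs layout mirrors the complete example at the end of this page; the job key, directory names, and file names are hypothetical.

```yaml
jobs:
  train:
    working_dir: ./experiments          # job-level value, inherited by every step
    steps:
      - name: Train baseline model      # no working_dir here, so ./experiments applies
        run: python baseline/train.py

      - name: Train advanced model
        working_dir: ./experiments/advanced   # step-level value takes precedence
        run: python train.py

      - name: Upload report
        upload:
          path: /tmp/report.html        # absolute path, not affected by working_dir
          destination: drive://bucket/reports/
```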

run (optional)

Shell commands to execute in this step. Each step's run invocation executes in a new shell session.

Type: string

Format: A single-line string, or a multi-line string written with YAML's block scalar indicators | (literal, keeps line breaks) or > (folded, joins lines with spaces).

Shell session behavior:

  • Each step with run starts a new shell session
  • Environment variables from env are available in the shell
  • The shell exits after command execution completes
  • Shell state, such as exported variables or a directory changed with cd, is not preserved between steps

Examples:

Single-line command:

steps:
  - name: Install dependencies
    run: pip install -r requirements.txt

Multi-line script:

steps:
  - name: Train and validate
    run: |
      python preprocess.py
      python train.py
      echo "Training complete!"

With environment variables:

steps:
  - name: Train model
    env:
      MODEL_TYPE: xgboost
    run: |
      echo "Training ${MODEL_TYPE} model"
      python train.py --model ${MODEL_TYPE}
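Because each run starts a new shell session, a variable exported in one step is not expected to be visible in the next. The following sketch illustrates this; RUN_ID and its value are hypothetical names used only for the demonstration.

```yaml
steps:
  - name: Set a variable in one shell
    run: |
      export RUN_ID=demo-001
      echo "RUN_ID inside this step: ${RUN_ID}"

  - name: Read it from the next shell
    run: |
      # This step runs in a new shell session, so the export from the
      # previous step is expected to be gone and the fallback is printed.
      echo "RUN_ID in the next step: ${RUN_ID:-<unset>}"
```

If a value is needed in a step's shell, declaring it under env is likely the more reliable option than exporting it in an earlier step.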

upload (optional)

Upload files or folders to H2O Drive. Supports single files, folders, and glob patterns.

Type: Upload object with path and destination fields

Example:

steps:
  - name: Upload trained model
    upload:
      path: models/
      destination: drive://my-bucket/models/trained/

For path behavior, glob patterns, and Drive URL format, see Storage.
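A glob-based upload might look like the sketch below. It assumes the common * wildcard and a hypothetical .pkl file extension; the exact pattern syntax and how matched files map to the destination are described in Storage, so treat this as illustrative only.

```yaml
steps:
  - name: Upload serialized models
    upload:
      path: models/*.pkl      # assumed wildcard syntax; see Storage for the supported patterns
      destination: drive://my-bucket/models/trained/
```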

download (optional)

Download files or folders from H2O Drive. Supports single files and folders.

Type: Download object with source and path fields

Example:

steps:
  - name: Download dataset
    download:
      source: drive://my-bucket/datasets/train.csv
      path: ./data/train.csv

For download behavior and path handling, see Storage.

timeout (optional)

Maximum execution time for this step. The system terminates the step if execution exceeds this duration.

Type: Duration string (for example, "10m", "2h")

For duration format, scope levels, and precedence rules, see Timeouts.
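A minimal sketch of a step-level timeout, using one of the duration strings shown above. The step name and script are placeholders.

```yaml
steps:
  - name: Preprocess data
    timeout: "10m"    # the step is terminated if it runs longer than 10 minutes
    run: python preprocess.py
```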

continue_on_error (optional)

Continue job execution even if this step fails.

Type: bool

Default: false (step failure causes job failure)

For detailed behavior and interaction with fail-fast, see Failure handling.
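As a short sketch, the fragment below marks a validation step as non-blocking so that the following upload step still runs if validation fails. The script name and bucket are placeholders.

```yaml
steps:
  - name: Run optional validation
    continue_on_error: true     # the job continues even if this step fails
    run: python validate.py --strict

  - name: Upload results        # still runs when the validation step fails
    upload:
      path: ./output/
      destination: drive://results-bucket/output/
```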

env (optional)

Environment variables for this step.

Type: Map of string to string

For scope, inheritance, and precedence rules, see Environment variables.
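The sketch below sets step-level variables and reads them in run. Since the type is a map of string to string, quoting values that YAML would otherwise parse as numbers or booleans is a reasonable precaution; whether unquoted values are accepted is not stated here. EPOCHS, USE_GPU, and the --epochs flag are hypothetical.

```yaml
steps:
  - name: Train model
    env:
      MODEL_TYPE: xgboost
      EPOCHS: "20"        # quoted so YAML reads a string, not an integer
      USE_GPU: "true"     # quoted so YAML reads a string, not a boolean
    run: |
      echo "Model: ${MODEL_TYPE}, epochs: ${EPOCHS}, GPU: ${USE_GPU}"
      python train.py --model "${MODEL_TYPE}" --epochs "${EPOCHS}"
```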

Step execution

  • Steps within a job run sequentially in the order you define them
  • Each step runs after the previous step completes
  • If a step fails and continue_on_error is not true, subsequent steps in that job do not run

Complete example

jobs:
  process:
    name: Process and Upload Data
    working_dir: ./project
    timeout: "2h"
    env:
      LOG_LEVEL: info
    steps:
      - name: Download raw data
        download:
          source: drive://data-bucket/raw/dataset.csv
          path: ./data/raw.csv

      - name: Preprocess data
        timeout: "30m"
        run: |
          python preprocess.py --input ./data/raw.csv --output ./data/processed.csv
          echo "Preprocessing complete"

      - name: Train model
        timeout: "1h"
        env:
          MODEL_TYPE: xgboost
        run: python train.py --data ./data/processed.csv --model $MODEL_TYPE

      - name: Run optional validation
        continue_on_error: true
        run: python validate.py --strict

      - name: Upload results
        upload:
          path: ./output/
          destination: drive://results-bucket/output/
