
Storage

Upload and download steps enable transferring files and folders to/from H2O Drive during workflow execution.

Overview

Steps can upload and download files using H2O Drive:

  • Upload to H2O Drive: Transfer local files or folders to H2O Drive.
  • Download from H2O Drive: Transfer files or folders from H2O Drive to the local filesystem.

These are dedicated step types (upload and download), distinct from run steps, which execute shell commands. H2O Drive is currently the only supported storage backend.

Schema

See Schema Reference for the complete #UploadH2ODrive and #DownloadH2ODrive definitions.

Upload

Upload local files or folders to H2O Drive storage.

Fields

path (required)

Local filesystem path or glob pattern to upload.

Supports:

  • Single files: model.pkl.
  • Folders: models/ (uploads recursively).
  • Glob patterns: *.log, models/**/*.pkl.
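This page does not say how a step decides which of these three forms a path value is. The following is a minimal Python sketch, assuming that any value containing *, ? or [ is treated as a glob, that a trailing / or an existing local directory is treated as a folder, and that anything else is a single file. The helper name classify_path is hypothetical and is not part of the workflow runner.

```python
import os

# Assumption: these characters mark a glob pattern (see the Glob Patterns table).
GLOB_CHARS = "*?["


def classify_path(path: str) -> str:
    """Hypothetical helper: classify an upload `path` value as 'glob', 'folder', or 'file'."""
    if any(ch in path for ch in GLOB_CHARS):
        return "glob"
    # A trailing slash, or an existing local directory, is treated as a folder.
    if path.endswith("/") or os.path.isdir(path):
        return "folder"
    return "file"


for value in ["model.pkl", "models/", "*.json", "checkpoints/**"]:
    print(f"{value!r}: {classify_path(value)}")
```

Checking for glob characters first means a pattern such as checkpoints/** is never mistaken for a folder, even though it names one.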

Examples:

path: model.pkl              # Single trained model
path: models/                # Entire models folder
path: "*.json"               # All JSON files (e.g. metrics) in the current directory
path: "checkpoints/**"       # All checkpoints, recursively

destination (required)

Destination in H2O Drive.

Format: drive://bucket-name/path/to/destination

Examples:

destination: drive://bucket/models/trained_model.pkl
destination: drive://bucket/experiments/exp-123/models/

Path Behavior

Single File:

upload:
  path: training.log
  destination: drive://bucket/logs/training.log
# Local: training.log → Drive: drive://bucket/logs/training.log

Folder (Recursive):

upload:
  path: models/
  destination: drive://bucket/trained-models/models/
# Local: models/model.pkl → Drive: drive://bucket/trained-models/models/model.pkl
# Local: models/checkpoints/epoch_10.h5 → Drive: drive://bucket/trained-models/models/checkpoints/epoch_10.h5
# Structure under models/ is preserved

Standard Drive Behavior:

  • The base folder (models/) itself is stripped from each uploaded object's path.
  • The structure beneath it is preserved under the destination.
  • A trailing / on destination marks it as a directory (prefix) rather than a single object.
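The three rules above can be sketched as a single mapping function. This is a minimal Python illustration under stated assumptions, not the runner's actual implementation: upload_url is a hypothetical helper, base is the step's path value, and the destination is treated as one object when it lacks a trailing / and as a prefix when it has one.

```python
def upload_url(local_file: str, base: str, destination: str) -> str:
    """Hypothetical helper: map one local file to its H2O Drive URL.

    local_file  -- path of the file being uploaded, e.g. "models/model.pkl"
    base        -- the step's `path` value it was found under, e.g. "models/"
    destination -- the step's `destination` URL
    """
    if not destination.endswith("/"):
        # No trailing slash: the destination names a single object.
        return destination
    prefix = base.rstrip("/")
    if prefix and not local_file.startswith(prefix + "/"):
        raise ValueError(f"{local_file!r} is not under {base!r}")
    # Strip the base folder and keep the structure beneath it.
    relative = local_file[len(prefix) + 1:] if prefix else local_file
    return destination + relative


dest = "drive://bucket/trained-models/models/"
for f in ["models/model.pkl", "models/checkpoints/epoch_10.h5"]:
    print(f, "->", upload_url(f, "models/", dest))
```

The sketch does not cover uploading a single file to a destination that ends in /, because this page does not describe that case.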

Glob Pattern:

upload:
  path: "*.log"
  destination: drive://bucket/logs/
# Local: training.log → Drive: drive://bucket/logs/training.log
# Local: evaluation.log → Drive: drive://bucket/logs/evaluation.log

upload:
  path: "models/**/*.pkl"
  destination: drive://bucket/model-artifacts/
# Local: models/classifier.pkl → Drive: drive://bucket/model-artifacts/classifier.pkl
# Local: models/ensemble/voting.pkl → Drive: drive://bucket/model-artifacts/ensemble/voting.pkl
# The non-wildcard base (models/) is stripped; the structure under it is preserved
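For a glob, the "base" appears to be the leading part of the pattern before the first wildcard: nothing for *.log, and models for models/**/*.pkl. The following minimal Python sketch assumes that reading. The helpers glob_base and glob_upload_urls are hypothetical, and they take the list of already-matched files as input rather than performing the matching themselves.

```python
# Assumption: these characters start a wildcard path component.
GLOB_CHARS = "*?["


def glob_base(pattern: str) -> str:
    """Hypothetical helper: the leading path components of `pattern` with no wildcards."""
    base = []
    for part in pattern.split("/"):
        if any(ch in part for ch in GLOB_CHARS):
            break
        base.append(part)
    return "/".join(base)


def glob_upload_urls(pattern: str, matches: list, destination: str) -> dict:
    """Map already-matched local files to Drive URLs, stripping the pattern's base."""
    base = glob_base(pattern)
    return {f: destination + (f[len(base) + 1:] if base else f) for f in matches}


print(repr(glob_base("*.log")))            # → ''
print(repr(glob_base("models/**/*.pkl")))  # → 'models'

urls = glob_upload_urls(
    "models/**/*.pkl",
    ["models/classifier.pkl", "models/ensemble/voting.pkl"],
    "drive://bucket/model-artifacts/",
)
for local, remote in urls.items():
    print(local, "->", remote)
```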

Download

Download files or folders from H2O Drive storage to the local filesystem.

Fields

source (required)

Drive URL of the file or folder to download.

Format: drive://bucket-name/path/to/source

Examples:

source: drive://bucket/models/trained_model.pkl
source: drive://bucket/datasets/processed-data/

path (required)

Local filesystem destination path.

Examples:

path: ./model.pkl            # Download to a specific file
path: ./data/                # Download into a folder

Path Behavior

Single File:

download:
  source: drive://bucket/models/trained_model.pkl
  path: ./model.pkl
# Drive: drive://bucket/models/trained_model.pkl → Local: ./model.pkl

Folder (Recursive):

download:
  source: drive://bucket/training-artifacts/models/
  path: ./models/
# Drive: drive://bucket/training-artifacts/models/model.pkl → Local: ./models/model.pkl
# Drive: drive://bucket/training-artifacts/models/features/engineered.parquet → Local: ./models/features/engineered.parquet
# Structure is preserved

Download to a Different Path:

download:
  source: drive://bucket/datasets/training-data/
  path: ./data/
# Drive: drive://bucket/datasets/training-data/features.csv → Local: ./data/features.csv
# The local folder name does not have to match the Drive folder name
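The download examples mirror the upload rules: a source without a trailing / is written to path as given, and a source ending in / has its prefix stripped while the structure beneath it is kept under path. The following minimal Python sketch assumes exactly that behavior. download_path is a hypothetical helper, not an H2O Drive client call.

```python
import posixpath


def download_path(object_url: str, source: str, path: str) -> str:
    """Hypothetical helper: local path for one Drive object fetched by a download step."""
    if not source.endswith("/"):
        # Single-file source: the object is written to `path` as given.
        return path
    if not object_url.startswith(source):
        raise ValueError(f"{object_url!r} is not under {source!r}")
    # Folder source: keep the object's structure beneath the source prefix.
    return posixpath.join(path, object_url[len(source):])


print(download_path(
    "drive://bucket/training-artifacts/models/features/engineered.parquet",
    "drive://bucket/training-artifacts/models/",
    "./models/",
))  # → ./models/features/engineered.parquet
```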

Drive URL Format

H2O Drive URLs have the following format:

drive://bucket-name/path/to/object

Components:

  • drive:// (required): the protocol prefix.
  • bucket-name: the name of the H2O Drive bucket.
  • /path/to/object: the object key (path) within the bucket.
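Splitting a URL into these components is straightforward string handling. The following is a minimal Python sketch, assuming the bucket name is everything up to the first / after drive:// and the object key is the remainder, returned without its leading slash. parse_drive_url is a hypothetical helper, not part of any H2O library.

```python
from typing import Tuple

PREFIX = "drive://"


def parse_drive_url(url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a drive:// URL into (bucket, object key)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"missing drive:// prefix: {url!r}")
    bucket, _, key = url[len(PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    # A key that is empty or ends in "/" refers to a folder (prefix), not one object.
    return bucket, key


print(parse_drive_url("drive://bucket/models/trained_model.pkl"))
# → ('bucket', 'models/trained_model.pkl')
```

Plain string partitioning is used instead of urllib.parse so that characters such as ? or # in an object key are not interpreted as a query or fragment.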

Common Patterns

Share Artifacts Between Jobs

jobs:
  train:
    steps:
      - name: Upload model
        upload:
          path: models/
          destination: drive://ml-artifacts/models/

  evaluate:
    depends_on: [train]
    steps:
      - name: Download model
        download:
          source: drive://ml-artifacts/models/
          path: ./models/

Download Dataset and Upload Results

steps:
  - name: Download training data
    download:
      source: drive://datasets/features.parquet
      path: ./data/features.parquet
  - name: Train model
    run: python train.py
  - name: Upload results
    upload:
      path: "*.json"
      destination: drive://results/

Glob Patterns

Supported glob patterns:

Pattern   Description                            Example
*         Match any characters (except /)        *.json matches metrics.json, config.json
**        Match any characters (including /)     models/**/*.pkl matches all .pkl files recursively
?         Match a single character               model?.pkl matches model1.pkl, modelA.pkl
[abc]     Match one character from the set       checkpoint[123].pkl matches checkpoint1.pkl, checkpoint2.pkl
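The table's semantics can be made concrete by translating a pattern into a regular expression. The following is a minimal Python sketch of that translation, not the runner's matcher. It assumes ? does not match /, that **/ also matches zero directories (so models/**/*.pkl matches models/classifier.pkl, as in the upload example), and that set negation is not supported. glob_to_regex and matches are hypothetical helpers. Python's own fnmatch is not used because its * also matches /.

```python
import re


def glob_to_regex(pattern: str):
    """Translate a glob into a compiled regex following the table above (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # any characters, including /
            i += 2
        elif c == "*":
            out.append("[^/]*")  # any characters except /
            i += 1
        elif c == "?":
            out.append("[^/]")  # exactly one character (assumed not to match /)
            i += 1
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end <= i + 1:
                out.append(re.escape(c))  # unclosed or empty set: literal "["
                i += 1
            else:
                body = pattern[i + 1:end].replace("\\", "\\\\")
                if body.startswith("^"):
                    body = "\\" + body  # keep a leading ^ literal (no negation)
                out.append("[" + body + "]")
                i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, path: str) -> bool:
    return glob_to_regex(pattern).fullmatch(path) is not None


print(matches("*.json", "metrics.json"))          # → True
print(matches("*.json", "reports/metrics.json"))  # → False
```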
