Storage

Upload and download steps transfer files and folders to and from H2O Drive during workflow execution.

Overview

Steps can upload and download files using H2O Drive:

  • Upload to H2O Drive: Transfer local files and folders to H2O Drive
  • Download from H2O Drive: Transfer files and folders from H2O Drive to the local filesystem

These are explicit step types, distinct from shell command (run) steps. H2O Drive is currently the only supported storage backend.
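
The sketch below shows how storage steps sit alongside shell command (run) steps within a single job. The bucket name and file names are illustrative only; the step fields follow the schemas described in the rest of this page.

jobs:
  train:
    steps:
      # Storage step: pull input data from Drive
      - name: Download dataset
        download:
          source: drive://example-bucket/datasets/train.csv
          path: ./train.csv

      # Shell command step
      - name: Train model
        run: python train.py

      # Storage step: push the trained artifact back to Drive
      - name: Upload model
        upload:
          path: model.pkl
          destination: drive://example-bucket/models/model.pkl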

Upload

Upload local files or folders to H2O Drive storage.

Fields

path (required)

Local filesystem path or glob pattern to upload.

Supports:

  • Single files: model.pkl
  • Folders: models/ (uploads recursively)
  • Glob patterns: *.log, models/**/*.pkl

Examples:

path: model.pkl           # Single trained model
path: models/             # Entire models folder
path: "*.json"            # All metric files in current directory
path: "checkpoints/**"    # All checkpoints recursively

destination (required)

Destination in H2O Drive.

Format: drive://bucket-name/path/to/destination

Examples:

destination: drive://bucket/models/trained_model.pkl
destination: drive://bucket/experiments/exp-123/models/

Path behavior

Single file:

upload:
  path: training.log
  destination: drive://bucket/logs/training.log
# Local: training.log → Drive: drive://bucket/logs/training.log

Folder (recursive):

upload:
  path: models/
  destination: drive://bucket/trained-models/models/
# Local: models/model.pkl → Drive: drive://bucket/trained-models/models/model.pkl
# Local: models/checkpoints/epoch_10.h5 → Drive: drive://bucket/trained-models/models/checkpoints/epoch_10.h5
# Structure under models/ is preserved

Standard Drive behavior:

  • The base path (models/) is stripped
  • Structure under it is preserved at the destination
  • Trailing / on destination indicates a directory or prefix

Glob pattern:

upload:
  path: "*.log"
  destination: drive://bucket/logs/
# Local: training.log → Drive: drive://bucket/logs/training.log
# Local: evaluation.log → Drive: drive://bucket/logs/evaluation.log

upload:
  path: "models/**/*.pkl"
  destination: drive://bucket/model-artifacts/
# Local: models/classifier.pkl → Drive: drive://bucket/model-artifacts/classifier.pkl
# Local: models/ensemble/voting.pkl → Drive: drive://bucket/model-artifacts/ensemble/voting.pkl
# Base (models/) is stripped, structure under it preserved

Download

Download files or folders from H2O Drive storage to the local filesystem.

Fields

source (required)

Source location in H2O Drive.

Format: drive://bucket-name/path/to/source

Examples:

source: drive://bucket/models/trained_model.pkl
source: drive://bucket/datasets/processed-data/

path (required)

Local filesystem destination path.

Examples:

path: ./model.pkl           # Download to specific file
path: ./data/               # Download to folder

Path behavior

Single file:

download:
  source: drive://bucket/models/trained_model.pkl
  path: ./model.pkl
# Drive: drive://bucket/models/trained_model.pkl → Local: ./model.pkl

Folder (recursive):

download:
  source: drive://bucket/training-artifacts/models/
  path: ./models/
# Drive: drive://bucket/training-artifacts/models/model.pkl → Local: ./models/model.pkl
# Drive: drive://bucket/training-artifacts/models/features/engineered.parquet → Local: ./models/features/engineered.parquet
# Structure is preserved

Download to different path:

download:
  source: drive://bucket/datasets/training-data/
  path: ./data/
# Drive: drive://bucket/datasets/training-data/features.csv → Local: ./data/features.csv

Drive URL format

H2O Drive URLs follow the standard format:

drive://bucket-name/path/to/object

Components:

Component          Description
drive://           Protocol prefix (required)
bucket-name        Name of the H2O Drive bucket
/path/to/object    Object key or path within the bucket
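
For example, a URL such as drive://ml-artifacts/experiments/run-42/model.pkl (bucket and key chosen for illustration) breaks down as:

  • Protocol prefix: drive://
  • Bucket: ml-artifacts
  • Object key: experiments/run-42/model.pkl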

Common patterns

Share artifacts between jobs

jobs:
  train:
    steps:
      - name: Upload model
        upload:
          path: models/
          destination: drive://ml-artifacts/models/

  evaluate:
    depends_on: [train]
    steps:
      - name: Download model
        download:
          source: drive://ml-artifacts/models/
          path: ./models/

Download dataset and upload results

steps:
  - name: Download training data
    download:
      source: drive://datasets/features.parquet
      path: ./data/features.parquet

  - name: Train and generate metrics
    run: python train.py

  - name: Upload results
    upload:
      path: "*.json"
      destination: drive://results/

Glob patterns

Supported glob patterns:

Pattern    Description                          Example
*          Match any characters (except /)      *.json matches metrics.json, config.json
**         Match any characters (including /)   models/**/*.pkl matches all .pkl files recursively
?          Match single character               model?.pkl matches model1.pkl, modelA.pkl
[abc]      Match character set                  checkpoint[123].pkl matches checkpoint1.pkl, checkpoint2.pkl
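
The ? and [abc] patterns can be used in upload steps the same way as * and ** above. A minimal sketch, with an illustrative bucket name:

upload:
  path: "checkpoint[123].pkl"
  destination: drive://example-bucket/checkpoints/
# Matches checkpoint1.pkl, checkpoint2.pkl, and checkpoint3.pkl in the current directory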
