Timeouts

Timeouts specify the maximum execution time for jobs and steps. If execution exceeds the timeout, the system terminates the job or step.

Duration format

Timeouts use Go's standard duration string format. A duration string is a sequence of decimal numbers, each with an optional fraction and a required unit suffix, such as "1h30m".

Supported units

Unit	Description
`ns`	Nanoseconds
`us` or `µs`	Microseconds
`ms`	Milliseconds
`s`	Seconds
`m`	Minutes
`h`	Hours

Format rules

Durations can combine multiple units: "1h30m", "2h45m30s"
Units must be specified from larger to smaller: "1h30m" (correct), "30m1h" (incorrect)
Decimal values are allowed: "1.5h", "2.5m"
No spaces between value and unit: "5m" (correct), "5 m" (incorrect)
Values must be positive (zero and negative durations are invalid and cause validation errors)
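
The rules above can be checked mechanically. The following is a minimal sketch of such a check, not the validator the system actually uses: the function name `parse_duration` and the decision to reject a repeated unit (such as "1h2h") are assumptions made for illustration.

```python
import re

# Seconds per unit.
UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}
# Rank of each unit from largest (0) to smallest; used for the ordering rule.
UNIT_RANK = {"h": 0, "m": 1, "s": 2, "ms": 3, "us": 4, "µs": 4, "ns": 5}

# One component: a decimal number followed directly by a unit.
# Two-letter units come first so that "ms" is not read as "m".
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|h|m|s)")


def parse_duration(text: str) -> float:
    """Return the duration in seconds, or raise ValueError if a rule is broken."""
    pos, total, last_rank = 0, 0.0, -1
    while pos < len(text):
        match = _COMPONENT.match(text, pos)
        if match is None:
            # Covers a missing unit, a space, a sign, or any other stray character.
            raise ValueError(f"invalid duration {text!r} at offset {pos}")
        value, unit = match.groups()
        rank = UNIT_RANK[unit]
        # Each unit must be smaller than the previous one.
        # Rejecting a repeated unit is an assumption of this sketch.
        if rank <= last_rank:
            raise ValueError(f"units out of order in {text!r}")
        total += float(value) * UNIT_SECONDS[unit]
        last_rank = rank
        pos = match.end()
    if total <= 0:
        raise ValueError(f"duration must be positive: {text!r}")
    return total


print(parse_duration("1h30m"))  # 5400.0
print(parse_duration("1.5h"))   # 5400.0
```

Strings such as "30", "1 h", "30m1h", and "0s" raise ValueError in this sketch, matching the rules listed above.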

Valid examples

timeout: "30s"      # 30 seconds
timeout: "5m"       # 5 minutes
timeout: "1h"       # 1 hour
timeout: "1h30m"    # 1 hour 30 minutes
timeout: "1.5h"     # 1.5 hours (90 minutes)

Invalid examples

timeout: "30"        # Missing unit
timeout: "1 h"       # Space between value and unit
timeout: "30m1h"     # Wrong order (minutes before hours)

Scope levels

Timeouts can be configured at two levels: job and step.

Job-level timeout

Maximum execution time for the entire job, including all steps.

Location: timeout field in job definition

Behavior: If the job's total execution time exceeds the job timeout, the system terminates the job, regardless of which step is running.

Example:

jobs:
  train:
    name: Train Model
    timeout: "2h"
    steps:
      - name: Load dataset
        run: python load.py

      - name: Preprocess data
        run: python preprocess.py

      - name: Train model
        run: python train.py

Step-level timeout

Maximum execution time for an individual step.

Location: timeout field in step definition

Behavior: If the step exceeds its timeout, only that step is terminated.

Example:

steps:
  - name: Evaluate model
    timeout: "10m"
    run: python evaluate.py

  - name: Deploy model
    timeout: "5m"
    run: python deploy.py

Timeout precedence and behavior

Job and step timeouts together

When both job and step timeouts are specified:

Step timeout applies to individual step execution
Job timeout applies to total job execution time
Whichever limit is reached first terminates execution

Example:

jobs:
  train:
    name: Train Models
    timeout: "3h"   # Job must complete in 3 hours
    steps:
      - name: Train baseline model
        timeout: "1h"   # This step must complete in 1 hour
        run: python train_baseline.py

      - name: Train advanced model
        timeout: "2h"   # This step must complete in 2 hours
        run: python train_advanced.py

In this example:

The first step can run for at most 1 hour and the second for at most 2 hours
The entire job (both steps combined) must complete within 3 hours
If step 1 takes 55 minutes and step 2 is still running after 2 hours, step 2 reaches its own timeout at 2 hours 55 minutes of total job time, before the 3-hour job timeout applies (assuming no overhead between steps)
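
To reason about which limit is reached first, it can help to model the rule in a few lines. The sketch below is a simplified model, not the engine's implementation. It assumes steps run one after another with no overhead between them, that job time is the sum of step times, and that a tie between the two limits is reported as a step timeout. The function name `simulate` is hypothetical.

```python
from typing import List, Optional, Tuple

# Each step is (name, actual_runtime_minutes, step_timeout_minutes or None).
Step = Tuple[str, float, Optional[float]]


def simulate(job_timeout: float, steps: List[Step]) -> Tuple[str, float]:
    """Return (outcome, elapsed_minutes) for steps run one after another."""
    elapsed = 0.0
    for name, runtime, step_timeout in steps:
        job_remaining = job_timeout - elapsed
        # A step is cut off by its own limit or by the remaining job time,
        # whichever is smaller.
        limit = job_remaining if step_timeout is None else min(step_timeout, job_remaining)
        if runtime <= limit:
            elapsed += runtime
            continue
        elapsed += limit
        if step_timeout is not None and step_timeout <= job_remaining:
            return f"step timeout: {name}", elapsed
        return "job timeout", elapsed
    return "completed", elapsed


# Job limit 180 min; step limits 60 and 120 min (the values from the example above).
print(simulate(180, [("baseline", 55, 60), ("advanced", 110, 120)]))
# ('completed', 165.0)
print(simulate(180, [("baseline", 70, 60), ("advanced", 110, 120)]))
# ('step timeout: baseline', 60.0)

# No step limits: only the job limit can fire.
print(simulate(120, [("load", 50, None), ("train", 90, None)]))
# ('job timeout', 120.0)
```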

Default behavior

When timeout is not specified:

The system default timeout is used
Default values are determined by the workflow execution engine
Specify timeouts explicitly for long-running operations rather than relying on the default
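
Because the default value depends on the execution engine, it can be useful to find the places where a workflow relies on it. The following sketch lists jobs and steps without an explicit `timeout`. It assumes the workflow has already been loaded into plain dictionaries (for example by a YAML parser); the function name `find_missing_timeouts` is hypothetical.

```python
from typing import Dict, List


def find_missing_timeouts(workflow: Dict) -> List[str]:
    """List the jobs and steps that do not set an explicit `timeout`."""
    missing = []
    for job_id, job in workflow.get("jobs", {}).items():
        if "timeout" not in job:
            missing.append(f"job {job_id}")
        for step in job.get("steps", []):
            if "timeout" not in step:
                missing.append(f"job {job_id}, step {step.get('name', '<unnamed>')}")
    return missing


# A workflow already loaded into plain dicts.
workflow = {
    "jobs": {
        "train": {
            "timeout": "2h",
            "steps": [
                {"name": "Load dataset", "run": "python load.py"},
                {"name": "Train model", "timeout": "1h", "run": "python train.py"},
            ],
        }
    }
}
print(find_missing_timeouts(workflow))  # ['job train, step Load dataset']
```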

Best practices

Set realistic timeouts

Base timeouts on expected execution times with buffer for variability:

jobs:
  train:
    timeout: "4h"  # Expected: 2-3h, buffer for slow runs
    steps:
      - name: Train model
        timeout: "3h"
        run: python train.py
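
One way to derive such a value is to take the slowest run observed so far and multiply it by a buffer factor. The sketch below does this and formats the result as a duration string. The function names and the default factor of 1.25 are assumptions for illustration, not recommendations from the product.

```python
import math
from typing import Sequence


def format_duration(total_minutes: int) -> str:
    """Format whole minutes as a duration string such as "3h45m"."""
    hours, minutes = divmod(total_minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    return "".join(parts)


def suggest_timeout(observed_minutes: Sequence[float], buffer: float = 1.25) -> str:
    """Slowest observed run multiplied by a buffer factor, rounded up to a whole minute."""
    if not observed_minutes:
        raise ValueError("need at least one observed run time")
    padded = max(observed_minutes) * buffer
    # Never suggest less than one minute, since zero is not a valid timeout.
    return format_duration(max(1, math.ceil(padded)))


# Three past runs took 2h, 2h30m, and 3h.
print(suggest_timeout([120, 150, 180]))  # 3h45m
```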

Use step timeouts for isolation

Prevent a single slow step from consuming the entire job timeout:

jobs:
  pipeline:
    timeout: "2h"
    steps:
      - name: Download data
        timeout: "10m"
        run: python download.py

      - name: Process data
        timeout: "30m"
        run: python process.py

      - name: Train model
        timeout: "1h"
        run: python train.py
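
A quick sanity check for a configuration like this is to compare the sum of the step timeouts with the job timeout. The sketch below assumes steps run sequentially and supports only the `h`, `m`, and `s` units. The helper names `to_seconds` and `unused_budget` are hypothetical.

```python
import re

_PART = re.compile(r"(\d+(?:\.\d+)?)(h|m|s)")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> float:
    """Convert strings such as "1h30m" to seconds (h, m, s only; no order check)."""
    if not re.fullmatch(r"(?:\d+(?:\.\d+)?(?:h|m|s))+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(value) * _SECONDS[unit] for value, unit in _PART.findall(text))


def unused_budget(job: dict) -> float:
    """Job timeout minus the sum of step timeouts, in seconds.

    A negative result means the step limits alone could exceed the job limit.
    Steps without a timeout are skipped, so the result is only a bound.
    """
    step_total = sum(to_seconds(s["timeout"]) for s in job["steps"] if "timeout" in s)
    return to_seconds(job["timeout"]) - step_total


pipeline = {
    "timeout": "2h",
    "steps": [
        {"name": "Download data", "timeout": "10m"},
        {"name": "Process data", "timeout": "30m"},
        {"name": "Train model", "timeout": "1h"},
    ],
}
print(unused_budget(pipeline) / 60)  # 20.0 (minutes of slack)
```

With these values the step limits add up to 1 hour 40 minutes, so every step can use its full allowance before the 2-hour job timeout is reached.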

Complete example

id: ml-training
name: ML Training Pipeline

jobs:
  preprocess:
    timeout: "30m"
    steps:
      - name: Download dataset
        timeout: "5m"
        download:
          source: drive://datasets/raw/
          path: ./data/

      - name: Clean data
        timeout: "20m"
        run: python clean.py

  train:
    depends_on: [preprocess]
    timeout: "6h"
    steps:
      - name: Train baseline
        timeout: "2h"
        run: python train.py --config baseline

      - name: Train advanced
        timeout: "4h"
        run: python train.py --config advanced

  evaluate:
    depends_on: [train]
    timeout: "1h"
    steps:
      - name: Run evaluation
        timeout: "45m"
        run: python evaluate.py
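
The job timeouts and `depends_on` links in this workflow also give an upper bound on how long the whole pipeline can execute. The sketch below computes that bound under some assumptions that the text above does not confirm: a job starts only after all of its dependencies have finished, a job timeout is measured from the start of that job, and there is no queueing or scheduling delay. The dictionary names are hypothetical.

```python
from functools import lru_cache

# Job timeouts in minutes and dependencies, transcribed from the workflow above.
JOB_TIMEOUT_MIN = {"preprocess": 30, "train": 360, "evaluate": 60}
DEPENDS_ON = {"preprocess": [], "train": ["preprocess"], "evaluate": ["train"]}


@lru_cache(maxsize=None)
def latest_finish(job: str) -> int:
    """Upper bound, in minutes from workflow start, on when `job` ends."""
    start = max((latest_finish(dep) for dep in DEPENDS_ON[job]), default=0)
    return start + JOB_TIMEOUT_MIN[job]


worst_case = max(latest_finish(job) for job in JOB_TIMEOUT_MIN)
print(f"{worst_case // 60}h{worst_case % 60}m")  # 7h30m
```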
