Skip to main content

Timeouts

Timeouts specify the maximum execution time for jobs and steps. If execution exceeds the timeout, the system terminates the job or step.

Duration format

Timeouts use Go's standard duration string format. A duration string is a sequence of decimal numbers, each with optional fraction and unit suffix.

Supported units

UnitDescription
nsNanoseconds
us or µsMicroseconds
msMilliseconds
sSeconds
mMinutes
hHours

Format rules

  • Durations can combine multiple units: "1h30m", "2h45m30s"
  • Units must be specified from larger to smaller: "1h30m" (correct), "30m1h" (incorrect)
  • Decimal values are allowed: "1.5h", "2.5m"
  • No spaces between value and unit: "5m" (correct), "5 m" (incorrect)
  • Values must be positive (zero and negative durations are invalid and cause validation errors)

Valid examples

timeout: "30s"      # 30 seconds
timeout: "5m" # 5 minutes
timeout: "1h" # 1 hour
timeout: "1h30m" # 1 hour 30 minutes
timeout: "1.5h" # 1.5 hours (90 minutes)

Invalid examples

timeout: "30"        # Missing unit
timeout: "1 h" # Space between value and unit
timeout: "30m1h" # Wrong order (minutes before hours)

Scope levels

Configure timeouts at two levels.

Job-level timeout

Maximum execution time for the entire job, including all steps.

Location: timeout field in job definition

Behavior: If any step in the job causes the total time to exceed the job timeout, the system terminates the job.

Example:

jobs:
train:
name: Train Model
timeout: "2h"
steps:
- name: Load dataset
run: python load.py

- name: Preprocess data
run: python preprocess.py

- name: Train model
run: python train.py

Step-level timeout

Maximum execution time for an individual step.

Location: timeout field in step definition

Behavior: If the step exceeds its timeout, only that step is terminated.

Example:

steps:
- name: Evaluate model
timeout: "10m"
run: python evaluate.py

- name: Deploy model
timeout: "5m"
run: python deploy.py

Timeout precedence and behavior

Job and step timeouts together

When both job and step timeouts are specified:

  • Step timeout applies to individual step execution
  • Job timeout applies to total job execution time
  • Whichever limit is reached first terminates execution

Example:

jobs:
train:
name: Train Models
timeout: "3h" # Job must complete in 3 hours
steps:
- name: Train baseline model
timeout: "1h" # This step must complete in 1 hour
run: python train_baseline.py

- name: Train advanced model
timeout: "2h" # This step must complete in 2 hours
run: python train_advanced.py

In this example:

  • Each step has a maximum of 1 hour and 2 hours respectively
  • The entire job (both steps combined) must complete within 3 hours
  • If step 1 takes 55 minutes and step 2 takes 2 hours 10 minutes, the job times out (total 3 hours 5 minutes exceeds job timeout)

Default behavior

When timeout is not specified:

  • System default timeout is used
  • Default values are determined by the workflow execution engine
  • It's recommended to specify timeouts for long-running operations

Best practices

Set realistic timeouts

Base timeouts on expected execution times with buffer for variability:

jobs:
train:
timeout: "4h" # Expected: 2-3h, buffer for slow runs
steps:
- name: Train model
timeout: "3h"
run: python train.py

Use step timeouts for isolation

Prevent a single slow step from consuming the entire job timeout:

jobs:
pipeline:
timeout: "2h"
steps:
- name: Download data
timeout: "10m"
run: python download.py

- name: Process data
timeout: "30m"
run: python process.py

- name: Train model
timeout: "1h"
run: python train.py

Complete example

id: ml-training
name: ML Training Pipeline

jobs:
preprocess:
timeout: "30m"
steps:
- name: Download dataset
timeout: "5m"
download:
source: drive://datasets/raw/
path: ./data/

- name: Clean data
timeout: "20m"
run: python clean.py

train:
depends_on: [preprocess]
timeout: "6h"
steps:
- name: Train baseline
timeout: "2h"
run: python train.py --config baseline

- name: Train advanced
timeout: "4h"
run: python train.py --config advanced

evaluate:
depends_on: [train]
timeout: "1h"
steps:
- name: Run evaluation
timeout: "45m"
run: python evaluate.py

Feedback