Version: v1.7.3-14 🚧

Collection lifecycle

Overview

Enterprise h2oGPTe provides automated collection lifecycle management to help administrators control data retention and storage usage. Collections follow a three-state lifecycle managed by automated background processes, with administrator controls for expiration policies, inactivity thresholds, and recovery.

Collection states

Collections transition through three states before permanent deletion:

| From | To | Trigger |
| --- | --- | --- |
| active | expiring | The expiry date passes or the inactivity interval elapses. |
| expiring | archived | The expiration window (expiration_limit_days) has elapsed. |
| archived | Deleted | Automatic cleanup process removes the collection data. |
| expiring or archived | active | An administrator recovers the collection. |
Note: Automatic cleanup runs periodically as a background process. Once the cleanup process deletes a collection, you can't recover it.
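The transitions in the table above can be read as a small state machine. The following is a minimal sketch for reasoning about which moves are possible; the `CollectionState` enum and `can_transition` helper are hypothetical names and are not part of the h2oGPTe SDK. It models only the transitions listed in the table, so manual archiving and administrator deletion are left out.

```python
from enum import Enum


class CollectionState(Enum):
    """Lifecycle states named in the table (hypothetical enum, not an SDK type)."""
    ACTIVE = "active"
    EXPIRING = "expiring"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Allowed moves, copied from the transition table.
ALLOWED_TRANSITIONS = {
    CollectionState.ACTIVE: {CollectionState.EXPIRING},
    CollectionState.EXPIRING: {CollectionState.ARCHIVED, CollectionState.ACTIVE},
    CollectionState.ARCHIVED: {CollectionState.DELETED, CollectionState.ACTIVE},
    CollectionState.DELETED: set(),  # deletion is permanent, so no way back
}


def can_transition(current: CollectionState, target: CollectionState) -> bool:
    """Return True if the table lists a move from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```

For example, `can_transition(CollectionState.ARCHIVED, CollectionState.ACTIVE)` is true because an administrator can recover an archived collection, while every move out of `DELETED` is rejected.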

Lifecycle settings

The following settings control system-wide collection lifecycle behavior.

| Setting | Description |
| --- | --- |
| expiration_limit_days | Number of days before expiring collections are archived. Changing this setting triggers re-evaluation of all collection statuses. |
| default_collection_inactivity_days | Days of inactivity before a collection begins the expiration process. Set to -1 to turn off inactivity-based expiration. |
| default_collection_size_limit | Default maximum storage per collection (in bytes). Range: 1 MB to 10 GB. |
| enable_adhoc_collection_expiration | Turn on automatic expiration for ad-hoc (agent-created) collections. |
| adhoc_collection_expiration_days | Number of days before agent-created collections expire. Range: 1 to 30. |
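It can help to check a value against the documented ranges before sending it. The sketch below is a client-side check only; `validate_setting` and `inactivity_expiration_enabled` are hypothetical helpers, and it assumes decimal units (1 MB = 1,000,000 bytes), matching the 500 MB = 500000000 bytes convention used in the examples on this page. The server may interpret the bounds differently.

```python
MB = 1_000_000        # assumption: decimal megabytes
GB = 1_000_000_000    # assumption: decimal gigabytes


def validate_setting(name: str, value: int) -> None:
    """Raise ValueError if value is outside the range documented for name."""
    if name == "default_collection_size_limit":
        if not MB <= value <= 10 * GB:
            raise ValueError(f"{name} must be between 1 MB and 10 GB, got {value}")
    elif name == "adhoc_collection_expiration_days":
        if not 1 <= value <= 30:
            raise ValueError(f"{name} must be between 1 and 30, got {value}")
    # Other settings have no documented range, so they pass through unchecked.


def inactivity_expiration_enabled(inactivity_days: int) -> bool:
    """-1 turns off inactivity-based expiration; any other value leaves it on."""
    return inactivity_days != -1
```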

Configure lifecycle settings

# Set the collection expiration window to 90 days
curl -X PUT "https://<YOUR_DOMAIN>/api/v1/configurations/expiration_limit_days" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"string_value": "90"}'

# Start expiring collections after 90 days of inactivity
curl -X PUT "https://<YOUR_DOMAIN>/api/v1/configurations/default_collection_inactivity_days" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"string_value": "90"}'

# Set the default collection size limit to 500 MB (500,000,000 bytes)
curl -X PUT "https://<YOUR_DOMAIN>/api/v1/configurations/default_collection_size_limit" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"string_value": "500000000"}'

# Turn on automatic expiration for ad-hoc (agent-created) collections
curl -X PUT "https://<YOUR_DOMAIN>/api/v1/configurations/enable_adhoc_collection_expiration" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"string_value": "true"}'
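The curl calls above all share one shape: a PUT to `/api/v1/configurations/<setting>` with a `string_value` body. As a rough sketch, the same request can be built with Python's standard library. The `build_config_request` helper, the example domain, and the example key are hypothetical, and using this endpoint for `adhoc_collection_expiration_days` is an assumption based on the pattern of the other settings.

```python
import json
import urllib.request


def build_config_request(
    domain: str, api_key: str, key: str, value: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets one configuration key."""
    body = json.dumps({"string_value": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{domain}/api/v1/configurations/{key}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder domain and key; expire agent-created collections after 7 days.
request = build_config_request(
    "h2ogpte.example.com", "sk-example", "adhoc_collection_expiration_days", "7"
)
# To send it: urllib.request.urlopen(request)
```

Values are passed as strings because every example above wraps the value in `string_value`, including numbers and booleans.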

Per-collection controls

Individual collections can have their own lifecycle settings that override system defaults.

Set collection expiry date

Set an explicit expiry date on a collection, overriding the system default:

curl -X PUT "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/expiry_date" \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{"expiry_date": "2026-12-31"}'
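When the expiry date should be relative to today rather than a fixed calendar date, the request body can be computed. This is a minimal sketch; `expiry_payload` is a hypothetical helper, and the `YYYY-MM-DD` format is taken from the curl example above.

```python
import json
from datetime import date, timedelta
from typing import Optional


def expiry_payload(days_from_today: int, today: Optional[date] = None) -> str:
    """Return the JSON body for an expiry date a given number of days ahead."""
    start = today if today is not None else date.today()
    expiry = start + timedelta(days=days_from_today)
    # isoformat() yields YYYY-MM-DD, the format used in the curl example.
    return json.dumps({"expiry_date": expiry.isoformat()})


# The "today" argument exists so the result is reproducible in tests.
body = expiry_payload(30, today=date(2026, 12, 1))
```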

Set collection inactivity interval

Override the system-wide inactivity threshold for a specific collection:

curl -X PUT "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/inactivity_interval" \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{"inactivity_interval": 30}'

Set collection size limit

Set a maximum storage limit for a specific collection (in bytes):

curl -X PUT "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/size_limit" \
-H "Authorization: Bearer <API_KEY>" \
-H "Content-Type: application/json" \
-d '{"size_limit": "500000000"}'

Remove collection size limit

Remove the size limit from a collection, reverting to the system default:

curl -X DELETE "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/size_limit" \
-H "Authorization: Bearer <API_KEY>"
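The relationship between per-collection overrides and system defaults can be pictured as a simple fallback. The sketch below is illustrative only: the `CollectionOverrides` and `SystemDefaults` classes are hypothetical, and representing "no override" as `None` is an assumption about how the absence of a per-collection value might be modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemDefaults:
    default_collection_size_limit: int       # bytes
    default_collection_inactivity_days: int  # -1 turns off inactivity expiration


@dataclass
class CollectionOverrides:
    size_limit: Optional[int] = None           # None means "no override" (assumption)
    inactivity_interval: Optional[int] = None  # None means "no override" (assumption)


def effective_size_limit(overrides: CollectionOverrides, defaults: SystemDefaults) -> int:
    """Use the collection's own limit if set, else the system default."""
    if overrides.size_limit is not None:
        return overrides.size_limit
    return defaults.default_collection_size_limit


def effective_inactivity_days(overrides: CollectionOverrides, defaults: SystemDefaults) -> int:
    """Use the collection's own interval if set, else the system default."""
    if overrides.inactivity_interval is not None:
        return overrides.inactivity_interval
    return defaults.default_collection_inactivity_days
```

Under this model, removing a collection's size limit corresponds to resetting `size_limit` to `None`, after which the system default applies again.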

Recover a collection

Administrators can recover collections from the expiring or archived state, restoring them to active. Use this when the system expired a collection unintentionally or when you still need the data.

Important: You can only recover collections before the automatic cleanup process permanently deletes them. Once the process deletes a collection, you can't recover it.

Recover with the REST API

Restore a collection to active status by providing its collection ID:

curl -X POST "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/unarchive" \
-H "Authorization: Bearer <API_KEY>"

Recover with the Python SDK

Use the Python SDK to recover a collection programmatically:

from h2ogpte import H2OGPTE

admin = H2OGPTE(address="https://<YOUR_DOMAIN>", api_key="<API_KEY>")

# Recover an expiring or archived collection
admin.unarchive_collection(collection_id="<COLLECTION_ID>")
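When several collections need recovering at once, the call above can be wrapped in a loop that keeps going after a failure, for example when one collection has already been permanently deleted. This is a sketch under assumptions: `recover_collections` is a hypothetical helper, `client` is any object with the `unarchive_collection` method shown above, and the SDK's specific exception types are not assumed, so it catches `Exception` broadly.

```python
from typing import Dict, Iterable, List, Tuple


def recover_collections(
    client, collection_ids: Iterable[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Try to recover each collection; return (recovered IDs, {failed ID: error})."""
    recovered: List[str] = []
    failed: Dict[str, str] = {}
    for collection_id in collection_ids:
        try:
            client.unarchive_collection(collection_id=collection_id)
        except Exception as exc:  # SDK exception types are not assumed here
            failed[collection_id] = str(exc)
        else:
            recovered.append(collection_id)
    return recovered, failed
```

With the `admin` client from the example above, a call would look like `recovered, failed = recover_collections(admin, ["<COLLECTION_ID_1>", "<COLLECTION_ID_2>"])`.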

Archive a collection

Administrators can manually archive a collection, bypassing the automatic expiration process.

Archive with the REST API

curl -X POST "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}/archive" \
-H "Authorization: Bearer <API_KEY>"

Archive with the Python SDK

from h2ogpte import H2OGPTE

admin = H2OGPTE(address="https://<YOUR_DOMAIN>", api_key="<API_KEY>")

admin.archive_collection(collection_id="<COLLECTION_ID>")

Manage collections as an administrator

Administrators can list and manage all collections in the system regardless of ownership.

List all collections

Retrieve all collections with pagination:

curl -s "https://<YOUR_DOMAIN>/api/v1/collections/all?offset=0&limit=100" \
-H "Authorization: Bearer <API_KEY>"
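To walk through every collection, the `offset` and `limit` parameters can be advanced in a loop. The sketch below keeps the HTTP call out of the picture: `fetch_page` is a hypothetical callable you supply that performs the request above and returns one page as a list. It assumes the endpoint returns a list of collection objects and that a page shorter than `limit` marks the end; the real response shape may differ.

```python
from typing import Callable, Iterator, List


def iter_all_collections(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 100
) -> Iterator[dict]:
    """Yield collections page by page until a short (or empty) page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break  # assumption: a short page means there is nothing left
        offset += limit
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP library and makes it easy to exercise with canned data.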

Delete a collection

Permanently remove a collection and its associated data:

curl -X DELETE "https://<YOUR_DOMAIN>/api/v1/collections/{collection_id}" \
-H "Authorization: Bearer <API_KEY>"

Configure lifecycle with the Python SDK

The following example configures system-wide lifecycle settings using the Python SDK:

from h2ogpte import H2OGPTE

admin = H2OGPTE(address="https://<YOUR_DOMAIN>", api_key="<API_KEY>")

# Set the collection expiration window to 90 days
admin.set_global_configuration(
    "expiration_limit_days", "90", can_overwrite=False, is_public=True
)

# Start expiring collections after 90 days of inactivity
admin.set_global_configuration(
    "default_collection_inactivity_days", "90", can_overwrite=False, is_public=True
)

# Set the default collection size limit to 500 MB (500,000,000 bytes)
admin.set_global_configuration(
    "default_collection_size_limit", "500000000", can_overwrite=False, is_public=True
)

# Turn on automatic expiration for ad-hoc (agent-created) collections
admin.set_global_configuration(
    "enable_adhoc_collection_expiration", "true", can_overwrite=False, is_public=True
)
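The repeated calls above can also be driven from a single dictionary, which keeps the desired configuration in one place. This is a sketch, not an SDK feature: `apply_lifecycle_settings` is a hypothetical helper, `client` is any object exposing `set_global_configuration` with the arguments shown above, and setting `adhoc_collection_expiration_days` this way is an assumption based on the other keys.

```python
from typing import Dict

# Desired system-wide settings; values are strings, as in the example above.
LIFECYCLE_SETTINGS: Dict[str, str] = {
    "expiration_limit_days": "90",
    "default_collection_inactivity_days": "90",
    "default_collection_size_limit": "500000000",
    "enable_adhoc_collection_expiration": "true",
    "adhoc_collection_expiration_days": "7",  # assumption: set like the other keys
}


def apply_lifecycle_settings(client, settings: Dict[str, str]) -> None:
    """Apply each setting with the same flags used in the example above."""
    for key, value in settings.items():
        client.set_global_configuration(
            key, value, can_overwrite=False, is_public=True
        )
```

With the `admin` client from the example above, the call would be `apply_lifecycle_settings(admin, LIFECYCLE_SETTINGS)`.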

Audit and telemetry events

The system tracks collection operations through two mechanisms: the audit trail for security-relevant operations and telemetry for lifecycle monitoring.

Audit trail events

The following operations produce formal audit trail records that administrators can query:

| Event | Trigger |
| --- | --- |
| create_collection | A user creates a new collection. |
| update_collection | A user updates collection metadata. |
| update_collection_settings | A user updates collection settings. |
| make_collection_public | A user makes a collection accessible to all users. |
| make_collection_private | A user removes public access from a collection. |
| share_collection | A user shares a collection with another user. |
| share_collection_with_group | A user shares a collection with a group. |
| unshare_collection | A user removes another user's access to a collection. |
| unshare_collection_from_group | A user removes a group's access to a collection. |
| unshare_collection_for_all | A user removes all shared access from a collection. |

Telemetry events

The following lifecycle operations emit telemetry events for monitoring but do not produce audit trail records:

| Event | Trigger |
| --- | --- |
| CollectionArchived | A collection transitions to the archived state. |
| CollectionRecovered | An administrator recovers a collection to active status. |
| ArchivedCollectionDeleted | The cleanup process permanently deletes an archived collection. |
