h2oGPTe REST API: Guide
Overview
This guide shows how to call the h2oGPTe REST API directly using raw HTTP requests with
Python's requests library (no SDK required). You run a complete workflow: check server
health, create a collection, upload and ingest a document, run agent queries (streaming
and non-streaming), inspect results, and clean up.
Prerequisites
Before you begin, you need:
- Python 3.x installed
- The requests library (for example, python -m pip install requests)
- An h2oGPTe global API key — see APIs to create one

All API calls require a global API key passed as a Bearer token.

- API base URL: https://YOUR_H2OGPTE_URL/api/v1
- OpenAPI spec: https://YOUR_H2OGPTE_URL/api-spec.yaml
- Swagger UI: https://YOUR_H2OGPTE_URL/swagger-ui/
API reference at a glance
Use this table to quickly find the detailed sections below.
| Category | Method | Endpoint | Notes |
|---|---|---|---|
| Health | GET | /rpc/health/readiness | Readiness probe |
| Health | GET | /rpc/health/liveness | Liveness probe |
| Models | GET | /api/v1/models | List LLMs |
| Models (OpenAI) | GET | /openai_api/v1/models | OpenAI-compatible list |
| Collections | POST | /api/v1/collections | Create collection |
| Collections | GET | /api/v1/collections | List collections |
| Collections | GET | /api/v1/collections/{id} | Get single collection |
| Collections | DELETE | /api/v1/collections/{id} | Delete collection |
| Uploads | PUT | /api/v1/uploads | Upload a file |
| Ingestion | POST | /api/v1/uploads/{id}/ingest | Ingest into collection |
| Chat | POST | /api/v1/chats | Create session |
| Chat | GET | /api/v1/chats | List sessions |
| Chat | GET | /api/v1/chats/{id} | Get session detail |
| Chat | DELETE | /api/v1/chats/{id} | Delete session |
| Chat | GET | /api/v1/chats/{id}/messages | Get messages |
| Chat | GET | /api/v1/chats/{id}/questions | Suggested follow-ups |
| Completions | POST | /api/v1/chats/{id}/completions | Query LLM / Agent |
| Completions (OpenAI) | POST | /openai_api/v1/chat/completions | OpenAI-compatible chat |
| Messages | GET | /api/v1/messages/{id}/meta | Message metadata |
| Messages | GET | /api/v1/messages/{id}/references | RAG citations |
| Agent Files | GET | /api/v1/chats/{id}/agent_server_files | List agent files |
| Agent Files | DELETE | /api/v1/chats/{id}/agent_server_files | Delete agent files |
| Agent Dirs | GET | /api/v1/chats/{id}/agent_server_directories/stats | Session dir stats |
| Agent Dirs | GET | /api/v1/chats/{id}/agent_server_directories/{name}/stats | Single dir stats |
| Agent Dirs | GET | /api/v1/agents/directory_stats | All sessions dir stats |
| Agent Tools | GET | /api/v1/agents/tools | List available tools |
| Agent Tools | GET | /api/v1/agents/tool_preference | Get tool preference |
| File Download | GET | /file?id={doc_id}&name={filename} | Download agent file |
Key agent llm_args parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| use_agent | bool | false | Enable AI agent with tool access |
| agent_accuracy | string | "standard" | Effort: "quick", "basic", "standard", "maximum" |
| agent_max_turns | int or "auto" | "auto" | Max agent iterations |
| agent_tools | string or list | "auto" | Tool selection: "auto", "all", or list of names |
| agent_type | string | "auto" | Agent type: "auto", "general", "task", "deep_research", "coder", "search" |
| agent_total_timeout | int | 3600 | Total timeout in seconds |
| agent_stream_files | bool | true | Stream files as they are generated |
| temperature | float | 0.0 | LLM sampling temperature |
| max_new_tokens | int | 1536 | Max output tokens |
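For convenience, the parameters above can be assembled with a small helper (a sketch; build_agent_llm_args is a hypothetical name, and the validation simply mirrors the accuracy values in the table):

```python
def build_agent_llm_args(
    accuracy: str = "standard",
    max_turns="auto",
    tools="auto",
    total_timeout: int = 3600,
    stream_files: bool = True,
    temperature: float = 0.0,
    max_new_tokens: int = 1536,
) -> dict:
    """Assemble the llm_args dict for an agent completion request."""
    allowed = {"quick", "basic", "standard", "maximum"}
    if accuracy not in allowed:
        raise ValueError(f"agent_accuracy must be one of {sorted(allowed)}")
    return {
        "use_agent": True,
        "agent_accuracy": accuracy,
        "agent_max_turns": max_turns,
        "agent_tools": tools,
        "agent_total_timeout": total_timeout,
        "agent_stream_files": stream_files,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }
```

The returned dict can be passed as the "llm_args" value in a completion payload.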
Setup and configuration
Every request needs the base URL and an Authorization header. Set these once and reuse them throughout.
import requests
import json
import os
import time
import io
from pathlib import Path
from datetime import datetime
from urllib.parse import quote
from pprint import pprint
# Configuration
BASE_URL = "https://YOUR_H2OGPTE_URL" # no trailing slash
API_KEY = "YOUR_API_KEY"
API_V1 = f"{BASE_URL}/api/v1"
HEADERS = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
# Helper functions
def _handle(resp: requests.Response, label: str = "") -> dict:
"""Raise on HTTP error and return parsed JSON."""
if not resp.ok:
print(f"[{resp.status_code}] {label}: {resp.text[:400]}")
resp.raise_for_status()
try:
return resp.json()
except Exception:
return {"raw": resp.text}
def api_get(path, params=None):
"""GET {API_V1}/{path}"""
return _handle(
requests.get(f"{API_V1}/{path}", headers=HEADERS, params=params),
label=f"GET {path}"
)
def api_post(path, body=None, files=None, params=None):
"""POST {API_V1}/{path}"""
if files:
# multipart — drop Content-Type so requests sets boundary automatically
hdrs = {k: v for k, v in HEADERS.items() if k != "Content-Type"}
return _handle(
requests.post(f"{API_V1}/{path}", headers=hdrs, files=files, params=params),
label=f"POST {path}"
)
return _handle(
requests.post(f"{API_V1}/{path}", headers=HEADERS, json=body, params=params),
label=f"POST {path}"
)
def api_put(path, files=None, body=None, params=None):
"""PUT {API_V1}/{path}"""
if files:
hdrs = {k: v for k, v in HEADERS.items() if k != "Content-Type"}
return _handle(
requests.put(f"{API_V1}/{path}", headers=hdrs, files=files, params=params),
label=f"PUT {path}"
)
return _handle(
requests.put(f"{API_V1}/{path}", headers=HEADERS, json=body, params=params),
label=f"PUT {path}"
)
def api_delete(path, body=None, params=None):
"""DELETE {API_V1}/{path}"""
return _handle(
requests.delete(f"{API_V1}/{path}", headers=HEADERS, json=body, params=params),
label=f"DELETE {path}"
)
def api_patch(path, body=None):
"""PATCH {API_V1}/{path}"""
return _handle(
requests.patch(f"{API_V1}/{path}", headers=HEADERS, json=body),
label=f"PATCH {path}"
)
print(f"Configured: {API_V1}")
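Long workflows occasionally hit transient network or 5xx failures; a simple retry wrapper around these helpers can smooth that over (a sketch; with_retries is a hypothetical helper, and the attempt count and backoff are illustrative):

```python
import time

def with_retries(fn, attempts: int = 3, backoff: float = 1.0):
    """Call fn(), retrying on exceptions with exponential backoff.

    Returns fn()'s result, or re-raises the last exception after
    all attempts are exhausted.
    """
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in practice, catch requests.RequestException
            last_exc = exc
            if i < attempts - 1:
                time.sleep(backoff * (2 ** i))
    raise last_exc

# Usage: with_retries(lambda: api_get("models"))
```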
Check server health
You can check server readiness using health probe endpoints outside of /api/v1.
resp = requests.get(f"{BASE_URL}/rpc/health/readiness", headers=HEADERS)
print(f"Health status: {resp.status_code} {resp.text.strip()[:120]}")
A 200 response means the server is ready to accept requests.
You can also check liveness:
resp = requests.get(f"{BASE_URL}/rpc/health/liveness", headers=HEADERS)
print(f"Liveness: {resp.status_code} {resp.text.strip()[:120]}")
List available LLMs
GET /api/v1/models returns all language models currently loaded on the server.
models_resp = api_get("models")
models = models_resp if isinstance(models_resp, list) else models_resp.get("data", [])
print(f"Available models ({len(models)}):")
for m in models[:10]: # show first 10
name = m.get("name") or m.get("id") or str(m)
print(f" • {name}")
# Pick a default model (or use 'auto' for automatic routing)
DEFAULT_LLM = "auto"
if models:
    first_model_name = models[0].get("name") or models[0].get("id")
    print(f"First listed model: {first_model_name}")
print(f"Using DEFAULT_LLM = '{DEFAULT_LLM}' (set to a specific model name if preferred)")
You can pass a specific model name in subsequent requests or use "auto" for automatic routing.
Manage collections
Use a collection to group related documents for RAG (Retrieval-Augmented Generation) queries.
Create a collection
collection_payload = {
"name": f"demo-agent-csv-{int(time.time())}",
"description": "Demo collection for REST API agent file demo",
}
coll = api_post("collections", body=collection_payload)
collection_id = coll["id"]
print(f"Created collection: {coll['name']} (id={collection_id})")
List collections
cols = api_get("collections", params={"limit": 5, "sort_column": "updated_at", "ascending": False})
print(f"Your collections (most recent 5):")
for c in cols:
print(f" • [{c['id'][:8]}...] {c['name']} docs={c.get('document_count', 0)}")
Get a single collection
coll_detail = api_get(f"collections/{collection_id}")
print(f"Collection detail:")
pprint({k: v for k, v in coll_detail.items() if k in [
"id", "name", "description", "document_count", "document_size", "updated_at"
]})
Upload and ingest documents
To add a document to a collection, follow a two-step process:
- Upload the raw file bytes with PUT /api/v1/uploads to get an upload ID.
- Ingest the upload into a collection with POST /api/v1/uploads/{upload_id}/ingest.
Upload a file
sample_csv_content = """product,category,revenue,units_sold,month
Widget A,Electronics,12500,250,January
Widget B,Electronics,8300,166,January
Gadget X,Accessories,4200,420,January
Widget A,Electronics,14200,284,February
"""
csv_filename = "sales_data.csv"
csv_bytes = sample_csv_content.encode("utf-8")
print(f"Created sample CSV ({len(csv_bytes)} bytes): {csv_filename}")
print(sample_csv_content)
upload_resp = api_put(
"uploads",
files={"file": (csv_filename, csv_bytes, "text/csv")},
)
upload_id = upload_resp["id"]
print(f"Uploaded: id={upload_id} filename={upload_resp['filename']}")
Ingest into the collection
resp = requests.post(
f"{API_V1}/uploads/{upload_id}/ingest",
headers=HEADERS,
params={
"collection_id": collection_id,
"gen_doc_summaries": False,
"gen_doc_questions": False,
},
)
print(f"Ingest status: {resp.status_code}")
# 204 No Content = success
# Wait briefly for ingestion to complete
time.sleep(3)
# Verify document appeared in collection
coll_after = api_get(f"collections/{collection_id}")
print(f"Collection now has {coll_after.get('document_count', 0)} document(s)")
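A fixed sleep works for small files, but polling the collection until the document count changes is more robust (a sketch; wait_until is a hypothetical helper, shown with this guide's api_get call commented out as usage):

```python
import time

def wait_until(predicate, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll predicate() until it returns truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Usage (assumes api_get and collection_id from earlier sections):
# ingested = wait_until(
#     lambda: api_get(f"collections/{collection_id}").get("document_count", 0) >= 1,
#     timeout=60, interval=2,
# )
```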
Create chat sessions
Use a chat session to provide context for a conversation with an LLM, optionally backed by a collection for RAG. For pure agent tasks (no RAG), create a session without a collection.
Create a session (agent-only, no collection)
session_resp = api_post("chats", body={})
session_id_agent = session_resp["id"]
print(f"Created agent-only session: {session_id_agent}")
Create a session with a collection (for RAG)
session_rag_resp = requests.post(
f"{API_V1}/chats",
headers=HEADERS,
json={},
params={"collection_id": collection_id},
)
session_rag_resp.raise_for_status()
session_id_rag = session_rag_resp.json()["id"]
print(f"Created RAG session: {session_id_rag} (collection={collection_id[:8]}...)")
List chat sessions
sessions = api_get("chats", params={"limit": 5})
print(f"Recent sessions ({len(sessions)}):")
for s in sessions:
print(f" • [{s['id']}] collection={str(s.get('collection_id','—'))[:12]} "
f"updated={s['updated_at'][:19]}")
Send an agent query (non-streaming)
POST /api/v1/chats/{session_id}/completions
Key agent parameters in llm_args:
| Parameter | Type | Description |
|---|---|---|
| use_agent | bool | Enable the AI agent |
| agent_accuracy | string | Effort level: "quick", "basic", "standard", "maximum" |
| agent_max_turns | int or "auto" | Max agent iterations |
| agent_tools | string or list | "auto", "all", or list of specific tool names |
| agent_total_timeout | int | Wall-clock budget in seconds (default 3600) |
| agent_stream_files | bool | Whether agent-generated files are streamed back |
agent_prompt = (
"Analyze the sales data and perform the following tasks:\n"
"- Calculate total revenue and total units sold per product\n"
"- Calculate total revenue per category\n"
"- Find the best-selling product by revenue\n"
"- Save the summary as a CSV file named 'sales_summary.csv'\n"
"- Save a month-over-month revenue report as 'monthly_revenue.csv'\n"
"Provide a brief written summary of the findings."
)
completion_payload = {
"message": agent_prompt,
"llm": DEFAULT_LLM,
"stream": False,
"llm_args": {
"use_agent": True,
"agent_accuracy": "standard", # quick | basic | standard | maximum
"agent_max_turns": "auto", # or an integer
"agent_tools": "auto", # or a list of specific tool names
"agent_total_timeout": 300, # seconds
"agent_stream_files": True, # stream files as they are generated
"temperature": 0.0,
"max_new_tokens": 4096,
},
"rag_config": {
"rag_type": "llm_only" # no RAG for a pure agent task
},
"include_chat_history": "off",
}
print("Sending agent request (may take 30–120 s) …")
t0 = time.time()
resp = requests.post(
f"{API_V1}/chats/{session_id_agent}/completions",
headers=HEADERS,
json=completion_payload,
timeout=360,
)
resp.raise_for_status()
completion = resp.json()
message_id = completion["message_id"]
body = completion["body"]
print(f"\nCompleted in {time.time()-t0:.1f}s")
print(f"Message ID : {message_id}")
print(f"\n--- Agent Response ---\n")
print(body[:2000])
Send an agent query (streaming)
To receive a streaming JSONL response, set "stream": true. Each line is a JSON object with body (incremental text) and finished (bool). The final message (finished: true) contains the message_id.
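The per-line delta format can be exercised offline before wiring up the real stream (a sketch; the sample lines are made up, but follow the body/finished/message_id shape described above):

```python
import json

def parse_stream_lines(lines):
    """Accumulate body text from JSONL delta lines; return (text, message_id)."""
    text, message_id = "", None
    for line in lines:
        if not line.strip():
            continue
        delta = json.loads(line)
        text += delta.get("body", "")
        if delta.get("finished"):
            message_id = delta.get("message_id")
            break
    return text, message_id

# Made-up delta lines in the documented shape:
sample = [
    '{"body": "Hello, "}',
    '{"body": "world."}',
    '{"body": "", "finished": true, "message_id": "msg-123"}',
]
```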
streaming_payload = {
"message": "Write a short Python script that generates fibonacci numbers up to 100 and saves them to 'fibonacci.csv'.",
"llm": DEFAULT_LLM,
"stream": True,
"llm_args": {
"use_agent": True,
"agent_accuracy": "quick",
"agent_total_timeout": 120,
"temperature": 0.0,
},
"rag_config": {"rag_type": "llm_only"},
"include_chat_history": "off",
}
stream_session = api_post("chats", body={})
stream_session_id = stream_session["id"]
print(f"Streaming session: {stream_session_id}")
print("\nStreaming response tokens as they arrive …\n")
full_response = ""
stream_msg_id = None
t0 = time.time()
with requests.post(
f"{API_V1}/chats/{stream_session_id}/completions",
headers=HEADERS,
json=streaming_payload,
stream=True,
timeout=180,
) as stream_resp:
stream_resp.raise_for_status()
for raw_line in stream_resp.iter_lines():
if not raw_line:
continue
line = raw_line if isinstance(raw_line, str) else raw_line.decode("utf-8")
try:
delta = json.loads(line)
except json.JSONDecodeError:
continue
if "error" in delta:
print(f"\n[Stream error] {delta['error']}")
break
chunk = delta.get("body", "")
full_response += chunk
print(chunk, end="", flush=True)
if delta.get("finished"):
stream_msg_id = delta.get("message_id")
break
print(f"\n\n--- Streaming complete in {time.time()-t0:.1f}s ---")
print(f"\n\nStream message_id: {stream_msg_id}")
Control chat message generation
While a chat query is actively generating a streaming response, you can control the generation state using the following endpoints:
- Pause generation: POST /api/v1/messages/{question_id}/pause — temporarily halts message streaming; the stream can be resumed later.
- Resume generation: POST /api/v1/messages/{question_id}/resume — resumes a previously paused message stream.
- Stop generation: POST /api/v1/messages/{question_id}/stop — permanently cancels the message generation.
- Finish generation: POST /api/v1/messages/{question_id}/finish — signals the LLM to complete its current thought and finish naturally, providing a more coherent ending than an immediate stop.
A successful request to any of these endpoints returns a 204 No Content response.
Example:
# 1. Get the question_id of the currently generating message
# Fetch the recent history and isolate the most recent user prompt
messages = api_get(f"chats/{session_id_agent}/messages", params={"offset": 0, "limit": 20})
# Filter for top-level questions (where reply_to is missing/null)
questions = [m for m in messages if not m.get("reply_to")]
if questions:
# Grab the most recent question's ID (assuming chronological order)
question_id = questions[-1]["id"]
else:
question_id = "fallback-id"
# ------------------------------------------------------------------
# 2. Use the retrieved question_id to control the active stream:
# ------------------------------------------------------------------
# Pause a streaming response (in practice, call only the control you need)
api_post(f"messages/{question_id}/pause")
# Resume it
api_post(f"messages/{question_id}/resume")
# Immediately stop the generation permanently
api_post(f"messages/{question_id}/stop")
# Signal the LLM to naturally complete its thought and finish
api_post(f"messages/{question_id}/finish")
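Since all four controls share the messages/{question_id}/{action} shape, a tiny path builder can guard against typos (a sketch; control_path is a hypothetical helper intended for use with the api_post wrapper defined earlier):

```python
VALID_ACTIONS = {"pause", "resume", "stop", "finish"}

def control_path(question_id: str, action: str) -> str:
    """Return the relative API path for a generation-control action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {sorted(VALID_ACTIONS)}")
    return f"messages/{question_id}/{action}"

# Usage: api_post(control_path(question_id, "pause"))
```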
Inspect chat message history
Use GET /api/v1/chats/{session_id}/messages to retrieve all messages in a session. Messages without reply_to are user messages; those with reply_to are LLM responses.
messages = api_get(f"chats/{session_id_agent}/messages", params={"offset": 0, "limit": 20})
print(f"Messages in session (total shown: {len(messages)}):")
for msg in messages:
role = "USER" if not msg.get("reply_to") else "ASSISTANT"
content_preview = msg.get("content", "")[:120].replace("\n", " ")
has_refs = msg.get("has_references", False)
print(f" [{role}] id={msg['id'][:8]}... refs={has_refs}")
print(f" {content_preview!r}")
print()
Retrieve message metadata
Use GET /api/v1/messages/{message_id}/meta?info_type=<type> to retrieve metadata attached to a specific message.
Common info_type values:
| info_type | Content |
|---|---|
| usage_stats | JSON: token counts and cost |
| prompt_raw | Text: final prompt sent to LLM |
Agent session metadata info_type values:
| info_type | Content |
|---|---|
| agent_files | JSON: [{doc_id: filename}, ...] — new files generated by the agent |
| agent_files_old | JSON: same format — files from earlier turns |
| agent_chat_history | JSON: full agent reasoning trace |
| agent_chat_history_md | Markdown: human-readable agent trace |
| agent_analysis | Text: agent self-analysis |
Some deployments accept additional agent_* metadata types that are not listed in the OpenAPI enum.
Get agent-generated file list
resp = requests.get(
f"{API_V1}/messages/{message_id}/meta",
headers=HEADERS,
params={"info_type": "agent_files"},
)
resp.raise_for_status()
agent_files_meta = resp.json()
# Show raw metadata (truncated)
print(f"agent_files meta (raw): {json.dumps(agent_files_meta, indent=2)[:500]}")
# Parse the content — it's a JSON string of [{doc_id: filename}, ...]
agent_file_map = {}
if agent_files_meta:
try:
file_list = json.loads(agent_files_meta[0]["content"])
for entry in file_list:
agent_file_map.update(entry)
except (KeyError, json.JSONDecodeError) as e:
print(f"Could not parse agent_files content: {e}")
print(f"\nAgent-generated files ({len(agent_file_map)}):")
for doc_id, fname in agent_file_map.items():
print(f" • {fname} (doc_id={doc_id[:16]}...)")
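The flattening step above can be tested offline (a sketch; parse_agent_files is a hypothetical helper, and the doc IDs in the sample are made up):

```python
import json

def parse_agent_files(meta: list) -> dict:
    """Flatten [{doc_id: filename}, ...] (stored as a JSON string in the
    first meta record's 'content' field) into a {doc_id: filename} dict."""
    file_map = {}
    if meta:
        for entry in json.loads(meta[0]["content"]):
            file_map.update(entry)
    return file_map

# Example payload in the shape returned by info_type=agent_files:
meta = [{"content": json.dumps([
    {"doc-aaa": "sales_summary.csv"},
    {"doc-bbb": "monthly_revenue.csv"},
])}]
```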
Get usage stats
resp = requests.get(
f"{API_V1}/messages/{message_id}/meta",
headers=HEADERS,
params={"info_type": "usage_stats"},
)
resp.raise_for_status()
if resp.json():
usage = json.loads(resp.json()[0]["content"])
pprint(usage)
Get agent reasoning trace
resp = requests.get(
f"{API_V1}/messages/{message_id}/meta",
headers=HEADERS,
params={"info_type": "agent_chat_history_md"},
)
resp.raise_for_status()
if resp.json():
print(resp.json()[0]["content"][:1500])
List agent server files
Use GET /api/v1/chats/{session_id}/agent_server_files to list all files the agent wrote to its working directory during this session.
Each AgentServerFile record includes: id, filename, bytes, created_at (Unix timestamp), purpose, and object.
server_files = api_get(f"chats/{session_id_agent}/agent_server_files")
print(f"Agent server files ({len(server_files)}):")
print("-" * 70)
for f in server_files:
created = datetime.fromtimestamp(f.get("created_at", 0)).strftime("%Y-%m-%d %H:%M:%S")
size_kb = f.get("bytes", 0) / 1024
print(f" filename : {f.get('filename')}")
print(f" id : {f.get('id')}")
print(f" size : {size_kb:.2f} KB ({f.get('bytes')} bytes)")
print(f" created : {created}")
print(f" purpose : {f.get('purpose')}")
print("-" * 70)
View agent directory statistics
Use the following three endpoints to inspect the agent's working directory:
- GET /api/v1/chats/{session_id}/agent_server_directories/stats — per-session directory stats
- GET /api/v1/chats/{session_id}/agent_server_directories/{dir_name}/stats — single directory detail
- GET /api/v1/agents/directory_stats — all sessions across the entire account
Per-session directory stats
dir_stats = api_get(
f"chats/{session_id_agent}/agent_server_directories/stats",
params={"detail_level": 1},
)
print(f"Agent directory stats for session {session_id_agent[:8]}...:")
print(f"Number of directories: {len(dir_stats)}")
print()
for d in dir_stats:
print(f" Directory ID : {d.get('id')}")
print(f" Size : {d.get('size_human_readable', d.get('size_bytes', '?'))}")
print(f" Files : {d.get('file_count', 0)}")
print(f" Directories : {d.get('directory_count', 0)}")
print(f" Created : {d.get('created_date', '?')}")
print(f" Modified : {d.get('modified_date', '?')}")
print(f" Is empty : {d.get('is_empty', '?')}")
print(f" Top contents : {d.get('top_level_contents', [])}")
if d.get("files"):
print(f" Files list:")
for file_info in d["files"]:
print(f" • {file_info.get('name')} "
f"({file_info.get('size_human_readable', '?')}) "
f"modified={file_info.get('modified_date', '?')}")
print()
Use detail_level=1 to also get per-file metadata within each directory.
Stats for a specific directory
if dir_stats:
first_dir_id = dir_stats[0].get("id")
if first_dir_id:
single_dir = api_get(
f"chats/{session_id_agent}/agent_server_directories/{quote(first_dir_id, safe='')}/stats",
params={"detail_level": 1},
)
print(f"Stats for directory '{first_dir_id}':")
pprint({k: v for k, v in single_dir.items() if k != "files"})
if single_dir.get("files"):
print(f"\nFiles in directory ({len(single_dir['files'])}):")
for fi in single_dir["files"]:
print(f" • {fi.get('name')} size={fi.get('size_human_readable', '?')} "
f"is_dir={fi.get('is_directory', False)}")
else:
print("No directories found — run the agent query cells first.")
Global agent directory stats
global_stats = api_get("agents/directory_stats", params={"offset": 0, "limit": 10})
print(f"Global agent directory stats ({len(global_stats)} sessions):")
for session_entry in global_stats[:3]: # show first 3 sessions
sess_id = session_entry.get("agent_chat_session_id")
preview = session_entry.get("chat_preview", "")[:80]
dirs = session_entry.get("stats", [])
total_files = sum(d.get("file_count", 0) for d in dirs)
total_size = sum(d.get("size_bytes", 0) for d in dirs)
print(f" Session: {sess_id[:8]}...")
print(f" Preview: {preview!r}")
print(f" Total files: {total_files} Total size: {total_size/1024:.1f} KB")
print()
Download agent-generated files
You can download agent files from h2oGPTe's object storage using:
GET /file?id={doc_id}&name={filename}
Authorization: Bearer YOUR_API_KEY
The doc_id comes from message metadata (info_type=agent_files).
The /file endpoint is outside of /api/v1.
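Both query parameters should be percent-encoded, since agent filenames can contain spaces or special characters; a small URL builder makes that explicit (a sketch; build_download_url is a hypothetical name):

```python
from urllib.parse import quote

def build_download_url(base_url: str, doc_id: str, filename: str) -> str:
    """Build the /file download URL; note it is NOT under /api/v1."""
    return (
        f"{base_url}/file"
        f"?id={quote(doc_id, safe='')}"
        f"&name={quote(filename, safe='')}"
    )
```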
DOWNLOAD_DIR = Path("./agent_downloads")
DOWNLOAD_DIR.mkdir(exist_ok=True)
download_headers = {"Authorization": f"Bearer {API_KEY}"}
if not agent_file_map:
print("No agent files found in message metadata. Re-run the agent query and message-metadata sections first.")
else:
print(f"Downloading {len(agent_file_map)} agent file(s) to {DOWNLOAD_DIR}/")
print()
for doc_id, filename in agent_file_map.items():
# Build download URL — note: /file is NOT under /api/v1
dl_url = f"{BASE_URL}/file?id={quote(doc_id, safe='')}&name={quote(filename, safe='')}"
dl_resp = requests.get(dl_url, headers=download_headers, timeout=60)
if dl_resp.ok:
safe_name = Path(filename).name # strip any path component for safety
out_path = DOWNLOAD_DIR / safe_name
with open(out_path, "wb") as fp:
fp.write(dl_resp.content)
size_kb = len(dl_resp.content) / 1024
print(f" ✓ Downloaded: {safe_name} ({size_kb:.2f} KB) → {out_path}")
else:
print(f" ✗ Failed to download {filename}: HTTP {dl_resp.status_code}")
Discover agent tools
List all available agent tools
tools = api_get("agents/tools")
print(f"Available agent tools ({len(tools)}):")
for tool in tools[:20]: # show first 20
name = tool.get("name") or tool.get("id")
desc = str(tool.get("description", ""))[:80]
print(f" • {name:<40} {desc}")
Get user's tool preference
try:
tool_pref = api_get("agents/tool_preference")
print(f"Tool preferences ({len(tool_pref)}):")
for t in tool_pref:
print(f" • {t}")
except Exception as e:
print(f"Could not retrieve tool preference: {e}")
Use additional chat endpoints
Get suggested follow-up questions
try:
questions = api_get(
f"chats/{session_id_agent}/questions",
params={"limit": 5},
)
print(f"Suggested follow-up questions ({len(questions)}):")
for q in questions:
print(f" • {q.get('question')}")
except Exception as e:
print(f"Suggested questions not available: {e}")
Get session details
sess_detail = api_get(f"chats/{session_id_agent}")
print("Agent session details:")
pprint({k: v for k, v in sess_detail.items() if k in [
"id", "name", "collection_id", "latest_message_content", "updated_at"
]})
Get message references (RAG citations)
# Only populated for RAG sessions (collection-backed)
try:
references = api_get(f"messages/{message_id}/references")
print(f"References for message {message_id[:8]}...: ({len(references)})")
for ref in references[:3]:
print(f" • doc={ref.get('document_name')} score={ref.get('score'):.3f} pages={ref.get('pages')}")
except Exception as e:
print(f"References not available (expected for llm_only sessions): {e}")
Delete agent files
Use DELETE /api/v1/chats/{session_id}/agent_server_files to remove all files the agent wrote to its working directory for this session.
files_before = api_get(f"chats/{session_id_agent}/agent_server_files")
print(f"Files before deletion: {len(files_before)}")
for f in files_before:
print(f" • {f.get('filename')} ({f.get('bytes')} bytes)")
del_resp = requests.delete(
f"{API_V1}/chats/{session_id_agent}/agent_server_files",
headers=HEADERS,
)
print(f"Delete status: {del_resp.status_code}")
# 200 = success, 204 = success (no content), 409 = conflict (deletion in progress)
# Verify deletion
files_after = api_get(f"chats/{session_id_agent}/agent_server_files")
print(f"Files after deletion: {len(files_after)}")
OpenAI-compatible API
h2oGPTe also exposes OpenAI-compatible endpoints at /openai_api/v1/. You can use any
OpenAI-compatible client library (including the standard OpenAI Python client) with these
endpoints. For endpoint descriptions, request/response examples, and feature support
tables, see OpenAI-compatible REST API.
Clean up resources
Delete all the resources you created during this session.
sessions_to_delete = [
session_id_agent,
session_id_rag,
stream_session_id,
]
# Delete chat sessions
for sid in sessions_to_delete:
try:
r = requests.delete(
f"{API_V1}/chats/{sid}",
headers=HEADERS,
)
print(f"DELETE chat {sid[:8]}... → {r.status_code}")
except Exception as e:
print(f"Could not delete session {sid[:8]}...: {e}")
# Delete collection
try:
r = requests.delete(
f"{API_V1}/collections/{collection_id}",
headers=HEADERS,
)
print(f"DELETE collection {collection_id[:8]}... → {r.status_code}")
except Exception as e:
print(f"Could not delete collection: {e}")
print("\nCleanup complete.")
If you delete a resource that has already been deleted, the API returns 404. This is
safe to ignore in cleanup scripts.
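That tolerance can be made explicit in cleanup code (a sketch; delete_ok is a hypothetical helper that treats 404 as already-deleted):

```python
def delete_ok(status_code: int) -> bool:
    """Treat any 2xx, or 404 (resource already gone), as successful cleanup."""
    return 200 <= status_code < 300 or status_code == 404

# Usage: r = requests.delete(...); print("ok" if delete_ok(r.status_code) else "failed")
```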
Resources
- Python Client Library — full-featured SDK for h2oGPTe
- SDKs and client libraries — OpenAI-compatible endpoints and language-specific SDKs
- Swagger UI — interactive API explorer for your deployment
- APIs guide — create and manage API keys
- Submit and view feedback for this page
- Send feedback about Enterprise h2oGPTe to cloud-feedback@h2o.ai