# Architecture

## Overview
The system follows a Firebase-like "serverless" architecture: there is no dedicated API server. Instead, the UI fetches data from Postgres by invoking Postgres routines via a bridge server. Postgres has row-level security (RLS) enabled and uses the user's OAuth2 credentials to read and write data. Document ingestion, indexing, and question answering are all implemented as background workers written in Python, which pull jobs from Redis queues. Semantic/lexical search runs as a separate service. Raw content is stored in Minio. All subsystems except the databases are stateless and can be scaled independently of one another.
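The worker/queue pattern described above can be sketched in a few lines. This is an illustrative stand-in only: it uses Python's in-memory `queue.Queue` in place of a real Redis list (the real workers presumably block on a Redis connection), and the job fields and queue name are hypothetical.

```python
import json
import queue

# Stand-in for a Redis list used as a job queue; in production a worker
# would block on the Redis server instead of an in-process queue.
job_queue: "queue.Queue[str]" = queue.Queue()

def enqueue(job: dict) -> None:
    """Producer side: serialize the job and push it onto the queue."""
    job_queue.put(json.dumps(job))

def work_one() -> dict:
    """Worker side: take the next job and process it.

    Workers keep no state between jobs, which is what allows the
    worker pool to be scaled horizontally.
    """
    job = json.loads(job_queue.get())
    # ... dispatch on job["type"]: ingest, index, answer, etc. ...
    return {"job_id": job["id"], "status": "done"}

enqueue({"id": 1, "type": "ingest", "document": "report.pdf"})
result = work_one()
```

Because each job is self-describing and workers hold no local state, adding worker instances increases throughput without any coordination beyond the queue itself.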
## Diagram

## Subsystems
- GUI: Browser-based user interface. Written in TypeScript using React and Tailwind.
- macOS: Native app for macOS.
- Swagger: Auto-generated REST API.
- Python: Asynchronous and synchronous Python client using WebSockets.
- Mux: Pass-through HTTP server through which the UI communicates with all backend services and sends/receives chat messages. Written in Go.
- Vex: Multi-modal database written in Python, with an HTTP interface. Contains:
  - Content chunks with provenance.
  - Vector indexes for similarity search.
  - Full-text indexes for lexical search.
- Workers: Scalable, distributed. Written in Python.
  - Core: Core HTTP server written in Python. Serves low-latency methods used directly by the UI and clients.
  - Crawler: Coordinates document ingestion and indexing.
  - Chat: Coordinates user chat sessions, RAG, and agentic AI.
- Models: Scalable service providing AI helpers for embeddings, OCR, layout, captions, guardrails, audio/image processing, etc. Highly parallelized and GPU-optimized.
- h2oGPT:
  - All LLM requests from h2oGPTe go to an h2oGPT instance at H2OGPTE_CORE_LLM_ADDRESS, H2OGPTE_CORE_OPENAI_ADDRESS, or H2OGPTE_CORE_AGENT_ADDRESS.
  - (Default for testing, ~10 different LLMs) If H2OGPTE_CORE_LLM_ADDRESS is internal to the k8s cluster (or the Docker Compose network), then h2oGPTe directs LLM requests to the built-in h2oGPT. This gives us the flexibility to configure multiple LLMs at various other endpoints, such as h2oGPT running elsewhere, H2O MLOps, Azure, OpenAI, AWS Bedrock, Replicate.com, etc.
  - (Default for installs and for HAIC) If H2OGPTE_CORE_LLM_ADDRESS is external to the h2oGPTe-local network, then we can't control it, and the LLM choices are hardcoded by the remote h2oGPT instance. This is fine if the installer/user/customer wants to use a limited set of h2oGPT/TGI/vLLM/OpenAI/Azure/AWS Bedrock endpoints, such as Replicated with a custom vLLM, or Managed Cloud with just a handful of LLMs in H2O MLOps.
  - Does prompt engineering.
  - Abstracts away different LLMs (local or remote, e.g., vLLM, text-generation-inference, Replicate, Azure, etc.).
  - Provides text completion and chat APIs for talking to LLMs, with or without custom context and prompts.
  - Provides a map/reduce API built on top of LangChain for processing documents.
- h2oGPT_Agent:
  - Multimodal Agentic AI:
    - Planning
    - Code execution
    - Review and iteration
    - Tool usage
  - GPU-optimized helpers for additional capabilities:
    - Image generation
    - Speech to text
    - Text to speech
- Redis: Contains:
  - User sessions.
  - Job queue and job scheduler data/stats.
  - Pub/sub for brokering chat messages.
- Minio: Object storage for raw content and documents.
- Postgres:
  - Metadata about users, collections, and documents.
  - Chat sessions and message history.
- LLMs:
  - Private LLMs (air-gapped, on-premises)
    - HuggingFace models + vLLM
  - Private LLMs (hosted in the cloud by the customer)
    - HuggingFace models + vLLM
  - External LLMs (third-party)
    - Azure/OpenAI
    - Anthropic
    - Amazon
    - Mistral
    - Grok/Together.ai/Replicate
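Vex maintains both vector indexes (semantic search) and full-text indexes (lexical search), so each query produces two ranked result lists that must be merged. One common way to fuse such lists is reciprocal rank fusion (RRF); the sketch below is illustrative only — the document does not specify how Vex actually combines its results, and the chunk IDs are hypothetical.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists via reciprocal rank fusion:
    score(d) = sum over lists of 1 / (k + rank_of_d_in_list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_7", "chunk_2", "chunk_9"]   # similarity search
lexical_hits = ["chunk_7", "chunk_4", "chunk_2"]  # full-text search
fused = rrf([vector_hits, lexical_hits])
# chunk_7 ranks first: it tops both lists; chunk_2 appears in both lists
# and so outranks chunk_4 and chunk_9, which each appear in only one.
```

RRF needs only ranks, not raw scores, which makes it convenient when the vector and lexical scorers produce incomparable score scales.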
## Scaling
- Mux is stateless. Load-balance additional instances to improve concurrency.
- Background Workers are also stateless. Spin up more instances to improve throughput.
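Statelessness works here because shared state lives in Redis: chat messages, for example, are brokered through Redis pub/sub (see the Redis subsystem above), so a worker can publish a message and whichever Mux instance holds the user's connection delivers it. A minimal in-memory sketch of that pattern — Redis is replaced by a toy broker, and the channel name is hypothetical:

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """In-memory stand-in for Redis pub/sub."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, message: str) -> None:
        # Deliver to every subscriber of the channel.
        for handler in self._subs[channel]:
            handler(message)

broker = Broker()
delivered: list[str] = []

# Two "Mux instances" subscribe to the same chat-session channel; the one
# holding the user's connection forwards the message to the browser.
broker.subscribe("chat:session-42", delivered.append)
broker.subscribe("chat:session-42", lambda m: None)  # second instance, no-op here
broker.publish("chat:session-42", "Hello from a worker")
```

Because no instance owns the session, instances can be added or removed behind the load balancer without draining connections through any particular node.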