
Architecture

h2oGPTe Architecture Diagram

Overview​

h2oGPTe is an enterprise RAG (Retrieval-Augmented Generation) platform that provides secure document processing, conversational AI, agentic workflows, and deep research capabilities. The platform enables autonomous AI agents to perform complex multi-step tasks, conduct thorough research across large document collections, and execute sophisticated reasoning chains. The architecture follows a microservices pattern, with PostgreSQL for data storage, Redis for caching, MinIO for object storage, and Keycloak for authentication. In the architecture diagram at the top of this page, green boxes indicate GPU-accelerated services.

Data Flow Patterns​

The h2oGPTe platform implements several key data flow patterns that enable efficient document processing, intelligent retrieval, and real-time AI interactions. These patterns are designed to optimize performance, ensure data consistency, and provide seamless user experiences across different use cases.

1. Document Ingestion Flow​

The document ingestion pipeline handles the complete lifecycle of document processing, from initial upload to final indexing. This flow supports multiple document formats (PDF, Word, Excel, images, etc.) and automatically extracts structured and unstructured content while preserving document layout and formatting. The pipeline includes intelligent chunking strategies, metadata extraction, and multi-modal content understanding.
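
As a concrete illustration, the sketch below drives this pipeline through the h2ogpte Python client: a collection is created, a file is uploaded to object storage, and ingestion (parsing, chunking, embedding, and indexing) is triggered. Method names follow the client's published quickstart, but exact signatures can vary between releases, and the address, API key, and file name are placeholders.

```python
from h2ogpte import H2OGPTE

client = H2OGPTE(address="https://h2ogpte.example.com", api_key="sk-...")

# Create a collection to hold the documents and their index.
collection_id = client.create_collection(
    name="Contracts",
    description="Signed customer contracts",
)

# Upload the raw file to object storage, then trigger parsing, chunking,
# embedding, and indexing into the collection.
with open("contract.pdf", "rb") as f:
    upload_id = client.upload("contract.pdf", f)
client.ingest_uploads(collection_id, [upload_id])
```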

2. RAG Query Flow​

The Retrieval-Augmented Generation (RAG) query flow combines semantic search with generative AI to provide accurate, contextual responses. This pattern leverages both vector similarity and lexical search to find relevant content, then uses advanced prompt engineering to generate responses that are grounded in your organization's data. The flow includes caching mechanisms for performance optimization and streaming for real-time user interaction.
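
The sketch below shows how this flow looks from the client side, again using the h2ogpte Python client: a chat session is bound to a collection, and each query runs retrieval against the collection's index before a grounded answer is generated. Names mirror the client's quickstart and are illustrative; the collection id is assumed to come from a previous ingestion step.

```python
from h2ogpte import H2OGPTE

client = H2OGPTE(address="https://h2ogpte.example.com", api_key="sk-...")

collection_id = "..."  # id returned by create_collection during ingestion
chat_session_id = client.create_chat_session(collection_id)

with client.connect(chat_session_id) as session:
    # Retrieval runs against the collection's vector and lexical indexes; the
    # retrieved chunks are injected into the prompt before generation.
    reply = session.query(
        "What is the termination clause in the 2023 contract?",
        timeout=60,
    )
    print(reply.content)
```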

3. Agent Execution Flow​

The agent execution flow enables autonomous AI agents to perform complex, multi-step tasks using a variety of tools and reasoning capabilities. Agents can plan their approach, execute tools iteratively, and adapt their strategy based on intermediate results. This pattern is essential for workflows that require decision-making, data analysis, or integration with external systems.
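
The control flow can be pictured as a simple plan / act / observe loop. The sketch below is a conceptual illustration only, not h2oGPTe's internal agent implementation: the llm.plan call and the tools registry are hypothetical stand-ins for the model's planning step and the platform's tool integrations.

```python
def run_agent(llm, tools, task, max_steps=10):
    """Conceptual agent loop: plan a step, execute a tool, feed back the result."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Hypothetical planning call; returns either a tool request such as
        # {"tool": "web_search", "args": {...}} or a final answer.
        step = llm.plan(history)
        if "final_answer" in step:
            return step["final_answer"]
        observation = tools[step["tool"]](**step["args"])  # execute the chosen tool
        history.append({"role": "tool", "content": str(observation)})  # adapt on the result
    return "Stopped after max_steps without a final answer."
```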

Component Responsibilities​

Each component in the h2oGPTe architecture has specific responsibilities and capabilities designed to work together as a cohesive system. The platform's modular design allows for independent scaling, maintenance, and enhancement of individual services while maintaining overall system integrity.

Note: Green boxes indicate GPU-accelerated services that leverage specialized hardware for compute-intensive AI operations.

h2oGPT Service​

The h2oGPT service acts as the LLM abstraction layer for all AI text generation operations in h2oGPTe.

  • LLM Routing: All LLM requests from h2oGPTe go to h2oGPT at configurable endpoints (H2OGPTE_CORE_LLM_ADDRESS, H2OGPTE_CORE_OPENAI_ADDRESS, or H2OGPTE_CORE_AGENT_ADDRESS)
  • Deployment Flexibility:
    • Internal deployment: when h2oGPT runs within the cluster or network, it provides access to multiple configured LLMs
    • External deployment: h2oGPTe can also point to an external h2oGPT instance with a pre-configured set of LLM choices
  • LLM Abstraction: Abstracts different LLM providers (vLLM, text-generation-inference, Replicate, Azure, OpenAI, AWS Bedrock, H2O MLOps)
  • Prompt Engineering: Handles prompt optimization and template management for different LLMs
  • API Support: Provides both text completion and chat APIs with custom context and prompts (see the sketch after this list)
  • Document Processing: Map/reduce API built on LangChain for document processing workflows
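
Because one of the configurable endpoints is an OpenAI-style address (H2OGPTE_CORE_OPENAI_ADDRESS), a deployment that exposes it can be called with a standard OpenAI-compatible client. The sketch below assumes such an endpoint is reachable; the base URL default, the H2OGPT_API_KEY variable, and the model id are placeholders rather than guaranteed settings.

```python
import os
from openai import OpenAI

# H2OGPTE_CORE_OPENAI_ADDRESS is normally a server-side setting; it is read
# here only as a convenient placeholder for the endpoint URL.
client = OpenAI(
    base_url=os.environ.get("H2OGPTE_CORE_OPENAI_ADDRESS", "http://h2ogpt:5000/v1"),
    api_key=os.environ.get("H2OGPT_API_KEY", "EMPTY"),  # hypothetical variable
)

response = client.chat.completions.create(
    model="h2oai/h2ogpt-example-model",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize our onboarding policy."}],
)
print(response.choices[0].message.content)
```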

Frontend (UI)​

The React-based frontend provides a modern, responsive user interface built with TypeScript and Tailwind CSS. It serves as the primary interaction point, offering an intuitive experience for technical and non-technical users alike.

  • User Interface: Interactive chat interface with markdown rendering, document viewer with highlighting, collection management dashboard, and admin panels
  • Real-time Communication: WebSocket-based streaming for live AI responses, progress indicators for long-running operations, and collaborative features
  • File Management: Drag-and-drop file upload with progress tracking, batch document processing, preview and download capabilities
  • Configuration: User preferences and settings management, theme customization, API key management, and workspace configuration

Mux (API Gateway)​

The Mux gateway, written in Go, serves as the API gateway and authentication layer for all client requests.

  • Authentication: OIDC integration with Keycloak, JWT token validation, session management, guest user support with device fingerprinting
  • Authorization: Role-based access control (RBAC), API key management, license validation
  • Database Integration: PostgreSQL connection pooling with trusted/untrusted user separation, row-level security enforcement
  • Request Routing: HTTP/WebSocket routing to backend services, Redis pub/sub for multi-instance synchronization
  • File Operations: File upload/download handling, streaming support for large documents

Core Service​

The Core service, implemented in Python, acts as the central orchestrator for document processing and LLM interactions.

  • Orchestration: Coordinates document processing workflows and service interactions
  • LLM Management: Handles requests to the h2oGPT service for LLM interactions
  • Configuration: Manages system-wide settings and environment configurations
  • Encryption: Provides encryption/decryption services for sensitive data
  • File Server: Built-in file serving capabilities for document access

VEX Service​

The VEX service provides vector search and indexing capabilities with support for multiple backends.

  • Vector Search: Similarity search over embeddings, using the HNSW algorithm for the internal backend (illustrated after this list)
  • Full-text Search: Text search capabilities alongside vector search
  • Backend Support: Internal backends (HNSW, SQLite) and external backends (Elasticsearch, Milvus, Qdrant, Redis)
  • Document Processing: Chunking strategies, embedding generation
  • FastAPI Interface: REST API built with FastAPI and uvicorn
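
For intuition, the snippet below builds a small HNSW index with the open-source hnswlib package and runs a cosine-similarity query. It demonstrates the algorithm named above, not VEX's own code; the dimensionality and data are synthetic stand-ins for chunk and query embeddings.

```python
import numpy as np
import hnswlib

dim = 384                                                    # example embedding dimensionality
embeddings = np.random.rand(10_000, dim).astype("float32")   # stand-in for chunk embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=len(embeddings), ef_construction=200, M=16)
index.add_items(embeddings, np.arange(len(embeddings)))

index.set_ef(64)                                             # recall vs. latency trade-off at query time
query = np.random.rand(dim).astype("float32")                # stand-in for the query embedding
labels, distances = index.knn_query(query, k=5)
print(labels[0], distances[0])                               # ids of the 5 most similar chunks
```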

Crawl Service​

The Crawl service handles document ingestion from various enterprise sources.

  • Connectors: SharePoint (on-premises and Online), Azure Blob Storage, Google Cloud Storage, AWS S3, and the local file system
  • Document Processing: Document parsing and metadata extraction, integrating with the Parse capabilities for text extraction
  • Chunking: Document chunking for vector indexing
  • Ingestion Pipeline: Batch and incremental ingestion support

Chat Service​

The Chat service manages conversation sessions and RAG pipeline execution.

  • Session Management: Chat session creation and tracking, conversation history storage in PostgreSQL
  • RAG Pipeline: Document retrieval from VEX service, context injection for LLM prompts
  • Real-time Communication: WebSocket support for streaming responses
  • Integration: Coordinates with Core service for LLM interactions

Parse Service​

The Parse service (integrated within Crawl) provides document parsing and text extraction.

  • Format Support: PDF, Word, Excel, PowerPoint, HTML, images, and various text formats
  • OCR Capabilities: Optical character recognition for scanned documents, multi-language support
  • Text Extraction: Structured text extraction from documents, metadata extraction
  • Integration: Embedded within the crawl service pipeline

Models Service​

The Models service provides GPU-accelerated model serving capabilities.

  • GPU Support: CUDA support for GPU acceleration, configurable GPU device allocation
  • Resource Management: Memory limits and resource constraints, container-based deployment
  • Integration: Works with external h2oGPT service for LLM capabilities
  • PII Detection: Built-in PII detection and redaction capabilities

Deployment Architecture​

The h2oGPTe platform is designed for flexible deployment across various environments, from single-node development setups to large-scale production clusters. The architecture supports containerized deployment using Docker and Kubernetes, with comprehensive configuration management and monitoring capabilities.

Container Architecture​

The platform utilizes a containerized microservices architecture that ensures consistency across environments and simplifies deployment and scaling. Each service runs in its own container with clearly defined interfaces and dependencies.

Deployment Options​

Docker Compose Deployment​

  • Services: h2ogpte-app (Python services), h2ogpte-mux (Go gateway), h2ogpte-ui (React frontend)
  • Infrastructure: PostgreSQL, Redis, MinIO, Keycloak
  • Optional Services: vLLM server for local LLM hosting, Milvus/Elasticsearch for vector search

Kubernetes Deployment​

  • Helm Charts: Available for production deployments
  • Multi-instance Support: Redis pub/sub for service synchronization

Configuration Management​

  • Environment Variables: Most settings are supplied via environment variables (see the sketch after this list)
  • Settings Service: Centralized configuration management
  • Feature Flags: Runtime feature toggling support
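
A minimal sketch of environment-driven configuration, reusing the endpoint variables named in the h2oGPT service section; the defaults and the feature-flag variable name are hypothetical.

```python
import os

LLM_ADDRESS = os.environ.get("H2OGPTE_CORE_LLM_ADDRESS", "http://h2ogpt:5000")
OPENAI_ADDRESS = os.environ.get("H2OGPTE_CORE_OPENAI_ADDRESS", LLM_ADDRESS)
AGENT_ADDRESS = os.environ.get("H2OGPTE_CORE_AGENT_ADDRESS", LLM_ADDRESS)

# Runtime feature toggle; the variable name here is a hypothetical example.
AGENTS_ENABLED = os.environ.get("H2OGPTE_ENABLE_AGENTS", "true").lower() == "true"
```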

Security Architecture​

Security is built into every layer of the h2oGPTe platform, implementing defense-in-depth strategies to protect sensitive data and ensure compliance with enterprise security requirements.

Authentication and Identity Management​

  • OIDC/OAuth2: Full OpenID Connect and OAuth 2.0 implementation for enterprise SSO
  • JWT Tokens: Token-based authentication with JWKS validation and key rotation (see the validation sketch after this list)
  • OAuth2 Flows: Authorization code flow with PKCE, token refresh, and token exchange
  • Session Management: Redis-backed session storage with automatic token refresh
  • API Keys: Database-stored API keys for programmatic access
  • Guest Users: Device fingerprinting for anonymous access
  • Enterprise IdP Support: Integration with Okta, Azure AD, Keycloak, and other OIDC providers
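
The snippet below sketches the JWKS-based token check described above. The gateway itself is written in Go; this Python version using PyJWT only illustrates the mechanism, and the realm URL and audience are placeholders.

```python
import jwt  # PyJWT

JWKS_URL = "https://keycloak.example.com/realms/h2ogpte/protocol/openid-connect/certs"

def validate_token(token: str) -> dict:
    # Fetch the signing key matching the token's "kid" header from the JWKS endpoint.
    signing_key = jwt.PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    # Verify signature, expiry, and audience; returns the decoded claims.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="h2ogpte",
    )

# claims = validate_token(bearer_token_from_request)
# claims["sub"] then identifies the user for RBAC and row-level security.
```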

Authorization and Access Control​

  • PostgreSQL RLS: Row-level security policies for data isolation (illustrated after this list)
  • RBAC System: Database-backed roles and permissions
  • User Types: Regular users, admin users, guest users
  • Trusted/Untrusted Connections: Separate database connection pools based on trust level
  • Audit Logging: Database-level audit trails
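
The conceptual sketch below shows how row-level security and per-request user context fit together. Table, column, policy, and setting names are hypothetical; h2oGPTe's actual schema and policies are internal to the product.

```python
import psycopg2

# Hypothetical "untrusted" application role; it must not own the table or have
# BYPASSRLS, otherwise the policy would not apply.
conn = psycopg2.connect("dbname=h2ogpte user=app_untrusted")

with conn, conn.cursor() as cur:
    # One-time setup (normally part of schema migrations): enable RLS and add a
    # policy that only exposes rows owned by the current application user.
    cur.execute("ALTER TABLE documents ENABLE ROW LEVEL SECURITY")
    cur.execute("""
        CREATE POLICY documents_owner_only ON documents
        USING (owner_id = current_setting('app.current_user_id'))
    """)

with conn, conn.cursor() as cur:
    # Per-request: the gateway sets the authenticated user id on the session, so
    # every subsequent query is filtered automatically by the policy.
    cur.execute("SELECT set_config('app.current_user_id', %s, false)", ("user-123",))
    cur.execute("SELECT id, name FROM documents")  # returns only this user's rows
    print(cur.fetchall())
```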

Data Security​

  • Encryption: Configurable encryption for sensitive data
  • PII Detection: Built-in PII detection and redaction
  • Secure Cookies: SameSite policies and secure flag
  • Object Storage: MinIO with bucket-level access controls

Infrastructure Security​

  • Container Security: Docker-based isolation
  • Service Communication: Internal service authentication
  • Rate Limiting: Built-in rate limiting and cost controls
  • License Validation: Enterprise license enforcement

Storage Architecture​

  • PostgreSQL: Primary database, managed through 150+ schema migrations and extensive stored procedures
  • Redis: Caching layer and pub/sub messaging
  • MinIO Buckets: Documents, collections, user data, shared data, and agent tools (see the access sketch after this list)
  • Cloud Storage Support: S3, Azure Blob, Google Cloud Storage integration
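
The sketch below shows the kind of object-storage access involved, using the MinIO Python client; the endpoint, credentials, bucket, and object names are placeholders, and the bucket is assumed to already exist.

```python
from minio import Minio

store = Minio(
    "minio:9000",
    access_key="h2ogpte",
    secret_key="change-me",
    secure=False,
)

# Raw documents are written to a documents bucket during ingestion...
with open("contract.pdf", "rb") as f:
    store.put_object(
        "documents", "collections/abc123/contract.pdf", f,
        length=-1, part_size=10 * 1024 * 1024,
    )

# ...and streamed back when a user previews or downloads a file.
obj = store.get_object("documents", "collections/abc123/contract.pdf")
try:
    data = obj.read()
finally:
    obj.close()
    obj.release_conn()
```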
