What's new in h2oGPTe v1.7.x

· 13 min read

We are excited to announce the release of h2oGPTe 1.7! Read on to learn about the new features and improvements in the h2oGPTe 1.7.x series that expand agent control and introduce enterprise-grade governance capabilities.

Overview of the 1.7.x Release

The h2oGPTe 1.7.x series expands agent control with Human-in-the-Loop feedback and persistent Memory Blocks, improves RAG performance, and introduces enterprise-grade API governance and fairness controls alongside significant UI enhancements.

Major features

  • Human-in-the-Loop (HITL) & Agent Control: Active model guidance and feedback mechanisms combined with comprehensive behavior customization options for AI agents.
  • Memory Blocks: Persistent context blocks that agents read from and write to across sessions, with group-based access control and automatic sharing when collections are shared.
  • Native MCP Integration: Standardized Model Context Protocol server for connections to external tools, chat, and document collections, with sub-tool filtering, an inspection UI, and a standalone tool marketplace page.
  • Enterprise Governance & Fairness: Per-user API rate limiting, automated API key deactivation triggers, and fairness mechanisms to prevent system resource monopolization.

Key improvements & fixes across releases

  • Performance & Speed: Significantly faster document indexing, document ingestion, and parallel file downloads. Real-time RAG progress messaging now counts unique documents rather than raw chunks. (v1.7.0)
  • Security & Guardrails: Hardened MCP server security. Enhanced AI guardrails by adding collection metadata to context, with improved UI feedback for guardrail violations. (v1.7.0)
  • Scheduled Task Reliability: Multiple race condition and state management fixes for scheduled tasks, including correct pause behavior, timestamp advancement, and manual Run Now on paused tasks. (v1.7.1)
  • Auto Memory Blocks: Persistent memory blocks are now automatically created for chats without manual configuration. (v1.7.1)

Major patch releases

h2oGPTe v1.7.1

New features

Models

  • New model added: Support for Google Gemma-4-31B-IT (via OpenRouter), a multimodal model with a 256K context window and optional reasoning mode.

Chat

  • Chat name column: The chat list within a collection now displays the chat name as a dedicated column for easier browsing.
  • Sample questions on welcome screen: Default sample questions appear below the chat input on the welcome screen to help you get started.
  • Improved references panel: References in chat responses are now displayed inline and in a dedicated side panel for easier review.

Documents and collections

  • Document download: Download individual documents directly from the document list. The actions column is now pinned and stays visible as you scroll.
  • Multilingual PII detection: PII detection now identifies personally identifiable information across multiple languages, extending privacy protection beyond English-only content.

Memory Blocks

  • Auto memory blocks: Memory blocks are now created automatically for chats, so context persists across sessions without any manual setup.

Rate limiting

  • Rolling 30-day quota: Usage limits now reset on a rolling 30-day window rather than at the start of each calendar month, ensuring consistent enforcement throughout the month.
  • Rate-limit notifications in chat: When a chat session is rate-limited, the notification is now saved in the chat history so you can see exactly where throttling occurred.
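The difference between a rolling window and a calendar-month reset can be illustrated with a minimal sketch. The `RollingQuota` class and its method names are hypothetical and are not the h2oGPTe implementation; the sketch only assumes that each accepted request is counted for exactly 30 days after it occurs.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingQuota:
    """Hypothetical rolling-window quota; not the h2oGPTe implementation."""

    def __init__(self, limit: int, window: timedelta = timedelta(days=30)):
        self.limit = limit
        self.window = window
        self._events = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: datetime) -> bool:
        # Forget usage that has aged out of the window ending at `now`.
        cutoff = now - self.window
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        if len(self._events) >= self.limit:
            return False  # over quota; a caller could record a notice here
        self._events.append(now)
        return True
```

With a limit of two requests, usage on January 20 and 21 still counts on February 1, so a request on that date is rejected even though a new calendar month has started. Capacity returns only as each request passes its own 30-day mark.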

Scheduled tasks

  • Run Now on paused tasks: You can now manually trigger a paused scheduled task at any time. Pausing only stops the automatic schedule; manual runs are always available.
  • Collection sync across schedule dialog and sidebar: Changing the collection in the schedule dialog or the chat sidebar now keeps both views in sync automatically.
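The pause semantics described above can be summarized as a small state check. This is a sketch under stated assumptions: the `TaskState` and `ScheduledTask` names are hypothetical, and the rule that completed and expired tasks reject manual triggers is taken from the bug fixes listed for this release.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    EXPIRED = "expired"


@dataclass
class ScheduledTask:
    """Hypothetical model of a scheduled task; names are illustrative only."""

    state: TaskState = TaskState.ACTIVE

    def runs_on_schedule(self) -> bool:
        # Only active tasks fire automatically; pausing stops the schedule.
        return self.state is TaskState.ACTIVE

    def accepts_manual_run(self) -> bool:
        # Run Now stays available while paused, but not once the task
        # has completed or expired.
        return self.state in (TaskState.ACTIVE, TaskState.PAUSED)
```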

Integrations

  • Excel Add-in: A new Microsoft Excel add-in brings AI-powered spreadsheet functionality directly into Excel. Chat with your data, generate formulas, and run h2oGPTe queries without leaving your spreadsheet.
  • Google Sheets Add-on: A new Google Sheets add-on adds AI-powered chat capabilities to Google Sheets. Ask questions about your data and get AI-generated insights inline, directly within your existing Google Workspace workflow.

Authentication and access control

  • Role-based access allow list: Administrators can restrict platform access to users in a configured set of roles. Users outside the allow list are denied access at sign-in.
  • Role-based access revocation: Administrators can revoke access for entire role groups, complementing the existing per-user revocation controls.
  • Public chat sharing: Shared chat links can now be opened by recipients without requiring them to sign in.
  • Share link in dialog: The share link for a chat is now shown in the same dialog, without navigating away.
  • Live Logs enhancements: The Live Logs admin view now supports filtering by log level and downloading log output directly from the UI, making it easier to triage and share diagnostic information.

Improvements

Scheduled tasks

  • Prompt required: Scheduled tasks now require a prompt to be set before saving, so tasks cannot be created without input.
  • Timezone and collection labels: Timezone and collection fields in the schedule dialog are now fully localized.

User interface

  • Cancel reason guidance: The cancel reason field now shows a character limit and placeholder text to help you provide a useful reason.

Python client

  • Adds cancel_reason support when cancelling jobs.
  • Improves forward compatibility when connecting to a newer server version.

Bug fixes

Scheduled tasks

  • Fixed paused scheduled tasks accumulating stale execution state across multiple run cycles.
  • Fixed the /schedule slash command not working after a recent update.
  • Fixed scheduled task timestamps not updating correctly when a task was paused mid-execution.
  • Fixed completed and expired scheduled tasks incorrectly accepting manual trigger requests.
  • Fixed the "Next run" column showing a timestamp for paused and completed tasks.

Chat and UI

  • Fixed the chat side panel remaining open and empty after navigating between collection chats.
  • Fixed the prompt template not reflecting the collection's setting when the session had no override set.
  • Fixed the memory block selection not saving when cleared at the collection level.
  • Fixed the collection edit dialog failing when the selected collection had been deleted.

Authentication

  • Fixed file downloads failing for certain authentication methods.

AI and processing

  • Fixed agent mode failing to activate correctly in certain configurations.

h2oGPTe v1.7.0

Note

Features marked with * were backported to earlier v1.6.x patch releases. If you are upgrading from v1.6.54 or later, some of these features may already be available in your current installation.

New features

Agents and tools

  • Human-in-the-Loop (HITL) course correction: You can now actively guide and provide feedback to the model during generation.
  • Advanced Agent controls: Fine-tune agent behaviors with comprehensive new customization options.
  • Agent tool creation: Create new tools directly from agent files using an improved UI workflow.
  • Memory Blocks: Persistent memory blocks let agents store and retrieve context across sessions. Blocks can be set as the default for new chats, shared with groups, and are automatically shared when their parent collection is shared.
  • Native MCP server: A native Model Context Protocol (MCP) server built with the Go SDK provides standardized access to chat, documents, and collections.
  • MCP sub-tool filtering and inspection: Filter available MCP sub-tools through SDK options and inspect tool definitions through a dedicated UI.
  • MCP tool marketplace: A dedicated Tools page lists all available MCP tools as a browsable marketplace.

RAG and AI models

  • New models added: Support for Claude 4.6 Sonnet*, Claude 4.6 Opus*, Gemini 3.1 Pro Preview, GPT-5.2*, and three NVIDIA Nemotron models via OpenRouter: Nemotron-3-Super-120B-A12B, Nemotron-3-Nano-30B-A3B, and Llama-3.3-Nemotron-Super-49B-v1.5.
  • Fast Agentic RAG: A performance-optimized mode for Agentic RAG that reduces retrieval latency.
  • RLM RAG: A Recursive Language Model generation approach that iteratively refines retrieval and generation steps for improved answer quality.
  • Agentic RAG improvements: Enhanced agentic RAG capabilities for more reliable multi-step retrieval workflows.
  • RAG streaming progress: Real-time progress indicators appear during RAG operations.
  • RAG inclusion/exclusion filters*: Apply include and exclude filters on chat queries to control which documents are used in retrieval.
  • Smart chat history: The model now decides whether a query requires context from the chat history.
  • Model tiering system: Organize and manage models by tier for streamlined model selection.
  • Model notifications: You are automatically notified when new models are added to the platform.

Chat and collaboration

  • Chat branching: Branch off into new conversations from specific messages within an existing chat session.
  • Chat processing visibility: View your exact queue position during chat processing with real-time feedback.
  • Fairness throttle notifications: Receive a notification when your session is throttled, with automatic reconnection if the chat connection drops.
  • Auto-generated turn titles*: The system automatically generates descriptive titles for chat turns.
  • Enhanced citations*: Specific passages from citations are now visually highlighted, and references display as pills for better readability.
  • Slow chat notifications: You receive notifications when chat or crawl jobs take longer than expected.
  • Quote-to-ask: Highlight any text in a chat response and instantly quote it into the message input to ask a follow-up question in context.

Scheduled tasks

  • Scheduled Tasks UI: A new dedicated Scheduled Tasks page lets you create, view, update, and delete scheduled AI workflows directly from the UI; no API or scripting required.
  • Execution tracking: Monitor each task's status, last-run timestamp, and execution history. Tasks can be set to run on a recurring schedule and paused or resumed at any time.
  • Automated AI workflows: Combine a prompt, a collection, and a schedule to run repeating AI queries automatically. This is useful for recurring summaries, monitoring, or report generation.
  • /schedule slash command: Start a new scheduled task directly from the chat input using the /schedule slash command.

Documents and collections

  • Collection management: Pin favorite collections for quick access, and access Most Recently Used (MRU) collections in the sidebar.
  • Collection import in Agent File Explorer: Import collections directly from the agent file explorer.
  • Collection settings restructure: Chat and ingestion settings are now integrated into collection settings for a unified configuration experience.
  • Enhanced document indexing: Phase-based job progress tracking for indexing operations.
  • Eval Studio integration: Topic Modeling is now integrated directly into Eval Studio.
  • Enhanced feedback export: Export feedback by collection and date range.
  • Collection evaluations table: Collection evaluations now include clear, descriptive labels.
  • Confluence support*: The Confluence connector now fully supports attachments.
  • File version history: Documents now track version history. Upload a new version of a file and h2oGPTe groups it with previous versions, letting you browse, compare, and manage the full revision history of any document.

User interface

  • KaTeX toggle: Switch KaTeX math rendering on or off in preferences.
  • PDF viewer original mode*: View PDF documents in their original format.
  • Inline images in side panel: Open inline images and file links in the side panel without leaving the current view.
  • Artifact Viewer full-screen: Full-screen support for the Artifact Viewer.
  • Chat drag-and-drop: Improved chat drag-and-drop capability.
  • Dynamic dialog height: Dynamic height adjustment for the Add Document dialog.
  • Per-form field reset: Reset individual form fields without clearing the entire form.
  • PII toggle alert: An information alert displays when toggling PII detection settings.
  • System settings tabs and search: The system settings page now supports tabs and search.
  • Customizable reference highlight colors: Customize the colors used to highlight referenced passages.
  • Extractor UI expansion: Expanded extractor interface with additional fields and custom field support.
  • Autosync connector jobs page: A new UI page for monitoring autosync connector jobs.

Authentication and security

  • API key management: Administrators can track detailed API key usage and control permissions.
  • API key auto-deactivation: Keys are automatically deactivated when an administrator revokes the associated user's permissions.
  • Per-user API rate limiting: Set and enforce per-user rate limits on API requests.
  • OAuth improvements: Improved OAuth dropdown rendering.
  • Signup abuse prevention: New signup abuse prevention mechanisms with administrator bypass options.
  • System notifications: Administrators can create and manage system-wide notifications that are displayed to all users across the platform, useful for announcing maintenance windows, policy changes, or important updates.

Improvements

Performance and processing

  • Significantly faster indexing: Major performance upgrades to document ingestion and indexing, with phase-based job tracking that shows progress per indexing phase.
  • Real-time visibility: View your exact queue position during chat processing and watch real-time RAG streaming progress, now counting unique documents instead of raw chunks.
  • Collection lifecycle management: Import collections with lifecycle settings and use the skip_reparse* option to import without re-parsing content.
  • Copy linked connectors: Collection owners can copy linked connectors to new collections.

User interface and experience

  • Upgraded file viewing: The PDF viewer now supports an original view mode and displays page numbers. Inline images and file links open in the side panel.
  • Artifacts and code: The Artifact Viewer supports full-screen mode, and KaTeX rendering can be toggled on or off.
  • Workspace fluidity: Improved chat drag-and-drop capability, dynamic height adjustments for dialogs, and a responsive sidebar aligned to standard breakpoints.
  • Sidebar rename: The sidebar Docs link is renamed to "Docs & Guides."

Security and guardrails

  • Improved guardrail violation feedback: Better UI feedback when guardrail violations occur.
  • Streaming with prompt guard: Streaming is now maintained when only prompt guard is enabled.

Language and content handling

  • Enhanced character support: Expanded UTF-8 support for CJK, Hebrew, and Hindi characters in text files.
  • Markdown handling: Improved handling of Markdown formatting within the PII model, and Markdown files are now previewed natively as text.
  • Chromium SVG-to-PDF conversion: Improved SVG-to-PDF conversion using Chromium.

Python client

  • Adds comprehensive job control functions* (pause, stop, play, finish).
  • Adds user list pagination and the ability to retrieve public collection permissions.
  • Adds support for vector database migration and custom metadata keys.

Bug fixes

Chat and UI

  • Fixed reversed roles when exporting or downloading chat history.
  • Fixed chat table viewport overflow issues.
  • Fixed the share button width in full panel mode.
  • Fixed single dollar signs ($) being interpreted as math formula delimiters.
  • Fixed file auto-switching behavior.
  • Fixed the Include chat history setting not being honored when configured at the collection level.
  • Fixed label clicks incorrectly triggering guardrail deletion.
  • Fixed the OCR model and Audio input language fields being swapped in the upload dialog.
  • Fixed the self-reflection toggle being incorrectly disabled in Agent Mode.
  • Fixed the 401 insufficient permissions page display.
  • Fixed delete message tooltip not appearing.
  • Fixed dictated content leaking between messages.

AI and processing

  • Fixed chat history not being used when vision is enabled.
  • Fixed guardrail-violating content appearing in chat history.
  • Fixed agentic RAG failing in the same chat session when using agent-only files.
  • Fixed import jobs not failing when PDF conversion fails.
  • Fixed incorrect cost display when no vision model is configured.
  • Fixed text-to-speech not working in certain configurations.
  • Fixed OCR text insertion into PDF documents.
  • Fixed document thumbnails failing to generate for certain image formats.

Connectors and integrations

  • Fixed Confluence connector null OAuth token issue.
  • Fixed GCS connector stability issues that caused document ingestion failures.
  • Fixed connector settings button visibility based on permissions.

Security and permissions

  • Fixed collection-level permissions for chat deletion.

Support and resources

For technical support and questions about this release:

Upgrade information

We recommend upgrading to the latest version of h2oGPTe 1.7.x to access these improvements. The upgrade process preserves all your existing data and configurations.


What's new in h2oGPTe v1.6.x

· 27 min read

We are excited to announce the release of h2oGPTe 1.6! Read on to learn about the new features and improvements in the h2oGPTe 1.6.x series that help you find answers and generate content based on your private data.

Overview of the 1.6.x Release

The h2oGPTe 1.6.x series delivers significant capabilities for autonomous AI assistance, enhanced APIs, and improved document processing:

Major features

  • Agent functionality: Autonomous tool use with code execution, placing #1 on the GAIA leaderboard for General AI assistants
  • Role-Based Access Control (RBAC): Comprehensive permission system for users and groups
  • REST API: OpenAPI-compliant interface with auto-generated bindings for Python, JavaScript, and Go
  • Custom GPT creation: Collection-based configuration system for tailored AI assistants

Key improvements & fixes across releases

  • Updated supported LLMs list: Added coverage of new large language models, including OpenAI's GPT-5 series (v1.6.37)
  • OpenAI compatible API: Added OpenAI API compatibility for integration with existing applications, supporting chat completions, responses, and streaming (v1.6.37)
  • Exception handling and resilience: Improved exception handling and increased resilience for document ingestion and old parsing issues (v1.6.35)
  • Internationalization: Added RTL language support to improve UI compatibility with right-to-left languages (v1.6.34)
  • Enhanced mobile experience: Better navigation, responsive design, and improved card layouts (v1.6.33)
  • Developer experience: Async API examples, simplified client connections, and enhanced code execution (v1.6.33)
  • System reliability: Better error handling, improved self-tests, and enhanced stability (v1.6.32, v1.6.33)
  • Performance optimizations: Faster document ingestion, improved chat queries, and streamlined processing (v1.6.32)
  • Client stability: Improved real-time chat stability and connection reliability for web applications (v1.6.40)
  • Multilingual support: Added translation framework supporting both left-to-right and right-to-left languages and full Spanish language support (v1.6.41)
  • Agent capabilities: Added support for selecting user personas when using an agent and a specialized Data Science agent type for advanced analytics tasks (v1.6.41)
  • Enterprise security: Secret Manager and Secure Connectors now configurable by administrators for centralized credential management and secure external integrations (v1.6.43)
  • Workspaces: Added workspace functionality with tagging and personal workspace migration (v1.6.43)
  • Enhanced agents: New Tool Builder Agent and improved agent response UI (v1.6.43)
  • New connectors: Added new connector to import content directly from Atlassian Confluence (v1.6.43)
  • Improved document management: Added video scene descriptions and auto-tagging with agents (v1.6.43)
  • Refined chat and UI experience: Redesigned Chats page, improved chat toolbar, better references view, and clearer collection context in chats (v1.6.44)
  • Inline reference pills: Clickable citation pills in chat responses link directly to source document passages with PDF highlighting (v1.6.54)
  • RAG query filters: Include or exclude specific content with filters on chat queries (v1.6.54)
  • New model support: Added GPT-5.2 and Gemini-3-pro-preview model options (v1.6.54)
  • Confluence attachments: Support for document attachments in the Confluence connector (v1.6.54)
  • Chat generation control: Added APIs to pause, resume, stop, and finish ongoing chat message generation (v1.6.55)

Major patch releases

h2oGPTe v1.6.57

New features

  • Enhanced Model Capabilities: Added support for Gemini 3.1 Pro Preview, Claude 4.6 Sonnet, and Claude 4.6 Opus. Additionally, retired models (Claude 3.5 Haiku and Claude 3.7 Sonnet) have been removed. For a complete list of currently available models, visit the Supported LLMs page.
  • Optimized Collection import: You can now use the skip_reparse option to import collections without re-parsing content, reducing import time when re-indexing isn't needed.

Known limitations

  • GCS connector: The Google Cloud Storage (GCS) connector is not currently supported.

Security

  • UI Corrections: Fixed a bug where the collection dropdown menu would return a null value, restoring full functionality to the interface.
  • Security Updates: Applied critical security updates and resolved multiple vulnerabilities (CVEs) across the platform, including specific security enhancements for MCP functionality.

h2oGPTe v1.6.0

Released: January 31, 2025

Agent features

Agent overview

The h2oGPTe Agent enables autonomous tool use through code execution. It uses large language models (LLMs) for code generation and reasoning. The agent achieved #1 ranking on the GAIA leaderboard, which measures General AI assistant usefulness.

Key features include:

  • Deep Research assistance: Provides autonomous analysis with full transparency into decision-making processes
  • Comprehensive output: Delivers analysis summaries, internal chat transcripts, and downloadable artifacts for each conversation
  • File management: Gives you access to newly created files (PDF, Excel, PowerPoint) and all code snippets used in document creation

Agent control

You can enable or disable the agent through the chat input interface. The agent operates autonomously based on your prompts and configured prompt templates.

Accuracy Presets control conversation depth and processing time:

  • Quick
  • Basic (default)
  • Standard
  • Maximum

Each preset defines the number of conversation turns and maximum processing time per turn.

Agent tools

Administrators can enable or disable these agent tools for chat queries:

Code and Development:

  • Aider Code Generation
  • Shell Scripting
  • Python Coding

Data Visualization:

  • Mermaid Chart-Diagram Renderer
  • Image Generation

Content Processing:

  • Ask Question About Image
  • Audio-Video Transcription
  • Convert Document to Text
  • Screenshot Webpage

Research and Search:

  • Google Search
  • Bing Search
  • Scholar Papers Search
  • Wolfram Alpha Math Science Search
  • Wikipedia Articles Search
  • Wayback Machine Search
  • Web Image Search

Document Analysis:

  • Ask Question About Documents
  • RAG (Retrieval-Augmented Generation) Vision
  • RAG Text

System Integration:

  • H2O Driverless AI Data Science
  • Browser Navigation
  • Download Web Video
  • Advanced Reasoning
  • Evaluate Answer

Network Access:

  • Internet Access
  • Intranet Access

Access control and permissions

Role-based access control (RBAC)

This version introduces a comprehensive role and permission system. Each role contains specific permissions, and administrators can assign roles to users and groups from federated authentication providers like LDAP.

Available permissions:

Chat management:

  • Delete chats
  • Submit chat feedback

Collection management:

  • Add collections
  • Delete collections
  • Edit collections
  • Make collection public
  • Share collections

Document management:

  • Add documents
  • Delete documents

Template management:

  • Delete prompt templates
  • Edit prompt templates
  • Share prompt templates

System administration:

  • Show admin center
  • Allow device pairing when configured
  • Show extractors
  • Show live logs
  • Show models page
  • Show private button
  • Manage roles
  • Display system notifications
  • Display developer settings

API and developer tools

REST API

A new REST API complements the existing Python RPC client. The API conforms to the OpenAPI standard and provides built-in Swagger UI documentation.

Auto-Generated bindings:

  • Python REST API
  • JavaScript REST API
  • Go REST API

Custom GPT creation

Create custom AI assistants using the formula: Collection + Collection Settings + Default Chat Settings = Custom GPT.

Each collection contains default chat settings that apply to new conversations. You can apply current settings as collection defaults through the Apply current settings as collection defaults button or via API.

Code generation for chat messages

Each chat message displays the equivalent Python client code, enabling developers to replicate queries programmatically.

Model and processing improvements

Reasoning model support

The models page displays reasoning capabilities and shows which reasoning models support non-reasoning models, similar to vision model relationships. Reasoning models work with chat, RAG (Retrieval-Augmented Generation), and agent use cases.

Vision capabilities

Enhanced vision model functionality across the platform.

Document processing

Parsing improvements:

  • Layout detection
  • Chunking algorithms
  • Image captioning
  • Text conversion
  • Document highlighting
  • Excel handling (large tables are summarized while remaining fully accessible to agents for data analysis)

Handwriting recognition

The H2O Mississippi model provides default handwriting-to-text transcription and ships with the platform.

Supported LLMs

Support for proprietary and open-source models includes:

Cloud providers:

  • Claude 3.5 (Bedrock)
  • OpenAI o1 (Azure)
  • OpenAI o1-mini (Azure)
  • Gemini 2.0 Flash
  • Gemini 2.0 Flash Thinking

Open-Source Models:

  • DeepSeek V3
  • DeepSeek R1
  • MiniMaxAI
  • Qwen/Qwen2.5
  • Qwen/Qwen2-VL
  • Qwen/QwQ
  • Llama-3.3-70B
  • Llama-3.2-11B-Vision
  • Llama-3.2-90B-Vision

H2O Models:

  • H2O Mississippi

Performance and scalability

Architecture improvements

  • Models Service: Redesigned backend enables horizontal scaling for document ingestion and chat through a dedicated service shared by chat, crawl, and core services
  • Auto-Scaling: Optional KEDA-based auto-scaling for the models service
  • Database Operations: Parallelized Vex database operations
  • Conversion Speed: Accelerated text-to-PDF conversion

User experience enhancements

Interface improvements:

  • Separate guardrails and PII (Personally Identifiable Information) settings
  • GUI-based custom guardrails configuration
  • Enhanced PDF display
  • Collection thumbnails
  • Improved models page layout
  • Better scrolling and pagination
  • Syntax highlighting for markdown and code blocks
  • Enhanced job cancellation
  • Faster automatic RAG type detection

Model testing

Self-Test enhancements:

  • Functional self-tests with multimodal RAG and guided JSON
  • Better detection of model endpoint configuration issues

Administrative features

White labeling

Customization options include:

  • Custom logos
  • Color schemes
  • Greeting messages
  • Personality configuration in prompt templates

Topic modeling

Generate topic model visualizations for any collection with a single click. Visualizations show clusters of similar phrases and concepts, providing content overviews and identifying areas for content optimization.

Collection management

Lifecycle controls:

  • Configurable collection expiration times
  • Collection size limits

Performance optimizations

  • Faster public chat sharing

Security

No critical or high CVEs at the time of release.

Support and resources

For technical support and questions about this release:

Additional resources

Upgrade information

We recommend upgrading to the latest version of h2oGPTe 1.6.x to access these improvements. The upgrade process preserves all your existing data and configurations.


What's new in h2oGPTe v1.5

· 5 min read

We are excited to announce the release of h2oGPTe 1.5! Read on to learn about the new features in the 1.5.x series which will improve your ability to find answers and generate new content based on your private data.

Chat First

The GUI has been revamped to lead with chat first. You can start chatting immediately, and add documents or collections to the chat.

New Connectors

New connectors in v1.5:

  • Amazon S3
  • Google Cloud Storage
  • Azure BLOB
  • SharePoint
  • Upload Plain Text (automatically triggered if large text is copied and pasted into the chat)

Choice of OCR models

The v1.5 release brings support for more languages by introducing a new set of OCR model choices (for conversion of documents to text), including the auto-detection of the language for each page of every document.

  • Automatic (default)
  • Tesseract (over 60 different languages)
  • DocTR
  • PaddleOCR
  • PaddleOCR-Chinese
  • PaddleOCR-Arabic
  • PaddleOCR-Japanese
  • PaddleOCR-Korean
  • PaddleOCR-Cyrillic
  • PaddleOCR-Devanagari
  • PaddleOCR-Telugu
  • PaddleOCR-Kannada
  • PaddleOCR-Tamil

Model Comparison Page

A new models page offers easy comparison between all LLMs:

  • Tabular view of all metrics such as cost, accuracy, speed, latency, context lengths, vision capabilities, guided generation features and chat template
  • Graphical scatter plot to compare models across 2 dimensions, with optional log-scale
  • Usage and performance stats are now shown as a tab on the models page
  • A self-test button shows a green or red light for each LLM within seconds to confirm that all LLMs are operational. The "quick" and "rag" benchmark modes, which test chat and RAG, are available to all users.
  • Admins have access to "full" and "stress" tests as well, to make sure LLMs are configured to handle large contexts properly.

Model Routing with Cost Controls

Automatically chooses the best LLM for the task given cost constraints such as:

  • Max cost per LLM call
  • Willingness to pay for extra accuracy (how much to pay for +10% accuracy for this LLM call?)
  • Willingness to wait for extra accuracy (how long to wait for +10% accuracy for this LLM call?)
  • Max cost per million tokens for LLMs to be considered
  • Fixed list of models to choose from

Any of these cost controls can be combined. The GUI exposes the first 3 cost constraints.
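One way to combine these constraints is to apply the hard limits first and then rank the remaining models by a net value that trades accuracy against cost and waiting time. The sketch below is an illustration only: the `Candidate` fields, the `route` function, and the scoring formula are assumptions, not the routing logic used by h2oGPTe.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-model estimates for a single LLM call."""

    name: str
    cost_per_call: float     # estimated USD for this call
    cost_per_million: float  # USD per million tokens
    accuracy: float          # percent, 0-100
    latency_s: float         # expected seconds for this call


def route(
    candidates: Sequence[Candidate],
    max_cost_per_call: float,
    pay_per_10pct: float,    # USD you would pay for +10% accuracy (must be > 0)
    wait_per_10pct: float,   # seconds you would wait for +10% accuracy (must be > 0)
    max_cost_per_million: Optional[float] = None,
    allowed: Optional[Sequence[str]] = None,
) -> Optional[Candidate]:
    # Hard constraints: cost caps and the optional fixed model list.
    eligible = [
        c for c in candidates
        if c.cost_per_call <= max_cost_per_call
        and (max_cost_per_million is None or c.cost_per_million <= max_cost_per_million)
        and (allowed is None or c.name in allowed)
    ]
    if not eligible:
        return None

    # Assumed conversion: if +10% accuracy is worth both `pay_per_10pct`
    # dollars and `wait_per_10pct` seconds, one second costs this much.
    dollars_per_second = pay_per_10pct / wait_per_10pct

    def net_value(c: Candidate) -> float:
        benefit = (c.accuracy / 10.0) * pay_per_10pct
        return benefit - c.cost_per_call - c.latency_s * dollars_per_second

    return max(eligible, key=net_value)
```

Under this scoring, raising the willingness to pay for accuracy shifts the choice toward larger, more accurate models, while tightening the per-call cap removes them from consideration entirely.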

Guardrails

Fully customizable Guardrails:

  • Prompt Guard (fine-tuned DeBERTa v3 model), Jailbreak and Prompt Injection
  • Llama Guard (fine-tuned LLM), 14 classes of unsafe content
  • Custom Guardrails, arbitrary LLM and prompting

Guardrails are applied to:

  • All user prompts

If unsafe content is detected, the following action is performed:

  • fail

Redaction of PII or regular expressions

These PII detection methods are combined for maximum precision and recall:

  • regular expressions
  • Presidio model: 5 languages (en, es, fr, de, zh), 36 different PII entities
  • Custom PII model: 59 different PII entities

Personally identifiable information (PII) is checked for in these places:

  • Parsing of documents
  • LLM input
  • LLM output

If PII is detected in any of the above places, one of the following actions is performed:

  • allow
  • redact
  • fail

You have full control over the list of entities to flag, via JSON spec, controllable per collection.
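The three actions can be illustrated with a small regular-expression sketch. This covers only the regex detection method, not the Presidio or custom PII models. The entity names, patterns, and the `apply_pii_policy` helper are assumptions made for illustration, not the product's JSON spec.

```python
import re
from typing import Dict

# Hypothetical entity patterns; a real deployment would use a much richer set.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


class PIIError(ValueError):
    """Raised when the action is 'fail' and PII is found."""


def apply_pii_policy(text: str, action: str = "redact") -> str:
    """Apply one of the three actions (allow, redact, fail) to a piece of text."""
    if action == "allow":
        return text
    if action not in ("redact", "fail"):
        raise ValueError(f"unknown action: {action}")
    for entity, pattern in PATTERNS.items():
        if not pattern.search(text):
            continue
        if action == "fail":
            raise PIIError(f"{entity} detected")
        # Replace every match with a placeholder naming the entity type.
        text = pattern.sub(f"<{entity}>", text)
    return text


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(apply_pii_policy(sample))  # Contact <EMAIL>, SSN <US_SSN>.
```

The same function could be called at each of the three checkpoints (document parsing, LLM input, LLM output), with a different action for each if desired.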

Document Metadata

You can now choose what information from the document is provided to the LLM.

  • Filename
  • Page Number
  • Document Text
  • Document Age
  • Last Modification Date
  • Retrieval Score
  • Ingestion Method
  • URI
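As a rough sketch of what selecting metadata means in practice, the snippet below renders only the chosen fields of a retrieved chunk into the text passed to the LLM. The chunk record and the `render_context` helper are hypothetical; only the field names come from the list above.

```python
from typing import Dict, Sequence

# Hypothetical chunk record; the keys mirror the metadata options listed above.
chunk: Dict[str, object] = {
    "Filename": "annual_report.pdf",
    "Page Number": 12,
    "Retrieval Score": 0.87,
    "Document Text": "Revenue grew 14% year over year.",
}


def render_context(chunk: Dict[str, object], fields: Sequence[str]) -> str:
    """Render only the selected metadata fields, one per line, in the given order."""
    return "\n".join(f"{name}: {chunk[name]}" for name in fields if name in chunk)


print(render_context(chunk, ["Filename", "Page Number", "Document Text"]))
```

Here the retrieval score is present in the record but is not shown to the LLM because it was not selected.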

Multimodal Vision Capabilities

v1.5.x brings support for multimodal vision capabilities, including state-of-the-art open-source vision models. This allows processing of flowcharts, images, diagrams and more.

  • GPT-4o/GPT-4o-mini
  • Gemini-1.5-Pro/Flash
  • Claude-3/Claude-3.5
  • InternVL-Chat
  • InternVL2-26B/76B

Support for upcoming LLMs via Chat Templates

v1.5.x can support future, not-yet-released LLMs through Hugging Face chat templates.

Guided Generation

A powerful new feature in v1.5 is guided generation. For example, the LLM can be instructed to create valid JSON that adheres to a provided schema. It can also be instructed to create output that matches a regular expression, follows a certain grammar, or contains only output from a provided list of choices.

All these powerful options are exposed in the API:

  • guided_json
  • guided_regex
  • guided_choice
  • guided_grammar
  • guided_whitespace_pattern

Note that guided generation also works for vision models. For most (proprietary) models not hosted by vLLM (such as OpenAI, Claude, etc.), only guided_json is supported for now.
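To show what each constraint promises about the output, here is a small client-side checker. It reuses the parameter names listed above as dictionary keys, but the `satisfies` helper, the example schema, and the example values are assumptions; it does not call the h2oGPTe API and checks only a tiny subset of JSON Schema.

```python
import json
import re
from typing import Any, Dict

# Minimal mapping from JSON Schema type names to Python types (illustrative only).
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def satisfies(output: str, constraints: Dict[str, Any]) -> bool:
    """Check whether a model output obeys the given guided-generation constraints."""
    if "guided_choice" in constraints and output not in constraints["guided_choice"]:
        return False
    if "guided_regex" in constraints and not re.fullmatch(constraints["guided_regex"], output):
        return False
    if "guided_json" in constraints:
        schema = constraints["guided_json"]
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        if not isinstance(data, dict):
            return False
        # Every required key must be present.
        if any(key not in data for key in schema.get("required", [])):
            return False
        # Present keys must have the declared basic type.
        for key, spec in schema.get("properties", {}).items():
            expected = JSON_TYPES.get(spec.get("type"))
            if key in data and expected is not None and not isinstance(data[key], expected):
                return False
    return True


invoice_schema = {
    "type": "object",
    "properties": {"invoice_id": {"type": "string"}, "total": {"type": "number"}},
    "required": ["invoice_id", "total"],
}

print(satisfies('{"invoice_id": "A-17", "total": 99.5}', {"guided_json": invoice_schema}))  # True
print(satisfies("positive", {"guided_choice": ["positive", "negative"]}))  # True
print(satisfies("2024-05-01", {"guided_regex": r"\d{4}-\d{2}-\d{2}"}))  # True
```

With guided generation the model is constrained during decoding, so a check like this should always pass. It remains useful as a safety net for models where only some of the options are supported.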

Document AI: Summarize, Extract, Process

The document summarization API was generalized to full document processing using the map/reduce paradigm for LLMs. Combined with the new connectors, custom OCR models, document metadata, PII redaction, guided generation, multimodal vision models, and prompt templates, this makes powerful Document AI workflows possible.

Example use cases:

  • Custom summaries
  • Convert flow charts to custom JSON
  • Extract all financial information
  • Classify documents or images with custom labels
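The map/reduce paradigm behind these workflows can be sketched in a few lines. This illustrates the general pattern only, not the h2oGPTe implementation: `fake_llm` is a stand-in for a real model call, and the prompts and chunk size are arbitrary.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce(text: str, llm: Callable[[str], str], max_words: int = 50) -> str:
    """Map: process each chunk independently. Reduce: merge the partial results."""
    chunks = split_into_chunks(text, max_words)
    partials = [llm(f"Summarize:\n{chunk}") for chunk in chunks]  # map step
    if len(partials) == 1:
        return partials[0]  # nothing to merge
    return llm("Combine these summaries:\n" + "\n".join(partials))  # reduce step


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: returns the first five words after the instruction."""
    body = prompt.split("\n", 1)[1]
    return " ".join(body.split()[:5])


document = " ".join(f"word{i}" for i in range(120))
print(len(split_into_chunks(document, 50)))  # 3
print(map_reduce(document, fake_llm))
```

Swapping the prompts in the map and reduce steps is what turns the same skeleton into extraction or classification instead of summarization.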

Tagging Documents and Chatting with a subset of the Collection

You can now tag documents (via the Python client), and provide a list of tags to include when chatting with a collection.
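Conceptually, chatting with a tagged subset amounts to filtering the collection before retrieval. The sketch below shows that idea in plain Python. It does not use the h2oGPTe client, the document names and tags are invented, and it assumes a document is included when it carries at least one of the requested tags.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from document name to its tags.
DOC_TAGS: Dict[str, Set[str]] = {
    "q1_report.pdf": {"finance", "2024"},
    "handbook.pdf": {"hr"},
    "q2_report.pdf": {"finance", "2024"},
}


def select_documents(doc_tags: Dict[str, Set[str]], include_tags: Iterable[str]) -> List[str]:
    """Return the names of documents that carry at least one of the requested tags."""
    wanted = set(include_tags)
    return sorted(name for name, tags in doc_tags.items() if tags & wanted)


print(select_documents(DOC_TAGS, ["finance"]))  # ['q1_report.pdf', 'q2_report.pdf']
```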

Out of the box prompt templates

Multiple new prompt templates were added for convenience.

Improved Scalability and Speed

Several changes improve the responsiveness of the application.

Eval Studio integration

H2O Eval Studio is now integrated into h2oGPTe.

Sharing of Prompt Templates

Prompt templates can now be shared with selected users.

Improved Cloud integration

Minio backend for storage can be replaced with S3. GCS/Azure storage backend is upcoming.

Security Vulnerability Fixes

No critical or high CVEs at the time of release.

Live logs for admins

Real-time logs for core/crawl/chat services for administrator users.



What's new in h2oGPTe v1.4.13

· 8 min read

We are excited to announce the release of h2oGPTe 1.4.13! Read on to learn about the new features in the 1.4.x series which will improve your ability to find answers and generate new content based on your private data.

Create non-English embeddings

Your data isn't always in English. In fact, your documents, audio files, and images may span many languages, and now h2oGPTe can help you answer questions in any language.

v1.4.x brings support for a new embedding model, bge-m3. This embedding model is best in class for multi-lingual data and supports more than 100 languages.

We recommend using bge-large-en-v1.5 for English use cases; it is the default embedding model used in the environment.

Customize the Embedding Model per Collection

You may want to customize the embedding model used for each collection of documents or use case, and now you can when creating a new collection.

All documents added to this collection will be embedded using that model, and all queries to this collection will use that embedding model. Please note that you cannot change the embedding model of a collection after the fact; it can only be set while creating the collection.

Embedding Model options

The generative AI space is moving fast and there are new technologies every week. H2O.ai is regularly adding support for new embedding and language models. Today, you can enable the following embedding models in your environment:

  • bge-large-en-v1.5
  • bge-m3
  • instructor-large
  • bge-large-zh-v1.5
  • multilingual-e5-large
  • instructor-xl

Support for new LLMs

The v1.4 release series brings support for many new LLMs including H2O.ai's small language models H2O Danube. Working with Southeast Asia? You may want to use SeaLLM-7B-v2 or sea-lion-7b-instruct.

The full list has 18+ types of LLMs supported with the latest and greatest regularly being added.

Introducing the Prompt Catalog

Gone are the days of having a collection of really good System Prompts saved in a file on your desktop! The new Prompt Catalog comes with 18+ out of the box prompts for common tasks like Question Answering (in many languages) and Expert Summarizing and tones like Funny or Simple.

You can explore all of the available prompts and create your own on the new Prompts page.

When chatting, click the Settings button near the chat input to change the prompt template you're using.

Out of the Box Language Prompts

The following languages are supported for RAG out of the box when you select the matching prompt template (remember to also use the multilingual embedding model):

  • Chinese
  • Turkish
  • Russian
  • Portuguese
  • Malay
  • Japanese
  • Indonesian
  • Hindi
  • French
  • Persian
  • Spanish Castilian

Out of the Box English Use Cases

While users can create their own prompt templates, there are also 9 common use cases supported out of the box.

  • Retrieval Augmented Generation (this is the default)
  • Expert Summarizer: Generates structured summaries
  • Plain and Simple: Creates simple answers that are easily understood
  • Funny and Sarcastic: Creates a humorous personality that injects sarcasm into responses
  • Citations: Provide citations and references to end users
  • Keynote Presentation: Generates TED-quality keynote presentations
  • Tech Impact Analyzer: Generates impact analysis of technology projects
  • Value Determinator: Determines the value provided to the reader
  • Engineering Wizard: Designs and evaluates different proposals for solving engineering problems

When creating your own prompt template, you can choose to use any of these as the starting point if you would like to slightly alter them.

Example: Citation Prompt Template

As an example, we have uploaded the h2oGPTe documentation into the h2oGPTe product and asked "How can I add a new document into a collection?". With the default prompt template, which is optimized for Retrieval Augmented Generation, we get the following helpful answer:

To add a new document into a collection, follow these steps according to the information provided in the document context:

1. Navigate to the Enterprise h2oGPTe navigation menu.
2. Click on 'Collections'.
3. Select the name of the Collection you want to add a document to.
4. Click on '+ Add documents'.
5. Choose a method to import the document, such as Upload documents or Import from system.
6. Browse for the document you want to add.
7. Click 'Add'.

Please note that the exact wording of the navigation menu items and buttons might vary slightly depending on the specific version of Enterprise h2oGPTe you are using.

This helps our end users, but maybe we would also like to provide citations so they can go read the full documentation themselves. We will click Settings and select Citation from the dropdown list of Prompt Templates. Asking the same question, we get the same answer, except now we have a list of the exact documentation pages that the answer came from.

To add a new document into a collection, follow these steps according to the document sources provided:

1. Navigate to the Enterprise h2oGPTe interface.
2. Click on 'Collections' in the navigation menu.
3. Select the specific collection where you want to add a document.
4. Click on '+ Add documents'.
5. Choose a method to import the document, such as 'Upload documents' or 'Import from system'.
6. Browse and select the document you wish to add.
7. Click 'Add'.

Remember, you can only select an embedding model once when creating a new collection. Also, you can toggle the 'Create short document summaries' button to auto-generate a summary of your document, and the 'Create sample questions for documents' button to receive auto-suggested sample questions based on your document.

[1] Guide - Collections - Add a Document(s) to a Collection - Version: v1.4.11 - Enterprise h2oGPTe documentation
[2] Create a Collection - Version: v1.3.11 - Enterprise h2oGPTe documentation
[3] Add a Document(s) to a Collection - Version: v1.3.11 - Enterprise h2oGPTe documentation
[4] Add a Document(s) to a Collection - Version: v1.1.7 - Enterprise h2oGPTe documentation
[5] Add a Document(s) to a Collection - Version: v1.4.9 - Enterprise h2oGPTe documentation

We can see that we have access to multiple versions of the documentation. As a user, we might then ask the same question but clarify which version we are using.

Chat redesign

There are many changes to the feel and functionality of the Chat sessions in the 1.4 release:

  • Settings can now be found in the chat tool bar
    • Customize the LLM temperature to make answers more creative or more deterministic
    • Set the maximum length of responses
    • Set the number of neighbor chunks for RAG+ to add additional context from the source documents
  • New controls for each part of the conversation can be found to the right of the user's message
    • Copy the response
    • Provide feedback if the response was good or bad
    • View the entire prompt and context sent to the LLM
    • View usage and cost information about the LLM interaction
    • Delete this Q&A
  • Ask questions with audio using the Listen function of the chat toolbar
  • Easily start chatting with LLMs from the UI without using a collection of data using the New Chat button from the Chat Sessions page
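The neighbor-chunk setting mentioned above can be pictured as widening each retrieved chunk with the chunks around it. The sketch below assumes neighbors are taken symmetrically on both sides of the hit and that chunks are stored in document order. `with_neighbors` is a hypothetical helper, not part of the h2oGPTe API.

```python
from typing import List


def with_neighbors(chunks: List[str], hit_index: int, num_neighbors: int) -> List[str]:
    """Return the retrieved chunk plus up to num_neighbors chunks on each side."""
    start = max(0, hit_index - num_neighbors)
    end = min(len(chunks), hit_index + num_neighbors + 1)
    return chunks[start:end]


chunks = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(with_neighbors(chunks, hit_index=2, num_neighbors=1))  # ['c1', 'c2', 'c3']
```

A larger value gives the LLM more surrounding context for each hit, at the cost of a longer prompt.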

H2O AI Cloud integration

Users of the H2O AI Cloud can now authenticate to their h2oGPTe environment using the Platform Token, which improves the end-to-end predictive and generative workflow.

This is especially helpful when building custom UIs on top of h2oGPTe using Wave. The following code authenticates to h2oGPTe from a Wave app deployed in the App Store, so that every user of your app logs in to h2oGPTe as themselves.

import os

import h2o_authn
from h2ogpte import H2OGPTE

# `q` is the Wave query context passed to your app's handler.
token_provider = h2o_authn.TokenProvider(
    refresh_token=q.auth.refresh_token,
    token_endpoint_url=f"{os.getenv('H2O_WAVE_OIDC_PROVIDER_URL')}/protocol/openid-connect/token",
    client_id=os.getenv("H2O_WAVE_OIDC_CLIENT_ID"),
    client_secret=os.getenv("H2O_WAVE_OIDC_CLIENT_SECRET"),
)
# Each user is authenticated to h2oGPTe with their own platform token.
client = H2OGPTE(address=os.getenv("H2OGPTE_URL"), token_provider=token_provider)

Enhanced Jobs experience

When doing document analytics and chat, many of the steps can take some time, such as ingesting a large website or deleting old files. Long running tasks, or Jobs, can be found by clicking the server icon in the top right hand corner. This will open a queue of any running tasks including the ability to easily read error messages if anything went wrong.

General Improvements

  • Search and filter documents by name
  • View the retrieval and LLM response name for each query in the Chat Session Usage
  • Improved quality of generated example questions
  • Fewer steps needed to customize LLM parameters from the Python API
  • Chat sharing is now available for air-gapped installs
