Skip to main content

Return to docs

What's new in h2oGPTe v1.5

ยท 5 min read

We are excited to announce the release of h2oGPTe 1.5! Read on to learn about the new features in the 1.5.x series which will improve your ability to find answers and generate new content based on your private data.

Chat Firstโ€‹

The GUI has been revamped to lead with chat first. You can start chatting immediately, and add documents or collections to the chat.

New Connectorsโ€‹

New connectors in v1.5:

  • Amazon S3
  • Google Cloud Storage
  • Azure BLOB
  • Sharepoint
  • Upload Plain Text (automatically triggered if large text is copy & pasted into the chat)

Choice of OCR modelsโ€‹

The v1.5 release brings support for more languages by introducing a new set of OCR model choices (for conversion of documents to text), including the auto-detection of the language for each page of every document.

  • Automatic (default)
  • Tesseract (over 60 different languages)
  • DocTR
  • PaddleOCR
  • PaddleOCR-Chinese
  • PaddleOCR-Arabic
  • PaddleOCR-Japanese
  • PaddleOCR-Korean
  • PaddleOCR-Cyrillic
  • PaddleOCR-Devanagari
  • PaddleOCR-Telugu
  • PaddleOCR-Kannada
  • PaddleOCR-Tamil

Model Comparison Pageโ€‹

A new models page offers easy comparison between all LLMs

  • Tabular view of all metrics such as cost, accuracy, speed, latency, context lengths, vision capabilities, guided generation features and chat template
  • Graphical scatter plot to compare models across 2 dimensions, with optional log-scale
  • Usage and performance stats are now shown as a tab on the models page
  • A self-test button shows green or red lights for each LLM within secons to confirm that all LLMs are operational with "quick" and "rag" benchmark modes exposed to all users that test chat and RAG modes.
  • Admins have access to "full" and "stress" tests as well, to make sure LLMs are configured to handle large contexts properly.

Model Routing with Cost Controlsโ€‹

Automatically chooses the best LLM for the task given cost constraints such as:

  • Max cost per LLM call
  • Willingness to pay for extra accuracy (how much to pay for +10% accuracy for this LLM call?)
  • Willingness to wait for extra accuracy (how long to wait for +10% accuracy for this LLM call?)
  • Max cost per million tokens for LLMs to be considered
  • Fixed list of models to choose from

Any of these cost controls can be combined. The GUI exposes the first 3 cost constraints.

Guardrailsโ€‹

Fully customizable Guardrails:

  • Prompt Guard (fine-tuned DeBERTa v3 model), Jailbreak and Prompt Injection
  • Llama Guard (fine-tuned LLM), 14 classes of unsafe content
  • Custom Guardrails, arbitrary LLM and prompting

Guardrails are applied to:

  • All user prompts

If unsafe content is detected, the following action is performed:

  • fail

Redaction of PII or regular expressionsโ€‹

These PII detection methods are combined for maximum precision and recall:

  • regular expressions
  • Presidio model: 5 languages (en, es, fr, de, zh), 36 different PII entities
  • Custom PII model: 59 different PII entities

Personally identifiable information (PII) is checked for in these places:

  • Parsing of documents
  • LLM input
  • LLM output

If PII is detected in any of the above places, one of the following actions is performed:

  • allow
  • redact
  • fail

You have full control over the list of entities to flag, via JSON spec, controllable per collection.

Document Metadataโ€‹

You can now choose what information from the document is provided to the LLM.

  • Filename
  • Page Number
  • Document Text
  • Document Age
  • Last Modification Date
  • Retrieval Score
  • Ingestion Method
  • URI

Multimodal Vision Capabilitiesโ€‹

v1.5.x brings support for multimodal vision capabilities, including state-of-the-art open-source vision models. This allows processing of flowcharts, images, diagrams and more.

  • GPT-4o/GPT-4o-mini
  • Gemini-1.5-Pro/Flash
  • Claude-3/Claude-3.5
  • InternVL-Chat
  • InternVL2-26B/76B

Support for upcoming LLMs via Chat Templatesโ€‹

v1.5.x can support yet unreleased future LLMs using Hugging Face chat templates.

Guided Generationโ€‹

A powerful new feature in v1.5 is the guided generation. For example, the LLM can be instructed to create perfect JSON that adheres to a provided schema. Or it can be instructed to create output that matches a regular expression, or follows a certain grammar, or contains only output from a provided list of choices.

All these powerful options are exposed in the API:

  • guided_json
  • guided_regex
  • guided_choice
  • guided_grammar
  • guided_whitespace_pattern

Note that guided generation also works for vision models. For most (proprietary) models not hosted by vLLM (such as OpenAI, Claude, etc.), only guided_json is supported for now.

Document AI: Summarize, Extract, Processโ€‹

The document summarization API was generalized to full document processing using the map/reduce paradigm for LLMs. In combination with the new connectors, custom OCR models, document metadata, PII redaction, guided generation, multimodal vision models, prompt templates, powerful Document AI workflows are now possible.

Example use cases:

  • Custom summaries
  • Convert flow charts to custom JSON
  • Extract all financial information
  • Classify documents or images with custom labels

Tagging Documents and Chatting with a subset of the Collectionโ€‹

You can now tag documents (via the Python client), and provide a list of tags to include when chatting with a collection.

Out of the box prompt templatesโ€‹

Multiple new prompt templates were added for convenience.

Improved Scalability and Speedโ€‹

Several improvements to improve the responsiveness of the application have been made.

Eval Studio integrationโ€‹

H2O Eval Studio is now integrated into h2oGPTe.

Sharing of Prompt Templatesโ€‹

Prompt templates can now be shared with selected users.

Improved Cloud integrationโ€‹

Minio backend for storage can be replaced with S3. GCS/Azure storage backend is upcoming.

Security Vulnerability Fixesโ€‹

No critical or high CVEs at the time of release.

Live logs for adminsโ€‹

Real-time logs for core/crawl/chat services for administrator users.

What's new in h2oGPTe v1.4.13

ยท 8 min read

We are excited to announce the release of h2oGPTe 1.4.13! Read on to learn about the new features in the 1.4.x series which will improve your ability to find answers and generate new content based on your private data.

Create non-English embeddingsโ€‹

Your data isn't always in English. In fact, your documents, audio files, and images may span many languages, and now, h2oGPTe can help you answer questions on any language.

v1.4.x brings support for a new embedding model, bge-m3. This embedding model is best in class for multi-lingual data and supports more than 100 languages.

We recommend using bge-large-en-v1.5 for English use cases and this is the default embedding model used in the environment.

Customize the Embedding Model per Collectionโ€‹

You may want to customize the embedding model used for each collection of documents or use case, and now you can when creating a new collection.

All documents added to this collection will be embedded using that model, and all queries to this collection will use that embedding model. Please note that you cannot change the emedding model of a collection after the fact, it is only editable while creating the collection.

Embedding Model optionsโ€‹

The generative AI space is moving fast and there are new technologies every week. H2O.ai is regularly adding support for new embedding and language models. Today, you can enable the following embedding models in your environment:

  • bge-large-en-v1.5
  • bge-m3
  • instructor-large
  • bge-large-zh-v1.5
  • multilingual-e5-large
  • instructor-xl

Support for new LLMsโ€‹

The v1.4 release series brings support for many new LLMs including H2O.ai's small language models H2O Danube. Working with Southeast Asia? You may want to use SeaLLM-7B-v2 or sea-lion-7b-instruct.

The full list has 18+ types of LLMs supported with the latest and greatest regularly being added.

Introducing the Prompt Catalogโ€‹

Gone are the days of having a collection of really good System Prompts saved in a file on your desktop! The new Prompt Catalog comes with 18+ out of the box prompts for common tasks like Question Answering (in many languages) and Expert Summarizing and tones like Funny or Simple.

You can explore all of the available prompts and create your own on the new Prompts page.

When chatting, click the Settings button down near where you chat to chate the prompt template you're using.

Out of the Box Language Promptsโ€‹

The following languages are supported for RAG out of the box by selecting the right prompt template, remember to also use the multi-language embedding model:

  • Chinese
  • Turkish
  • Russian
  • Portuguese
  • Malay
  • Japanese
  • Indonesian
  • Hindi
  • French
  • Persian
  • Spanish Castilian

Out of the Box English Use Casesโ€‹

While users can create their own prompt templates, there are also 9 common use cases supported out of the box.

  • Retrieval Augmented Generation (this is the default)
  • Expert Summarizer: Generates structured summaries
  • Plain and Simple: Creates simple answers that are easily understood
  • Funny and Sarcastic: Creates a humorous personality that injects sarcasm into responses
  • Citations: Provide citations and references to end users
  • Keynote Presentation: Generates TED-quality keynote presentations
  • Tech Impact Analyzer: Generates impact analysis of technology projects
  • Value Determinator: Determines the value provided to the reader
  • Engineering Wizard: Designs and evaluates different proposals for solving engineering problems

When creating your own prompt template, you can choose to use any of these as the starting point if you would like to slightly alter them.

Example: Citation Prompt Templateโ€‹

As an example, we have uploaded the h2oGPTe documentation into the h2oGPTe product and asked "How can I add a new document into a collection?". With the default prompt template, which is optimized for Retrieval Augmented Generation, we get the following helpful answer:

To add a new document into a collection, follow these steps according to the information provided in the document context:

1. Navigate to the Enterprise h2oGPTe navigation menu.
2. Click on 'Collections'.
3. Select the name of the Collection you want to add a document to.
4. Click on '+ Add documents'.
5. Choose a method to import the document, such as Upload documents or Import from system.
6. Browse for the document you want to add.
7. Click 'Add'.

Please note that the exact wording of the navigation menu items and buttons might vary slightly depending on the specific version of Enterprise h2oGPTe you are using.

This helps our end users, but maybe would would like to also provide citation to users so they can go read the full documentation themselves. We will click Settings and select Citation from the dropdown list of Prompt Templates. Asking the same question, we get the same answer, except now we have a list of the exact documentation pages that the answer came from.

To add a new document into a collection, follow these steps according to the document sources provided:

1. Navigate to the Enterprise h2oGPTe interface.
2. Click on 'Collections' in the navigation menu.
3. Select the specific collection where you want to add a document.
4. Click on '+ Add documents'.
5. Choose a method to import the document, such as 'Upload documents' or 'Import from system'.
6. Browse and select the document you wish to add.
7. Click 'Add'.

Remember, you can only select an embedding model once when creating a new collection. Also, you can toggle the 'Create short document summaries' button to auto-generate a summary of your document, and the 'Create sample questions for documents' button to receive auto-suggested sample questions based on your document.

[1] Guide - Collections - Add a Document(s) to a Collection - Version: v1.4.11 - Enterprise h2oGPTe documentation
[2] Create a Collection - Version: v1.3.11 - Enterprise h2oGPTe documentation
[3] Add a Document(s) to a Collection - Version: v1.3.11 - Enterprise h2oGPTe documentation
[4] Add a Document(s) to a Collection - Version: v1.1.7 - Enterprise h2oGPTe documentation
[5] Add a Document(s) to a Collection - Version: v1.4.9 - Enterprise h2oGPTe documentation

We can see that we have access to multiple version of the documentation. As a user, we might then ask the same question but clarify which versions we are using.

Chat redesignโ€‹

There are many changes to the feel and functionality of the Chat sessions in the 1.4 release:

  • Settings can now be found in the chat tool bar
    • Customize the LLM tempurature to make more creative or deterministic answers
    • Set the maximum lenght of responses
    • Set the number of neighbor chunks for RAG+ to add additional context from the source documents
  • New controls for each part of the conversation can be found to the right of the user's message
    • Copy the response
    • Provide feedback if the response was good or bad
    • View the entire prompt and context sent to the LLM
    • View usage and cost information about the LLM interaction
    • Delete this Q&A
  • Ask questions with audio using the Listen function of the chat toolbar
  • Easily start chatting with LLMs from the UI without using a collection of data using the New Chat button from the Chat Sessions page

H2O AI Cloud integrationโ€‹

Users of the H2O AI Cloud can now authenticate to their h2oGPTe environment using the Platform Token improving the end-to-end Predictive and Generative workflow.

This is especially helpful when building custom UIs on top of h2oGPTe using Wave. The below code can be used to authenticate to h2oGPTe in your Wave app deployed in the App Store making it so that all users who use your app are logging in to h2oGPTe as themselves.

from h2ogpte import H2OGPTE
import h2o_authn

token_provider = h2o_authn.TokenProvider(
refresh_token=q.auth.refresh_token,
token_endpoint_url=f"{os.getenv('H2O_WAVE_OIDC_PROVIDER_URL')}/protocol/openid-connect/token",
client_id=os.getenv("H2O_WAVE_OIDC_CLIENT_ID"),
client_secret=os.getenv("H2O_WAVE_OIDC_CLIENT_SECRET"),
)
client = H2OGPTE(address=os.getenv("H2OGPTE_URL"), token_provider=token_provider)

Enhanced Jobs experienceโ€‹

When doing document analytics and chat, many of the steps can take some time, such as ingesting a large website or deleting old files. Long running tasks, or Jobs, can be found by clicking the server icon in the top right hand corner. This will open a queue of any running tasks including the ability to easily read error messages if anything went wrong.

General Improvementsโ€‹

  • Search and filter documents by name
  • View the retrieval and LLM response name for each query in the Chat Session Usage
  • Improved quality of generated example questions
  • Less steps needed to customize LLM parameters from the Python API
  • Chat sharing is now available for air-gapped installs