Version: v1.6.37-dev1 🚧

Models

Overview

The Models page allows you to explore the supported large language models (LLMs) and perform self-tests on the LLMs used throughout Enterprise h2oGPTe.

Supported LLMs

Enterprise h2oGPTe supports LLMs from the following providers, among many others:

- Meta
- OpenAI
- Google
- Anthropic
- Mistral AI
- Qwen (Alibaba Cloud)
- DeepSeek AI
- Other Models

Our latest RAG benchmark results list all tested models: RAG benchmark results.

Note

The table in the LLMs tab lists the supported LLMs.

Run self-tests

Overview

A self-test lets you evaluate the LLMs that Enterprise h2oGPTe uses to generate responses to queries and to process and summarize Documents.

Note

To ensure system stability and prevent resource conflicts, only one self-test can be run for a specific LLM at any given time across the entire system. If you attempt to start a test for an LLM that is already being tested (even by another user), the new test will not be submitted until the current one is complete.
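The one-test-per-LLM rule described in the note can be pictured as a registry that refuses a second submission while a test for the same model is still running. The following is a minimal sketch for illustration only, not the Enterprise h2oGPTe implementation; `SelfTestRegistry`, `try_start`, `finish`, and the model name `example-llm` are hypothetical.

```python
import threading


class SelfTestRegistry:
    """Hypothetical registry that allows one running self-test per LLM."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: set[str] = set()

    def try_start(self, llm: str) -> bool:
        """Return True if the test may start, False if one is already running."""
        with self._lock:
            if llm in self._running:
                return False
            self._running.add(llm)
            return True

    def finish(self, llm: str) -> None:
        """Mark the self-test for this LLM as complete."""
        with self._lock:
            self._running.discard(llm)


registry = SelfTestRegistry()
first = registry.try_start("example-llm")   # accepted
second = registry.try_start("example-llm")  # rejected, even if sent by another user
registry.finish("example-llm")
third = registry.try_start("example-llm")   # accepted again once the first test is done
```

Because the registry is keyed by model name, self-tests for different LLMs do not block each other in this sketch.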

Instructions

To run a self-test, follow these steps:

1. In the Enterprise h2oGPTe navigation menu, click Models.
2. Click Run self-tests.
3. In the Run self-tests list, select one of the following self-tests:
   - Quick test (chat-like short query): This self-test evaluates the model's ability to respond to brief, conversational queries accurately and contextually.
   - RAG test (large context query): This self-test evaluates the model's ability to understand and respond accurately to queries that require it to process and maintain a long context or multiple pieces of information over an extended conversation.
   - Full test (full context query): This self-test evaluates the model's ability to understand, retain, and use the information provided across a conversation to respond to complex, multi-part queries. It requires the model to keep track of various details and offer responses that synthesize all the given information. This option is only available to an admin.
   - Stress test (repeated full test): This self-test evaluates the model's consistency, accuracy, and contextual awareness over multiple interactions on the same topic. It ensures the model can maintain context and provide accurate responses across several sessions. This option is only available to an admin.
4. (Conditional step) If you selected Full test or Stress test, start the run as follows:
   - Full test: In the Run full tests box, click Run tests.
   - Stress test: In the Run stress tests box, click Run tests.
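The four self-test options and their admin restriction can be summarized as a small lookup table. This is a sketch only: the keys `quick`, `rag`, `full`, and `stress` and the helper `available_tests` are hypothetical names chosen for illustration, while the labels and admin-only flags follow the option descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfTest:
    label: str
    admin_only: bool


# Keys are hypothetical identifiers; labels and admin flags mirror the documented options.
SELF_TESTS = {
    "quick": SelfTest("Quick test (chat-like short query)", admin_only=False),
    "rag": SelfTest("RAG test (large context query)", admin_only=False),
    "full": SelfTest("Full test (full context query)", admin_only=True),
    "stress": SelfTest("Stress test (repeated full test)", admin_only=True),
}


def available_tests(is_admin: bool) -> list[str]:
    """Return the self-test keys that a user may select."""
    return [key for key, test in SELF_TESTS.items() if is_admin or not test.admin_only]


user_tests = available_tests(is_admin=False)
admin_tests = available_tests(is_admin=True)
```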

Tabs

LLMs

Overview

The LLMs tab contains a table of the large language models (LLMs) supported for generating responses to user queries and for summarizing and processing Documents.

Instructions

To access the LLMs tab, follow this step:

1. In the Enterprise h2oGPTe navigation menu, click Models.

Charts

Overview

The chart in the Charts tab lets you compare models across selected metrics by assigning one metric to the x-axis and another to the y-axis. These models generate responses to user queries and summarize and process Documents.

Example

For example, in the image below, the chart compares the performance and usage characteristics of various large language models (LLMs). The x-axis shows the estimated typical cost per query (in USD), while the y-axis displays the RAG benchmark accuracy percentage.

The chart includes several data points representing different LLMs, such as GPT-4, Meta-Llama, and Mistral. Each data point is labeled with the model name and some additional information, such as the version number or specific capabilities.

In this example, the top right of the chart indicates models with higher accuracy and higher cost per query, while the bottom left shows models with lower accuracy and lower cost per query.

The chart can be useful for researchers, developers, or users to compare the performance and cost-effectiveness of different large language models. This can help them select the most appropriate model for their needs or applications.
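One way to apply the cost-versus-accuracy comparison is to pick the most accurate model that fits a cost budget. The sketch below assumes each chart point carries a cost per query and an accuracy percentage. The model names and numbers are invented placeholders, not benchmark results, and `ModelPoint` and `best_within_budget` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelPoint(NamedTuple):
    name: str
    cost_usd: float      # x-axis: estimated typical cost per query
    accuracy_pct: float  # y-axis: RAG benchmark accuracy


# Invented placeholder values for illustration only.
POINTS = [
    ModelPoint("model-a", 0.0005, 70.0),
    ModelPoint("model-b", 0.0100, 85.0),
    ModelPoint("model-c", 0.0500, 90.0),
]


def best_within_budget(points, max_cost_usd: float) -> Optional[ModelPoint]:
    """Return the most accurate point whose cost fits the budget, or None."""
    affordable = [p for p in points if p.cost_usd <= max_cost_usd]
    return max(affordable, key=lambda p: p.accuracy_pct, default=None)


choice = best_within_budget(POINTS, max_cost_usd=0.02)
```

With these placeholder values, `choice` is `model-b`: `model-c` is more accurate but exceeds the budget.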

Figure: Charts tab

Log-scaling the x/y axis

Log-scaling the x or y axis of the chart applies a logarithmic transformation to that axis's values. This is particularly useful when the data spans several orders of magnitude, because it compresses the scale of large values and expands the scale of small values, which makes the data easier to read.
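The effect of log-scaling can be seen on a handful of values that span several orders of magnitude. This sketch assumes a base-10 logarithm (the base used by the chart is not stated here), and the cost values are hypothetical.

```python
import math

# Hypothetical per-query costs in USD spanning four orders of magnitude.
costs = [0.0001, 0.001, 0.01, 0.1, 1.0]

# Position of each value on a base-10 log-scaled axis.
log_costs = [math.log10(c) for c in costs]

# Distance between neighbouring points on each kind of axis.
linear_gaps = [b - a for a, b in zip(costs, costs[1:])]
log_gaps = [b - a for a, b in zip(log_costs, log_costs[1:])]
```

On the linear axis the gaps grow from about 0.0009 to 0.9, so the small values collapse together near the origin. On the log axis every gap is approximately 1, so each tenfold step takes up the same space.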

Instructions

To access the Charts tab, follow these steps:

1. In the Enterprise h2oGPTe navigation menu, click Models.
2. Click the Charts tab.

Usage and performance

Overview

The Usage and performance tab provides an overview of your approximate large language model (LLM) usage in the last 24 hours. The tab includes details on quotas, cost estimates, and a usage breakdown by LLM.

Note

The daily quotas and cost estimates are configured by administrators.

Instructions

To access the Usage and performance tab, follow these steps:

1. In the Enterprise h2oGPTe navigation menu, click Models.
2. Click the Usage and performance tab.

The tab displays the following information:

- Daily quota: The daily allocation for LLM usage.
- Percentage used: The percentage of the daily quota that has been used.
- Usage: The cost incurred for LLM usage within the last 24 hours.
- [Cost], [Speed], or [Latency] by LLM: The cost, speed, or latency of each LLM over the selected time frame.
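The relationship between Daily quota, Usage, and Percentage used can be sketched as a simple ratio. This assumes the quota and the usage are expressed in the same unit (USD here), which this page does not state explicitly; `percentage_used` is a hypothetical helper and the numbers are made up.

```python
def percentage_used(usage_usd: float, daily_quota_usd: float) -> float:
    """Return usage as a percentage of the daily quota."""
    if daily_quota_usd <= 0:
        raise ValueError("daily quota must be positive")
    return 100.0 * usage_usd / daily_quota_usd


# Made-up example: 2.50 USD used out of a 10.00 USD daily quota.
pct = percentage_used(usage_usd=2.5, daily_quota_usd=10.0)
```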

Note

The [Cost], [Speed], or [Latency] by LLM section defaults to displaying usage within the last 24 hours. However, you can choose different time frames to see usage information within the Last hour, Last 24 hours, Last week, Last month, or All time.
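The time-frame selector can be thought of as a window applied to timestamped usage records. The sketch below is illustrative only: the record layout, the `cost_in_frame` helper, and the choice of 30 days for "Last month" are assumptions, and the sample costs are invented.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Time frames named in the note; None means no lower bound.
TIME_FRAMES: dict[str, Optional[timedelta]] = {
    "Last hour": timedelta(hours=1),
    "Last 24 hours": timedelta(hours=24),
    "Last week": timedelta(weeks=1),
    "Last month": timedelta(days=30),  # assumption: a month is treated as 30 days
    "All time": None,
}


def cost_in_frame(records, frame: str, now: datetime) -> float:
    """Sum the cost of (timestamp, cost_usd) records that fall inside the time frame."""
    window = TIME_FRAMES[frame]
    return sum(cost for ts, cost in records if window is None or now - ts <= window)


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
records = [  # invented sample usage
    (now - timedelta(minutes=30), 0.25),
    (now - timedelta(hours=5), 0.5),
    (now - timedelta(days=10), 1.0),
]
last_day = cost_in_frame(records, "Last 24 hours", now)
```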

