Version: v1.6.18

Run an Evaluator on a Chat

Overview

Enterprise h2oGPTe offers several Evaluators to assess the quality and safety of large language model (LLM) responses during Chat sessions, leveraging the same robust Evaluators in H2O Eval Studio. To learn about the available Evaluators for a Chat response, see Chat Evaluators.

Instructions

To run an Evaluator on a Chat response (LLM response), follow these steps. A scripted equivalent is sketched after the list.

  1. In the Enterprise h2oGPTe navigation menu, click Chats.
  2. In the Recent Chats table, click the Preview of the Chat you want to evaluate.
  3. Locate the Chat (LLM) response you want to evaluate, then click Evaluate.
  4. In the Evaluator list, select an Evaluator.
    Note: To learn about each available Evaluator for a Chat response, see Chat Evaluators.

  5. Click Evaluate.
    Note: Enterprise h2oGPTe displays the generated evaluation in the Eval tab, but the result is not saved; if you navigate away from the Chat, the evaluation is lost.
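The steps above describe the web UI flow. If you prefer to script the same check, the sketch below outlines one possible approach with the h2ogpte Python client: the chat calls (H2OGPTE, create_chat_session, connect, query) follow the client's documented pattern, while the evaluator call, its arguments, and the "toxicity" name are hypothetical placeholders rather than confirmed client APIs, so adapt them to the client version you have.

```python
# Minimal sketch: chat with Enterprise h2oGPTe and then evaluate the reply.
# The chat calls follow the documented h2ogpte client pattern; the evaluator
# step is a HYPOTHETICAL placeholder, not a confirmed client method.
from h2ogpte import H2OGPTE

client = H2OGPTE(
    address="https://h2ogpte.example.com",  # placeholder: your Enterprise h2oGPTe URL
    api_key="sk-XXXXXXXX",                  # placeholder: your API key
)

collection_id = "my-collection-id"          # placeholder: the Collection to chat against
chat_session_id = client.create_chat_session(collection_id)

with client.connect(chat_session_id) as session:
    reply = session.query("Summarize the uploaded refund policy.", timeout=60)
    print(reply.content)

# Hypothetical evaluator step: run_evaluator() and its arguments are
# illustrative stand-ins for whatever your h2ogpte client version exposes
# for running a Chat Evaluator on a response.
result = client.run_evaluator(              # hypothetical method name
    chat_message_id=reply.id,               # the Chat (LLM) response to evaluate
    evaluator="toxicity",                   # one of the Chat Evaluators listed below
)
print(result)
```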

Chat Evaluators

This section lists all available Evaluators for a Chat (LLM) response.

Toxicity

This Evaluator determines whether the LLM's response contains harmful, offensive, or abusive language that could negatively impact users or violate platform guidelines. To learn more about this Evaluator, see Toxicity Evaluator.

Hallucination

This Evaluator identifies whether the LLM's response includes fabricated or inaccurate information that doesn't align with the provided context or factual data. To learn more about this Evaluator, see Hallucination Evaluator.

Personally Identifiable Information (PII) leakage

This Evaluator checks if the LLM's response inadvertently reveals sensitive personal data, such as names, addresses, phone numbers, or other details that could be used to identify an individual. To learn more about this Evaluator, see PII Leakage Evaluator.

Sensitive data leakage

This Evaluator detects if the LLM discloses confidential or protected information, such as proprietary business data, medical records, or classified content, which could result in security or privacy breaches. To learn more about this Evaluator, see Sensitive Data Leakage Evaluator.

Fairness bias

This Evaluator assesses whether the LLM's responses exhibit bias or unfair treatment based on gender, race, ethnicity, or other demographic factors, ensuring that the model's output is impartial and equitable. To learn more about this Evaluator, see Fairness Bias Evaluator.

