
Run an Evaluator on a Chat

Overview

Enterprise h2oGPTe offers several Evaluators to assess the quality and safety of large language model (LLM) responses during Chat sessions, leveraging the same robust Evaluators in H2O Eval Studio. To learn about the available Evaluators for a Chat response, see Chat Evaluators.

Instructions

To run an Evaluator on a Chat response (LLM response), follow these steps (a scripted sketch for producing the Chat response appears after the list):

  1. In the Enterprise h2oGPTe navigation menu, click Chats.
  2. In the Recent Chats table, click the Preview of the Chat you want to evaluate.
  3. Locate the Chat (LLM) response you want to evaluate, then click the Evaluate icon.
  4. In the Evaluator list, select an Evaluator.
    note

    To learn about each available Evaluator for a Chat response, see Chat Evaluators.

  5. Click Evaluate.
    note

    Enterprise h2oGPTe displays the generated evaluation in the Eval tab, but it is not saved; if you navigate away, the result is lost.
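If the Chat response you want to evaluate is produced programmatically, the sketch below shows one way to create it with the h2ogpte Python client before switching to the UI for the evaluation step. The server address, API key, collection name, and prompt are placeholders, and exact method signatures can vary between client versions; treat this as a minimal sketch under those assumptions, not a definitive recipe.

```python
from h2ogpte import H2OGPTE

# Placeholder address and API key for an Enterprise h2oGPTe instance.
client = H2OGPTE(
    address="https://h2ogpte.example.com",
    api_key="sk-XXXXXXXXXXXXXXXX",
)

# Create a collection to ground the chat (name and description are illustrative).
collection_id = client.create_collection(
    name="Evaluator demo",
    description="Documents backing the Chat responses to be evaluated",
)

# Start a chat session on the collection and send the prompt whose
# response you plan to evaluate.
chat_session_id = client.create_chat_session(collection_id)
with client.connect(chat_session_id) as session:
    reply = session.query("Summarize the key risks in the uploaded report.", timeout=60)
    print(reply.content)

# The evaluation itself is run from the UI: open this chat under Chats,
# click the Evaluate icon on the response, pick an Evaluator, and click Evaluate.
```

After the script runs, the new Chat session appears in the Recent Chats table, and you can continue from step 2 of the instructions above.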

Chat Evaluators

This section lists all available Evaluators for a Chat (LLM) response.

Toxicity

This Evaluator determines whether the LLM's response contains harmful, offensive, or abusive language that could negatively impact users or violate platform guidelines. To learn more about this Evaluator, see Toxicity Evaluator.

Hallucination

This Evaluator identifies whether the LLM's response includes fabricated or inaccurate information that does not align with the provided context or factual data. To learn more about this Evaluator, see Hallucination Evaluator.

Personally Identifiable Information (PII) leakage

This Evaluator checks if the LLM's response inadvertently reveals sensitive personal data, such as names, addresses, phone numbers, or other details that could be used to identify an individual. To learn more about this Evaluator, see PII Leakage Evaluator.

Sensitive data leakage

This Evaluator detects if the LLM discloses confidential or protected information, such as proprietary business data, medical records, or classified content, which could result in security or privacy breaches. To learn more about this Evaluator, see Sensitive Data Leakage Evaluator.

Fairness bias

This Evaluator assesses whether the LLM's responses exhibit bias or unfair treatment based on gender, race, ethnicity, or other demographic factors, helping ensure that the model's output is impartial and equitable. To learn more about this Evaluator, see Fairness Bias Evaluator.

