Version: v1.6.14-dev3 🚧

Run an Evaluator on a Collection

Overview

Enterprise h2oGPTe offers several Evaluators to assess a Collection's performance, reliability, security, fairness, and effectiveness. These Evaluators are based on the Evaluators in H2O Eval Studio. For a description of each one, see Collection Evaluators.

Instructions

To run an Evaluator on a Collection, follow these steps:

  1. In the Enterprise h2oGPTe navigation menu, click Collections.
  2. From one of the following tabs, locate and select the Collection you want to evaluate.
    • All collections
    • My collections
    • Shared
  3. Click Evaluations.
  4. Click Run your first evaluation or New evaluation, whichever is displayed.
  5. In the Evaluator list, select an Evaluator.
    Note: To learn about each available Evaluator for a Collection, see Collection Evaluators.

  6. Click Evaluate.

Collection Evaluators

This section lists all available Evaluators for a Collection.

Toxicity

At a high level, this Evaluator helps you determine if the Collection's LLM responses contain harmful, offensive, or abusive language that could negatively impact users or violate platform guidelines. To learn more about this Evaluator, see Toxicity Evaluator.

Hallucination

This Evaluator identifies whether the Collection's LLM responses include fabricated or inaccurate information that doesn't align with the provided context or factual data. To learn more about this Evaluator, see Hallucination Evaluator.

Personally Identifiable Information (PII) leakage

This Evaluator checks if the Collection's LLM responses inadvertently reveal sensitive personal data, such as names, addresses, phone numbers, or other details that could be used to identify an individual. To learn more about this Evaluator, see PII Leakage Evaluator.
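
To make the idea of PII leakage concrete, the following is a minimal toy sketch in Python that scans response text for two kinds of personal data. It is an illustration only and does not reflect how the PII Leakage Evaluator is implemented; the `PII_PATTERNS` table, the `find_pii` helper, and the sample responses are all hypothetical.

```python
import re

# Hypothetical patterns for illustration only; a real evaluator
# would cover many more PII categories and edge cases.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(response: str) -> dict[str, list[str]]:
    """Return the matches for each pattern found in the response text."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(response)
        if matches:
            hits[label] = matches
    return hits


# Made-up sample responses: the first is clean, the second leaks contact details.
responses = [
    "The quarterly report is attached.",
    "You can reach Jane at jane.doe@example.com or 555-123-4567.",
]

for text in responses:
    print(find_pii(text))
```

A response with no matches yields an empty dictionary, while a response containing an email address or phone number yields the matched strings grouped by category.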

Sensitive data leakage

This Evaluator detects if the Collection's LLM discloses confidential or protected information, such as proprietary business data, medical records, or classified content, which could result in security or privacy breaches. To learn more about this Evaluator, see Sensitive Data Leakage Evaluator.

Fairness bias

This Evaluator assesses whether the Collection's LLM responses exhibit bias or unfair treatment based on gender, race, ethnicity, or other demographic factors, so you can verify that the model's output is impartial and equitable. To learn more about this Evaluator, see Fairness Bias Evaluator.

