Version: v1.6.14-dev3 🚧

Tutorial 7: Guardrails and personally identifiable information (PII) in Enterprise h2oGPTe

Overview

This tutorial explores the guardrail mechanisms available in Enterprise h2oGPTe to safeguard against the generation of harmful content and to protect Personally Identifiable Information (PII). Large Language Models (LLMs) can generate dangerous content or expose sensitive data, especially when given inappropriate input. To mitigate these risks, Enterprise h2oGPTe offers out-of-the-box support for guardrail models, such as Prompt Guard and Guardrail LLMs (either custom-built models such as Llama Guard 3 or general-purpose LLMs), and for PII tools (Presidio, a DeBERTa-based classifier, and regex patterns) that handle PII detection, redaction, and input filtering.

In this tutorial, you will learn how to access, enable, and customize these guardrails within a Collection, ensuring better security, ethical content generation, and protection of sensitive information.

Objectives

  1. Understand and configure guardrails in Enterprise h2oGPTe: Learn how to enable, customize, and use guardrail mechanisms (such as Prompt Guard and Guardrail LLMs like Llama Guard 3 or general-purpose LLMs) to prevent harmful content generation.

  2. Manage and protect PII: Explore the tools available in Enterprise h2oGPTe (Presidio, a DeBERTa-based classifier, and regex patterns) to detect, redact, and filter sensitive PII during document ingestion, input to the LLM, and output from the LLM.

Prerequisites

Step 1: Create a Collection with guardrails enabled

When you create a Collection, Enterprise h2oGPTe offers out-of-the-box guard models, such as Prompt Guard and Guardrail LLMs (using Llama Guard 3 or general-purpose LLMs), as well as tools like Presidio, a DeBERTa-based classifier, and regex patterns for PII detection, redaction, and input filtering.

Let's create a Collection that will use these guardrails and tools to prevent the generation of harmful content or the distribution/exposure of PII.

  1. In the Enterprise h2oGPTe navigation menu, click Collections.
  2. Click + New collection.
  3. In the Collection name box, enter the following:
    Tutorial 7

The Guardrails and PII Detection setting, available when you create a new Collection, gives you access to these guardrails and tools. Let's explore the setting's options.

  1. Click the Guardrails and PII Detection list.

  Options
  • This option turns off guardrails for prompts. Specifically, Prompt Guard and the Guardrail LLMs, which otherwise act as an additional safety layer against harmful content generation, are disabled. The Collection then relies solely on the built-in general guardrail within Enterprise h2oGPTe for the LLM used in the Collection.

    • Prompt Guard

      • In the Prompt guard setting, you can turn on or off jailbreak detection, a feature of the Prompt Guard model. A jailbreak is a prompt designed to bypass the safety rules of a large language model (LLM). Jailbreak detection is only available for API calls, where it helps prevent harmful or unethical content from being generated. In the product's user interface (UI), jailbreak detection is not enabled. Instead, the Prompt injections protection is active, which blocks prompts that try to trick the model into doing unintended things.

      Prompt guard

      note

      The Prompt guard setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.

    • Guardrail LLMs

      • In the Guardrails setting, you can manage the prompt hazard categories that control what content is flagged as inappropriate. Every user prompt is first checked by the Guardrail LLM, which classifies it as safe or unsafe. By default, the Guardrail LLM is the same LLM that answers the final query, but it can be configured to be a custom LLM like Llama Guard 3 or another specific LLM. An unsafe prompt leads to an error message, and no response is generated. When a prompt is flagged as unsafe, the Guardrail LLM also identifies the violated content categories.

      Guardrails

      note

      The hazard categories can be fully customized from the (Python) API, and the default values are sourced from the Llama Guard 3 model.

      caution

      Contact your administrator to enable the Llama Guard 3 model or to use a specific model as the Guardrail LLM. By default, the Guardrail LLM is the same as the LLM used to answer the query.

      note

      The Guardrails setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.

  • This option also turns off PII detection and redaction. As a result, PII is allowed through during document ingestion, in input to the LLM, and in output from the LLM. The following PII tools are disabled:

    • Regex (regular expressions)

      • In the Disallowed Regex patterns setting, you can specify regular expression patterns that must not appear in user inputs. Inputs that match any of these patterns are filtered out and blocked, so unwanted, inappropriate, or dangerous content does not get processed.

      Disallowed Regex patterns

      note

      The Disallowed Regex patterns setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.

    • Presidio model

      • In the Presidio labels setting, you can manage different labels for PII to control what Enterprise h2oGPTe automatically redacts during document ingestion, inputs to the LLM, or outputs from the LLM. The available labels are based on Microsoft Presidio, a privacy and data protection tool that identifies and protects sensitive information in text data. Each label classifies one type of sensitive data, enabling the system to apply the appropriate redactions automatically.

      Presidio labels

      note

      The Presidio labels setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.

    • DeBERTa-based classifier model fine-tuned for PII detection

      • In the PII Labels setting, you can manage different labels for PII to control what Enterprise h2oGPTe automatically redacts during document ingestion, inputs to the LLM, or outputs from the LLM. The available labels are based on a DeBERTa-based classifier model that is fine-tuned for PII detection.

      PII labels

      note

      The PII Labels setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
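The difference between disallowed regex patterns (which block an input outright) and PII labels (which select what gets redacted) can be illustrated with a minimal sketch. This is not the Enterprise h2oGPTe implementation: the pattern lists, the function names `is_input_allowed` and `redact`, and the `<LABEL>` placeholder format are assumptions made for illustration, and simple regular expressions stand in for the Presidio and DeBERTa-based detectors. The label names follow Presidio's entity naming.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical disallowed patterns: an input matching any of them is rejected.
DISALLOWED_PATTERNS: List[Pattern[str]] = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN-like number
    re.compile(r"secret_project", re.IGNORECASE),  # example project codename
]

# Hypothetical label -> pattern table. A real detector (Presidio or a
# fine-tuned classifier) would replace these regex stand-ins.
PII_PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def is_input_allowed(text: str) -> bool:
    """Return False if the text matches any disallowed pattern."""
    return not any(p.search(text) for p in DISALLOWED_PATTERNS)


def redact(text: str, enabled_labels: List[str]) -> str:
    """Replace every match of each enabled label with a <LABEL> placeholder."""
    for label in enabled_labels:
        text = PII_PATTERNS[label].sub(f"<{label}>", text)
    return text


if __name__ == "__main__":
    print(is_input_allowed("My SSN is 123-45-6789"))
    print(redact("Mail jane.doe@example.com or call 555-123-4567",
                 ["EMAIL_ADDRESS", "PHONE_NUMBER"]))
```

Only labels passed in `enabled_labels` are redacted, which mirrors the idea of selecting labels in the Presidio labels and PII Labels settings; unselected types pass through unchanged.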

Let's finish creating the Collection and observe how guardrails block the following prompt: How can I make a bomb?

  1. In the Guardrails and PII Detection list, select Customize guardrails and PII settings.

    note

    Under the Guardrails setting, the following category has been autoselected to flag unsafe prompts: Indiscriminate Weapons.

    Weapons label

  2. Click + Create.

Step 2: Chat with a Collection with guardrails enabled

Let's test the enabled guardrails.

  1. Click Start your first chat.
  2. Enter the following in the Ask anything box:
    How can I make a bomb?  

Violation
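The check behind this violation can be sketched as follows. Llama Guard 3 replies with `safe`, or with `unsafe` followed by the codes of the violated hazard categories (for example `S9` for Indiscriminate Weapons). The sketch below parses such a reply and builds an error message when a flagged category is hit. It is an illustration under assumptions, not the Enterprise h2oGPTe implementation: the function names, the error message text, and the rule that unflagged categories are let through are hypothetical, and the model reply is a hard-coded string rather than a real model call.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hazard category codes and names from the Llama Guard 3 taxonomy.
LLAMA_GUARD_3_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


@dataclass
class Verdict:
    safe: bool
    categories: List[str] = field(default_factory=list)


def parse_guardrail_output(raw: str) -> Verdict:
    """Parse a reply of the form 'safe' or 'unsafe' followed by category codes."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "safe":
        return Verdict(safe=True)
    codes: List[str] = []
    for line in lines[1:]:
        codes.extend(code.strip() for code in line.split(",") if code.strip())
    # Unknown codes are kept as-is so that nothing is silently dropped.
    return Verdict(False, [LLAMA_GUARD_3_CATEGORIES.get(c, c) for c in codes])


def check_prompt(raw_verdict: str, flagged: Set[str]) -> Optional[str]:
    """Return an error message if the prompt must be blocked, else None."""
    verdict = parse_guardrail_output(raw_verdict)
    if verdict.safe:
        return None
    hits = [name for name in verdict.categories if name in flagged]
    if verdict.categories and not hits:
        # Assumption: categories that are not flagged do not block the prompt.
        return None
    return "Prompt blocked. Violated categories: " + (", ".join(hits) or "unspecified")


if __name__ == "__main__":
    # Hard-coded reply standing in for the Guardrail LLM's verdict on the prompt.
    print(check_prompt("unsafe\nS9", {"Indiscriminate Weapons"}))
```

Because the check runs before the query reaches the answering LLM, a blocked prompt produces only the error message and no generated response.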

Summary

This tutorial provided an overview of the guardrails and PII protection mechanisms in Enterprise h2oGPTe, focusing on configuring and using them within a Collection to prevent harmful content generation and PII exposure. It covered creating a Collection with various guardrail options, including turning on or off models like Prompt Guard and Guardrail LLMs (either custom-built models such as Llama Guard 3 or general-purpose LLMs) to filter inappropriate content, as well as using tools like Presidio, a DeBERTa-based classifier, and regex patterns to detect and redact PII.

We then started a chat to observe how the guardrails respond to an unsafe input, demonstrating how Enterprise h2oGPTe's guardrail mechanisms enhance security and privacy protection.

