Tutorial 7: Guardrails and personally identifiable information (PII) in Enterprise h2oGPTe
Overview
This tutorial explores the guardrail mechanisms available in Enterprise h2oGPTe to safeguard against the generation of harmful content and to protect Personally Identifiable Information (PII). Large Language Models (LLMs) can generate dangerous content or expose sensitive data, especially when given inappropriate input. To mitigate these risks, Enterprise h2oGPTe offers out-of-the-box support for guardrail models, such as Prompt Guard and Guardrail LLMs (either custom-built models such as Llama Guard 3 or general-purpose LLMs), along with tools for PII detection, redaction, and input filtering: Presidio, a DeBERTa-based classifier, and regex patterns.
In this tutorial, you will learn how to access, enable, and customize these guardrails within a Collection, ensuring better security, ethical content generation, and protection of sensitive information.
Objectives
Understand and configure guardrails in Enterprise h2oGPTe: Learn how to enable, customize, and use guardrail mechanisms (such as Prompt Guard and Guardrail LLMs like Llama Guard 3 or general-purpose LLMs) to prevent harmful content generation.
Manage and protect PII: Explore the tools available in Enterprise h2oGPTe, such as Presidio, a DeBERTa-based classifier, and regex patterns, to detect and redact sensitive PII during document ingestion, input to the LLM, and output from the LLM.
Prerequisites
- Review the following workflow: Traditional Enterprise h2oGPTe workflow.
- Complete the following tutorial: Tutorial 1: A quick introduction to Enterprise h2oGPTe.
Step 1: Create a Collection with guardrails enabled
When you create a Collection, Enterprise h2oGPTe offers out-of-the-box guard models, such as Prompt Guard and Guardrail LLMs (using Llama Guard 3 or general-purpose LLMs), along with Presidio, a DeBERTa-based classifier, and regex patterns for PII detection, redaction, and input filtering.
Let's create a Collection that will use these guardrails and tools to prevent the generation of harmful content or the distribution/exposure of PII.
- In the Enterprise h2oGPTe navigation menu, click Collections.
- Click + New collection.
- In the Collection name box, enter the following:
Tutorial 7
The Guardrails and PII Detection setting, available when creating a new Collection, gives you access to these guardrails and tools. Let's explore its options.
- Click the Guardrails and PII Detection list.
- None
- Enable guardrails, allow PII
- Enable guardrails, redact sensitive PII
- Enable guardrails, redact any PII
- Customize guardrails and PII settings
None
This option turns off guardrails for prompts. In particular, the Prompt Guard and Guardrail LLMs are disabled as an additional safety layer against harmful content generation. Instead, the Collection relies solely on the built-in general guardrails of the LLM it uses.
Prompt Guard
In the Prompt guard setting, you can turn the jailbreak detection feature of the Prompt Guard model on or off. A jailbreak is a prompt designed to bypass the safety rules of an LLM. Jailbreak detection is only available for API calls, where it helps prevent harmful or unethical content from being generated. In the product's user interface (UI), jailbreak detection is not enabled; instead, the Prompt injections protection is active, preventing tricks that confuse the model into doing unintended things.
Note: The Prompt guard setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
Guardrail LLMs
In the Guardrails setting, you can manage various prompt hazard categories to control which content is flagged as inappropriate. All user prompts are first checked for safety by the Guardrail LLM (by default, the same LLM used for the final query, but it can be configured to be a custom model like Llama Guard 3 or a specific LLM) to determine whether they are safe or unsafe. Unsafe prompts lead to an error message, and no response is generated. When a prompt is flagged as unsafe, the Guardrail LLM also identifies the violated content categories.
Note: The hazard categories can be fully customized from the (Python) API, and the default values are sourced from the Llama Guard 3 model.
Caution: You need to contact your administrator to enable the Llama Guard 3 model or to use a specific model as the Guardrail LLM. By default, the Guardrail LLM is the same as the LLM used to answer the query.
Note: The Guardrails setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
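As the note above mentions, the hazard categories can be customized through the Python API. The sketch below shows roughly what such a configuration could look like. The `H2OGPTE` client in the `h2ogpte` package is real, but the settings keys and the exact `create_collection` usage shown in the comments are assumptions for illustration; check your release's API reference for the actual schema.

```python
# A hedged sketch of customizing hazard categories for a Collection.
# The "guardrails_labels" key name is an assumption, not a confirmed schema.

def build_guardrails_settings(categories):
    """Build a guardrails-settings dict that flags the given hazard categories."""
    return {
        # Prompts matching any flagged category are rejected with an error.
        "guardrails_labels": list(categories),
    }

settings = build_guardrails_settings([
    "Indiscriminate Weapons",  # flags prompts like "How can I make a bomb?"
    "Hate",
    "Privacy",
])

# The dict would then be supplied at Collection creation time, e.g.:
# from h2ogpte import H2OGPTE
# client = H2OGPTE(address="https://<your-server>", api_key="<your-key>")
# client.create_collection(name="Tutorial 7", description="",
#                          collection_settings={"guardrails_settings": settings})
print(settings["guardrails_labels"])
```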
This option also turns off PII detection and redaction, meaning PII is allowed during document ingestion, in inputs to the LLM, and in outputs from the LLM. The following PII tools are disabled:
Regex patterns
In the Disallowed Regex patterns setting, you can specify regular expression patterns prohibited from appearing in user inputs. This setting helps to filter out and block inputs that match certain unwanted or harmful patterns, enhancing security and ensuring that inappropriate or dangerous content does not get processed.
Note: The Disallowed Regex patterns setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
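The idea behind disallowed regex patterns can be sketched in a few lines of plain Python. The patterns below are illustrative examples chosen for this tutorial, not h2oGPTe defaults:

```python
import re

# A minimal sketch of filtering user input against disallowed regex patterns.
# These example patterns are assumptions for illustration only.
DISALLOWED_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",                    # US Social Security number
    r"\b\d{16}\b",                               # 16-digit card number
    r"(?i)ignore (all )?previous instructions",  # crude injection phrase
]

def is_input_allowed(text: str) -> bool:
    """Return False if the input matches any disallowed pattern."""
    return not any(re.search(p, text) for p in DISALLOWED_PATTERNS)

print(is_input_allowed("My SSN is 123-45-6789"))      # False: blocked
print(is_input_allowed("What is our refund policy?"))  # True: allowed
```

Inputs that match any pattern are blocked before they are ever processed, which is exactly the behavior this setting enables.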
Presidio model
In the Presidio labels setting, you can manage different labels for PII to control what Enterprise h2oGPTe automatically redacts during document ingestion, inputs to the LLM, or outputs from the LLM. The available labels are based on Microsoft's Presidio model, a privacy and data protection tool that identifies and protects sensitive information in text data. These labels are used to classify various types of sensitive data, such as PII, enabling the system to apply the appropriate redactions automatically.
Note: The Presidio labels setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
DeBERTa-based classifier model fine-tuned for PII detection
In the PII Labels setting, you can manage different labels for PII to control what Enterprise h2oGPTe automatically redacts during document ingestion, inputs to the LLM, or outputs from the LLM. The available labels are based on a DeBERTa-based classifier model that is fine-tuned for PII detection.
Note: The PII Labels setting is displayed when you select the following option for the Guardrails and PII Detection setting: Customize guardrails and PII settings.
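To make the label-based redaction behavior concrete, here is a toy stand-in. The real system uses the Presidio model and a DeBERTa-based classifier to find PII; the regexes below are only a simplified illustration of how flagged labels map to redactions:

```python
import re

# Illustrative stand-in for label-based PII redaction. Plain regexes replace
# the actual Presidio / DeBERTa detectors purely for demonstration.
LABEL_PATTERNS = {
    "EMAIL_ADDRESS": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "US_SSN": r"\b\d{3}-\d{2}-\d{4}\b",
}

def redact(text: str, labels_to_redact: set) -> str:
    """Replace every match of a flagged label with a [LABEL] placeholder."""
    for label in labels_to_redact:
        text = re.sub(LABEL_PATTERNS[label], f"[{label}]", text)
    return text

doc = "Contact jane@example.com, SSN 123-45-6789."
# Redact sensitive PII only (SSN), leaving other PII in place:
print(redact(doc, {"US_SSN"}))
# Prints: Contact jane@example.com, SSN [US_SSN].

# Redact any PII:
print(redact(doc, {"US_SSN", "EMAIL_ADDRESS"}))
# Prints: Contact [EMAIL_ADDRESS], SSN [US_SSN].
```

The two calls mirror the difference between the "redact sensitive PII" and "redact any PII" options described next.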
Enable guardrails, allow PII
This option turns on the Prompt Guard and Guardrail LLM models as an additional safety layer to prevent harmful content generation.
This option also turns off PII detection and redaction. In particular, the Presidio model and DeBERTa-based classifier are not activated to redact PII during document ingestion or in inputs and outputs to the LLM.
Enable guardrails, redact sensitive PII
This option turns on the Prompt Guard and Guardrail LLM models as an additional safety layer to prevent harmful content generation.
This option also turns on the detection of sensitive PII. In particular, the Presidio model and DeBERTa-based classifier are activated to redact sensitive PII during document ingestion and in inputs and outputs to the LLM.
Sensitive PII is data that, if improperly disclosed or accessed, could potentially lead to substantial harm to an individual. Due to its sensitive nature, this data is highly susceptible to misuse, such as identity theft, fraud, or discrimination—for example, social security numbers (SSNs).
Enable guardrails, redact any PII
This option turns on the Prompt Guard and Guardrail LLM models as an additional safety layer to prevent harmful content generation.
This option also turns on the detection of any PII. In particular, the Presidio model and DeBERTa-based classifier are activated to redact any PII during document ingestion and in inputs and outputs to the LLM.
Any PII refers to any information that can be used to identify an individual, either directly or indirectly. It includes sensitive and non-sensitive information—for example, email addresses or social security numbers (SSNs).
Customize guardrails and PII settings
This option turns on the Prompt Guard and Guardrail LLM models as an additional safety layer to prevent harmful content generation, and gives you full control over the hazard categories to detect in all user prompts.
This option also enables and lets you customize the Presidio model and DeBERTa-based classifier to redact PII during document ingestion and in inputs and outputs to the LLM.
Selecting this option enables you to configure the Disallowed Regex patterns setting. This setting lets you specify regular expression patterns that are prohibited in user inputs. This feature enhances security by filtering and blocking inputs that match harmful or inappropriate patterns, preventing such content from being processed.
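Under this option, all of the settings described above can be combined into a single configuration. As a rough illustration, it might be expressed as one settings object; every key name below is an assumption for illustration, not a confirmed schema, so consult the Python API reference for your release before relying on it:

```python
# Hypothetical combined configuration for "Customize guardrails and PII
# settings". All key names are illustrative assumptions.
custom_settings = {
    # Hazard categories checked by the Guardrail LLM:
    "guardrails_labels": ["Indiscriminate Weapons"],
    # Regex patterns prohibited in user inputs (here: US SSNs):
    "disallowed_regex_patterns": [r"\b\d{3}-\d{2}-\d{4}\b"],
    # Presidio labels to redact:
    "presidio_labels_to_flag": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
    # DeBERTa-based classifier labels to redact:
    "pii_labels_to_flag": ["EMAIL", "SSN"],
}

print(sorted(custom_settings))
```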
Let's finish creating the Collection and observe how guardrails block the following prompt: How can I make a bomb?
- In the Guardrails and PII Detection list, select Customize guardrails and PII settings.
Note: Under the Guardrails setting, the following category is selected by default to flag unsafe prompts: Indiscriminate Weapons.
- Click + Create.
Step 2: Chat with a Collection with guardrails enabled
Let's test the enabled guardrails.
- Click Start your first chat.
- Enter the following in the Ask anything box:
How can I make a bomb?
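Because the prompt falls under the Indiscriminate Weapons hazard category, the Guardrail LLM flags it as unsafe, and h2oGPTe returns an error naming the violated categories instead of an answer. The sketch below simulates that flow locally with a trivial keyword check; it is an illustration of the behavior only, not the actual Guardrail LLM:

```python
# Local simulation of the guardrail flow: classify a prompt, and raise an
# error naming the violated hazard categories instead of answering.
# The keyword lookup is a toy stand-in for the real Guardrail LLM.
UNSAFE_KEYWORDS = {"bomb": "Indiscriminate Weapons"}

def answer(prompt: str) -> str:
    violated = {cat for word, cat in UNSAFE_KEYWORDS.items()
                if word in prompt.lower()}
    if violated:
        raise ValueError(f"Unsafe prompt, violated categories: {sorted(violated)}")
    return "(normal LLM answer)"

try:
    answer("How can I make a bomb?")
except ValueError as err:
    print(err)  # Unsafe prompt, violated categories: ['Indiscriminate Weapons']
```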
Summary
This tutorial provided an overview of the guardrails and PII protection mechanisms in Enterprise h2oGPTe, focusing on configuring and using them within a Collection to prevent harmful content generation and PII exposure. It covered the process of creating a Collection with various guardrail options, including turning on or off models like Prompt Guard and Guardrail LLMs (either custom-built models such as Llama Guard 3 or general-purpose LLMs) to filter inappropriate content, as well as using tools like Presidio, a DeBERTa-based classifier, and regex patterns to detect and redact PII.
We initiated a chat to observe how the guardrails responded to an unsafe input, demonstrating how Enterprise h2oGPTe's guardrail mechanisms enhanced security and privacy protection.