Key terms

This page provides an overview of key terms and concepts that apply to Eval Studio.

Large language model (LLM)

Large language models (LLMs) are artificial intelligence systems trained on vast amounts of text data to generate human-like responses in natural language processing tasks. In the context of Eval Studio, an LLM can be considered a corpus-less RAG with perfect search context.

Retrieval-augmented generation (RAG)

In Eval Studio, a RAG is a retrieval-augmented generation product. More generally, RAG refers to a technique that combines retrieving relevant information from an external corpus with a pre-trained language model to generate more accurate and contextually rich responses.

Evaluation

An Eval Studio evaluation is the process of assessing the performance of an LLM or RAG model provided by a model host using a specific set of tests and evaluators. The evaluation generates a report and artifacts that summarize the results and provide insights into the model's capabilities and limitations.

Eval eye

Eval eye is a visual component in Eval Studio that provides a detailed view of individual test cases within an evaluation. It allows users to inspect the inputs, outputs, and evaluation metrics for each test case, facilitating a deeper understanding of the model's performance on specific prompts and documents.

Workflow

An Eval Studio workflow is a sequence of steps that define how to evaluate an LLM or RAG model using the Model Risk Management (MRM) framework. A workflow typically includes selecting a model host, importing a test suite, configuring evaluators, running tools, and generating reports and artifacts based on the evaluation results.

Workflow artifact

An Eval Studio workflow artifact is a file or data object generated during a workflow run. Artifacts can include evaluation reports, guardrail configurations, RAG corpus patch files, system prompt extensions, and other outputs that provide insights into the model’s performance and behavior or enable actions based on the evaluation results.

Evaluator

An Eval Studio evaluator is code that evaluates an LLM or RAG model, producing the metrics that appear in evaluation reports.
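
Conceptually, an evaluator can be as simple as a function that scores a model's answer against an expected one. The sketch below is illustrative only; the function name and signature are assumptions, not the actual Eval Studio API.

```python
# Hypothetical evaluator sketch: the name and signature are illustrative
# assumptions, not the Eval Studio API.
def exact_match_evaluator(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 on a case-insensitive exact match, 0.0 otherwise."""
    return 1.0 if model_answer.strip().lower() == ground_truth.strip().lower() else 0.0
```

Real evaluators typically compute richer metrics than exact match, but the shape is the same: model output in, score out.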

Model host

A model host is an environment, server, or platform where an LLM or RAG model is deployed and executed. This can include cloud-based services or on-premises servers.

Test

An Eval Studio test is a collection of documents (that is, a corpus) along with prompts that are relevant to the corpus, ground truth, constraints, and other parameters that are used to evaluate a RAG or LLM model.
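
As an illustration, a test can be thought of as a record bundling these pieces together. The field names below are assumptions chosen for clarity, not the actual Eval Studio schema.

```python
# Illustrative structure only: field names are assumed, not the
# actual Eval Studio test schema.
test = {
    "corpus": ["geography.pdf"],                    # documents the prompts refer to
    "prompts": ["What is the capital of France?"],  # prompts relevant to the corpus
    "ground_truth": ["Paris"],                      # expected answers
    "constraints": {"max_answer_tokens": 64},       # other evaluation parameters
}
```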

Test suite

An Eval Studio test suite is a collection of tests.

Test lab

An Eval Studio test lab is a set of resolved prompts, ground truth, constraints, and other parameters that are used to evaluate a RAG or LLM model. Test labs are created by Eval Studio from a test suite.

Report

An Eval Studio report is a collection of metrics and visualizations that describe the validity of a RAG or LLM model on a given test configuration.

Ground truth

In the context of LLM and RAG evaluation, ground truth refers to the actual or correct answer to a given question or prompt. It is used as a standard of comparison to evaluate the performance of a model by measuring how closely its outputs match the ground truth.

For example, if a model is asked the question "What is the capital of France?", the ground truth would be "Paris". If the model's output is also "Paris", then it has correctly answered the question. However, if the model's output is "London", then it has made an error, and the difference between its output and the ground truth can be used to measure the model's accuracy or performance.

In the case of RAG pipelines, the ground truth may consist of both the retrieved documents and the final generated answer. The model's ability to retrieve relevant documents and generate accurate answers based on those documents can be evaluated by comparing its outputs to the ground truth.
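
The comparison described above can be sketched as a simple exact-match accuracy computation. This is a minimal sketch of the idea, not how Eval Studio necessarily computes its metrics; evaluators may also use softer measures such as semantic similarity.

```python
def exact_match_accuracy(outputs, ground_truths):
    """Fraction of model outputs that exactly match the ground truth answers."""
    matches = sum(out == gt for out, gt in zip(outputs, ground_truths))
    return matches / len(ground_truths)

# "Paris" matches its ground truth, "London" does not: accuracy is 0.5.
score = exact_match_accuracy(["Paris", "London"], ["Paris", "Paris"])
```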

Perturbator

Perturbators are tools used to introduce small, controlled variations into data, which can help evaluate model robustness.

Perturbation

Perturbation is the process of applying a perturbator to test data.
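
As a minimal sketch of the idea (not the Eval Studio implementation), a perturbator might introduce a small typo into a prompt and then check whether the model's answer changes:

```python
import random

# Hypothetical perturbator: swaps two adjacent characters to simulate a typo.
# The name and approach are illustrative assumptions only.
def typo_perturbator(prompt: str, rng: random.Random) -> str:
    """Return the prompt with one pair of adjacent characters swapped."""
    if len(prompt) < 2:
        return prompt
    i = rng.randrange(len(prompt) - 1)
    chars = list(prompt)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

perturbed = typo_perturbator("What is the capital of France?", random.Random(0))
```

A robust model should give the same answer for the original and the perturbed prompt.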
