Key terms
This page provides an overview of key terms and concepts that apply to Eval Studio.
Large language model (LLM)
Large Language Models (LLMs) are artificial intelligence systems trained on vast amounts of text data to generate human-like responses in natural language processing tasks. In the context of Eval Studio, an LLM can be considered a corpus-less RAG with perfect search context.
Retrieval-augmented generation (RAG)
In Eval Studio, a RAG is a retrieval-augmented generation product that is evaluated. In a more general sense, RAG refers to a technique that combines retrieving relevant information from an external corpus with a pre-trained language model to generate more accurate and contextually rich responses.
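As a conceptual illustration only (not Eval Studio code), a RAG pipeline can be sketched as retrieval followed by generation; the `retrieve` and `generate` callables below are hypothetical placeholders:

```python
# Conceptual RAG sketch; `retrieve` and `generate` are hypothetical
# placeholders, not part of the Eval Studio API.
def rag_answer(question, corpus, retrieve, generate, k=3):
    # Retrieval: find the k corpus documents most relevant to the question.
    context_docs = retrieve(question, corpus, k=k)
    # Generation: ask the LLM to answer using the retrieved context.
    prompt = "Context:\n" + "\n".join(context_docs) + f"\n\nQuestion: {question}"
    return generate(prompt)
```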
Evaluation
An Eval Studio evaluation is the process of assessing the performance of an LLM or RAG model provided by a model host using a specific set of tests and evaluators. The evaluation generates a report and artifacts that summarize the results and provide insights into the model's capabilities and limitations.
Eval eye
Eval eye is a visual component in Eval Studio that provides a detailed view of individual test cases within an evaluation. It allows users to inspect the inputs, outputs, and evaluation metrics for each test case, facilitating a deeper understanding of the model's performance on specific prompts and documents.
Workflow
An Eval Studio workflow is a sequence of steps that define how to evaluate an LLM or RAG model using the Model Risk Management (MRM) framework. A workflow typically includes selecting a model host, importing a test suite, configuring evaluators, running tools, and generating reports and artifacts based on the evaluation results.
Workflow artifact
An Eval Studio workflow artifact is a file or data object generated during a workflow run. Artifacts can include evaluation reports, guardrail configurations, RAG corpus patch files, system prompt extensions, and other outputs that provide insights into the model’s performance and behavior or enable actions based on the evaluation results.
Evaluator
An Eval Studio evaluator is code that assesses the output of an LLM or RAG model and produces evaluation metrics.
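A minimal sketch of the idea, assuming a simple exact-match metric; this is an illustration of what an evaluator does conceptually, not the Eval Studio evaluator interface:

```python
# Conceptual evaluator sketch: score each (answer, ground truth) pair and
# aggregate the scores into a single metric. Not the Eval Studio API.
def accuracy_evaluator(answers, ground_truths):
    """Fraction of model answers that exactly match their ground truth."""
    matches = sum(
        a.strip().lower() == g.strip().lower()
        for a, g in zip(answers, ground_truths)
    )
    return matches / len(ground_truths) if ground_truths else 0.0
```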
Model host
A model host is an environment, server, or platform where an LLM or RAG model is deployed and executed. This can include cloud-based services or on-premises servers.
Test
An Eval Studio test is a collection of documents (that is, a corpus) along with prompts that are relevant to the corpus, ground truth, constraints, and other parameters used to evaluate a RAG or LLM model.
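As a hypothetical illustration of what a test bundles together (not the actual Eval Studio test format), a test can be pictured as a corpus plus prompts with their ground truth and constraints:

```python
# Hypothetical sketch of the concept only; the field names and structure
# are illustrative, not Eval Studio's test schema.
test = {
    "documents": ["geography.pdf", "capitals.txt"],  # the corpus
    "prompts": [
        {
            "prompt": "What is the capital of France?",
            "ground_truth": "Paris",
            "constraints": ["answer must be a single city name"],
        }
    ],
}
```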
Test suite
An Eval Studio test suite is a collection of tests.
Test lab
An Eval Studio test lab is a set of resolved prompts, ground truth, constraints, and other parameters that are used to evaluate a RAG or LLM model. Eval Studio creates test labs from a test suite.
Report
An Eval Studio report is a collection of metrics and visualizations that describe the validity of a RAG or LLM model on a given test configuration.
Ground truth
In the context of LLM and RAG evaluation, ground truth refers to the actual or correct answer to a given question or prompt. It is used as a standard of comparison to evaluate the performance of a model by measuring how closely its outputs match the ground truth.
For example, if a model is asked the question "What is the capital of France?", the ground truth would be "Paris". If the model's output is also "Paris", then it has correctly answered the question. However, if the model's output is "London", then it has made an error, and the difference between its output and the ground truth can be used to measure the model's accuracy or performance.
In the case of RAG pipelines, the ground truth may consist of both the retrieved documents and the final generated answer. The model's ability to retrieve relevant documents and generate accurate answers based on those documents can be evaluated by comparing its outputs to the ground truth.
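The "capital of France" example above can be expressed as a simple comparison against the ground truth; this is an illustrative sketch only, not Eval Studio code:

```python
# Illustration of the ground-truth comparison described above.
ground_truth = "Paris"

def is_correct(model_output: str) -> bool:
    # The model is correct when its output matches the ground truth.
    return model_output.strip().lower() == ground_truth.strip().lower()

print(is_correct("Paris"))   # True  -> correct answer
print(is_correct("London"))  # False -> error relative to the ground truth
```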
Perturbator
Perturbators are tools used to introduce small, controlled variations into data, which can help evaluate model robustness.
Perturbation
The process of running the perturbator is called perturbation.
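For illustration only (not an Eval Studio component), a perturbator can be as simple as a function that introduces a small, controlled variation into a prompt; evaluating the model on both the original and the perturbed prompt helps measure robustness:

```python
import random

# Illustrative perturbator sketch: swap two adjacent characters in the prompt.
def perturb(prompt: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = list(prompt)
    if len(chars) > 1:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(perturb("What is the capital of France?"))
```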